Constrained generation support #235
It should be possible to implement this on top of our existing sampler abstraction. For the JSON case, we set up a JSON parsing state machine and only sample tokens that would be valid from the current parser state. In the long run I would like to have a port of Guidance or similar, as it's much more general, but I'm not sure how much work would be involved there.
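For illustration, here is a minimal, standalone sketch of that token-filtering idea. It is not the actual `llm` sampler API: `pick_token` and `is_plausible_json_prefix` are hypothetical names, and the prefix check is deliberately simplified (brace/quote balance only) where a real implementation would use a full JSON parsing state machine.

```rust
// Standalone sketch of sampler-level constrained generation: before picking a
// token, drop every candidate whose text would make the output an invalid
// prefix of the target language (here: a *very* simplified notion of JSON).

/// Simplified prefix check: tracks brace/bracket depth and string state only.
/// A real implementation would be a proper JSON pushdown automaton.
fn is_plausible_json_prefix(text: &str) -> bool {
    let (mut depth, mut in_string, mut escaped) = (0i32, false, false);
    for c in text.chars() {
        if in_string {
            match c {
                _ if escaped => escaped = false,
                '\\' => escaped = true,
                '"' => in_string = false,
                _ => {}
            }
        } else {
            match c {
                '"' => in_string = true,
                '{' | '[' => depth += 1,
                '}' | ']' => {
                    depth -= 1;
                    if depth < 0 {
                        return false;
                    }
                }
                _ => {}
            }
        }
    }
    true
}

/// Greedily pick the highest-logit token whose text keeps the output a
/// plausible JSON prefix; `None` means no token is admissible.
fn pick_token(generated: &str, vocab: &[&str], logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(id, _)| is_plausible_json_prefix(&format!("{generated}{}", vocab[*id])))
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(id, _)| id)
}

fn main() {
    // Toy vocabulary and logits standing in for a real tokenizer and model.
    let vocab = ["{\"a\":", " 1}", "}", "]"];
    let logits = [0.1_f32, 0.5, 2.0, 1.5];
    // "}" has the highest logit but is invalid at the start of a JSON
    // document, so the constrained pick falls back to "{\"a\":".
    let choice = pick_token("", &vocab, &logits).unwrap();
    assert_eq!(choice, 0);
    println!("picked token: {}", vocab[choice]);
}
```

The same shape works for any constraint that can answer "is this still a valid prefix?", which is why a grammar-based or Guidance-style engine could slot into the same place in the sampler.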
Hello @philpax, have you seen ggerganov/llama.cpp#1773 on this topic?
Hi there! Yes, I have - it's quite impressive, but it's specific to llama.cpp's needs. With #359 landing soon, we'll have a modular sampling solution where these kinds of constraints can hopefully be defined in a reusable fashion.
Has there been any recent movement on this? I'm hoping to do some constrained JSON generation using this crate. Is this the right package to push for this, or should I be looking at the llm-samplers crate?
Hi there! Unfortunately, it's not a priority; our current focus is on catching up to llama.cpp and the rest of the ecosystem. You may be able to implement this yourself; @KerfuffleV2 may also have some ideas as to how to implement this with `llm-samplers`.
This might help you: KerfuffleV2/llm-samplers#7 (comment) (see the third item). Note that I didn't really look at it closely, so I can't explain it or anything. I do hope to have something like that in `llm-samplers`. If you want to try to implement it yourself (as a standalone thing or a …)
This is an open-ended issue; I expect there will be more than one solution to this.
There have been a couple of solutions for constraining the output of generations, including jsonformer, Guidance, and llama.cpp's grammar-based sampling (ggerganov/llama.cpp#1773).

The idea's pretty simple: the user supplies some kind of schema, and then generation is forced to match that schema by only sampling/feeding the tokens that fit that schema. jsonformer is a good place to look for this: it will feed in the JSON structure up to the point where the LLM should generate something, and then sample only the tokens that would be valid in that context.
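As a rough, standalone sketch of that jsonformer-style loop (every name here is hypothetical, and a stub closure stands in for a real constrained model call), the caller emits the JSON skeleton itself and only asks the model for the values:

```rust
// jsonformer-style structured generation: the *structure* of the JSON is
// emitted directly by the caller, and the model is only asked to fill in the
// values, one field at a time.

/// A very small schema: field name plus the kind of value we expect.
enum FieldKind {
    String,
    Number,
}

fn fill_object(
    fields: &[(&str, FieldKind)],
    mut generate_value: impl FnMut(&str, &FieldKind) -> String,
) -> String {
    let mut out = String::from("{");
    for (i, (name, kind)) in fields.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        // The key and punctuation come from the schema, not the model...
        out.push_str(&format!("\"{name}\":"));
        // ...and the model only produces the value, given the output so far.
        out.push_str(&generate_value(&out, kind));
    }
    out.push('}');
    out
}

fn main() {
    // Stub "model": a real implementation would run constrained sampling here.
    let json = fill_object(
        &[("name", FieldKind::String), ("age", FieldKind::Number)],
        |_prompt_so_far, kind| match kind {
            FieldKind::String => "\"Alice\"".to_string(),
            FieldKind::Number => "30".to_string(),
        },
    );
    assert_eq!(json, r#"{"name":"Alice","age":30}"#);
    println!("{json}");
}
```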
Given that there are many potential ways to solve this problem and many potential output formats, I'm not sure we should bake in one particular solution. My feeling is that we should offer additional crates for this kind of work, but not bake it into `llm` specifically.

An example might be an `llm-json` crate, which extends `InferenceSession` with a trait that takes any `serde`-able type and produces structured output.
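A hedged sketch of what that could look like, assuming `serde`/`serde_json` as dependencies; `InferStructured`, `infer_structured`, and the `Session` stand-in are all hypothetical, and nothing like this exists in `llm` today:

```rust
// Hypothetical `llm-json` extension trait: constrain generation to JSON that
// matches `T`, then hand the result to serde.

use serde::de::DeserializeOwned;

/// Stand-in for `llm::InferenceSession` so the sketch compiles on its own.
struct Session;

impl Session {
    /// Stand-in for a constrained-generation call: a real implementation
    /// would run the schema-constrained sampling described above and return
    /// the raw JSON text produced by the model.
    fn infer_json_constrained(&mut self, _prompt: &str) -> String {
        r#"{"name":"Alice","age":30}"#.to_string()
    }
}

/// The hypothetical extension trait: generate text constrained to valid JSON
/// for `T`, then deserialize it.
trait InferStructured {
    fn infer_structured<T: DeserializeOwned>(
        &mut self,
        prompt: &str,
    ) -> Result<T, serde_json::Error>;
}

impl InferStructured for Session {
    fn infer_structured<T: DeserializeOwned>(
        &mut self,
        prompt: &str,
    ) -> Result<T, serde_json::Error> {
        let raw = self.infer_json_constrained(prompt);
        serde_json::from_str(&raw)
    }
}

#[derive(serde::Deserialize)]
struct Person {
    name: String,
    age: u32,
}

fn main() -> Result<(), serde_json::Error> {
    let mut session = Session;
    let person: Person = session.infer_structured("Describe a person as JSON.")?;
    println!("{} is {}", person.name, person.age);
    Ok(())
}
```

The hard part is of course `infer_json_constrained`, which would need to drive sampling so that the output is guaranteed to deserialize into `T`; the trait itself is just the ergonomic wrapper.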
This could also potentially live in `llm-chain` (and might be better suited there), but I'm not sure if their abstraction allows for controlled sampling like this. Would need to chat to them.