This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Constrained generation support #235

Open
philpax opened this issue May 17, 2023 · 6 comments
Labels
issue:enhancement New feature or request topic:api-design API design considerations, including new functionality and changes

Comments

@philpax
Collaborator

philpax commented May 17, 2023

This is an open-ended issue; I expect there will be more than one solution to this.

There have been a couple of existing solutions for constraining the output of generations.

The idea's pretty simple: the user supplies some kind of schema, and generation is then forced to match that schema by only sampling/feeding the tokens that fit it. jsonformer is a good place to look for this: it feeds in the JSON structure up to the point where the LLM should generate something, and then samples only the tokens that would be valid in that context.

Given that there are many potential ways to solve this problem and many potential output formats, I'm not sure we should bake in one particular solution. My feeling is that we should offer additional crates for this kind of work, rather than baking it into llm itself.

An example might be an llm-json crate, which extends InferenceSession with a trait that takes any serde-able type and produces structured output:

// Hypothetical sketch: `infer_json` would constrain sampling so that the
// output deserializes into the requested type.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Steps {
    steps: Vec<String>,
}

let steps = session.infer_json::<Steps>(/* ... */, format!("The following paragraph describes a process.\n{{paragraph}}\nPlease transcode it to JSON using the following schema: [[SCHEMA]]"))?;

dbg!(steps.steps);

This could also potentially live in llm-chain (and might be better suited there), but I'm not sure if their abstraction allows for controlled sampling like this. Would need to chat to them.

@philpax philpax added the issue:enhancement New feature or request label May 17, 2023
@spion spion mentioned this issue May 22, 2023
@philpax
Collaborator Author

philpax commented Jun 17, 2023

It should be possible to implement this on top of our existing sampler abstraction. For the JSON case, we'd set up a JSON parsing state machine and only sample tokens that are valid from the current parser state.
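
To illustrate the idea, here's a minimal standalone sketch. It is not tied to llm's actual sampler API: the toy state machine below only handles a JSON array of strings, state transitions are omitted, and the vocabulary is made up.

// A minimal sketch of the "JSON parsing state machine" idea: for each parser
// state, decide which vocabulary tokens are allowed, and set the logits of
// everything else to -inf before running the usual samplers. A real
// implementation would cover full JSON and advance the state after each
// accepted token; this toy version only handles an array of strings.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum JsonState {
    ExpectOpenBracket,
    ExpectStringOrClose,
    InString,
    ExpectCommaOrClose,
    Done,
}

fn token_is_valid(state: JsonState, token_text: &str) -> bool {
    match state {
        JsonState::ExpectOpenBracket => token_text.starts_with('['),
        JsonState::ExpectStringOrClose => {
            token_text.starts_with('"') || token_text.starts_with(']')
        }
        // Simplified: anything goes inside a string until a closing quote.
        JsonState::InString => true,
        JsonState::ExpectCommaOrClose => token_text.starts_with(',') || token_text.starts_with(']'),
        JsonState::Done => false,
    }
}

fn mask_invalid_tokens(state: JsonState, vocab: &[&str], logits: &mut [f32]) {
    for (id, token_text) in vocab.iter().copied().enumerate() {
        if !token_is_valid(state, token_text) {
            logits[id] = f32::NEG_INFINITY;
        }
    }
}

fn main() {
    let vocab = ["[", "]", "\"step", " one\"", ",", "{", "7"];
    let mut logits = vec![0.0_f32; vocab.len()];
    mask_invalid_tokens(JsonState::ExpectOpenBracket, &vocab, &mut logits);
    // Only "[" keeps a finite logit in this state.
    println!("{logits:?}");
}

Whatever survives the mask can then go through the normal temperature/top-k/top-p samplers unchanged, which is why this should sit comfortably on top of the sampler abstraction.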

In the long run I would like to have a port of Guidance or similar as it's much more general, but I'm not sure how much work would be involved there.

@philpax philpax added the topic:api-design API design considerations, including new functionality and changes label Jun 19, 2023
@michael-dm

Hello @philpax, have you seen ggerganov/llama.cpp#1773 on this topic?

@philpax
Collaborator Author

philpax commented Aug 6, 2023

Hi there! Yes, I have - it's quite impressive, but it's very specific to llama.cpp's needs. With #359 landing soon, we'll have a modular sampling solution where these kinds of constraints can hopefully be defined in a reusable fashion.

@Reichenbachian

Has there been any recent movement on this? I'm hoping to do some constrained JSON generation using this crate. Is this the right package to push for this or should I be looking at the llm-samplers crate?

@philpax
Collaborator Author

philpax commented Nov 13, 2023

Hi there! Unfortunately, it's not a priority; our current focus is on catching up to llama.cpp and the rest of the ecosystem. You may be able to implement this yourself; @KerfuffleV2 may also have some ideas as to how to implement this with llm-samplers.

@KerfuffleV2
Contributor

This might help you: KerfuffleV2/llm-samplers#7 (comment) (See the third item.)

Note that I didn't really look at it closely, so I can't explain it or anything. I do hope to have something like that in llm-samplers eventually, but it doesn't currently exist. One thing that kind of has to happen first is a resource system overhaul.

If you want to try to implement it yourself (as a standalone thing or as a Sampler in llm-samplers), probably the simplest way is to have some kind of parser and then just ban every token that doesn't match the parser's current state; I believe this is basically how llama.cpp's grammar sampler works as well. Once you've banned everything that doesn't conform to the grammar, you can let the normal samplers run.
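
For what it's worth, here's a rough standalone sketch of that banning step. This is not llm-samplers' actual Sampler trait; the grammar check, token-text lookup, and greedy "normal sampler" below are all placeholder assumptions.

// Sketch of grammar-constrained sampling as a pre-pass: ban every token the
// parser would reject by setting its logit to -inf, then hand the surviving
// logits to whatever sampler chain you normally run.
fn constrain_logits<F>(logits: &mut [f32], token_text: impl Fn(usize) -> String, accepts: F)
where
    F: Fn(&str) -> bool,
{
    for (token_id, logit) in logits.iter_mut().enumerate() {
        if !accepts(&token_text(token_id)) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

// Stand-in for the "normal samplers": greedy argmax over the surviving tokens.
fn greedy(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, l)| l.is_finite())
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    let vocab = ["{", "}", "\"name\"", ":", "banana"];
    let mut logits = vec![1.0, 0.5, 2.0, 0.3, 3.0];

    // Pretend the parser currently only accepts "{" (start of a JSON object).
    constrain_logits(&mut logits, |id| vocab[id].to_string(), |t: &str| t == "{");

    assert_eq!(greedy(&logits), Some(0));
    println!("picked token: {:?}", greedy(&logits).map(|id| vocab[id]));
}

The nice part of doing it this way is that the constraint never has to know anything about temperature/top-k/etc.; it just shrinks the candidate set before the normal chain runs.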
