This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Constrained generation support #235

Open
philpax opened this issue May 17, 2023 · 6 comments
Labels
issue:enhancement New feature or request topic:api-design API design considerations, including new functionality and changes

Comments

@philpax
Collaborator

philpax commented May 17, 2023

This is an open-ended issue; I expect there will be more than one solution to this.

There have been a couple of existing solutions for constraining the output of generations.

The idea's pretty simple: the user supplies some kind of schema, and generation is then forced to match that schema by only sampling/feeding the tokens that fit it. jsonformer is a good place to look for this: it feeds in the JSON structure up to the point where the LLM should generate something, and then samples only the tokens that would be valid in that context.

Given that there are many potential ways to solve this problem and many potential output formats, I'm not sure we should bake in one particular solution. My feeling is that we should offer additional crates for this kind of work, rather than baking it into llm itself.

An example might be an llm-json crate, which extends InferenceSession with a trait that takes any serde-able type and produces structured output:

// Hypothetical sketch: `infer_json` would constrain sampling so that the
// output deserializes into the requested type.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Steps {
    steps: Vec<String>,
}

let steps = session.infer_json::<Steps>(/* ... */, format!("The following paragraph describes a process.\n{{paragraph}}\nPlease transcode it to JSON using the following schema: [[SCHEMA]]"))?;

dbg!(steps.steps);

This could also potentially live in llm-chain (and might be better suited there), but I'm not sure if their abstraction allows for controlled sampling like this. Would need to chat to them.

@philpax philpax added the issue:enhancement New feature or request label May 17, 2023
@spion spion mentioned this issue May 22, 2023
@philpax
Collaborator Author

philpax commented Jun 17, 2023

It should be possible to implement this on top of our existing sampler abstraction. For the JSON case, we'd set up a JSON parsing state machine and only sample tokens that are valid from the current parser state.
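
To illustrate the idea, here's a minimal standalone sketch. It is not tied to llm's actual sampler API: the toy state machine below only handles a JSON array of strings, state transitions are omitted, and the vocabulary is made up.

// A minimal sketch of the "JSON parsing state machine" idea: for each parser
// state, decide which vocabulary tokens are allowed, and set the logits of
// everything else to -inf before running the usual samplers. A real
// implementation would cover full JSON and advance the state after each
// accepted token; this toy version only handles an array of strings.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum JsonState {
    ExpectOpenBracket,
    ExpectStringOrClose,
    InString,
    ExpectCommaOrClose,
    Done,
}

fn token_is_valid(state: JsonState, token_text: &str) -> bool {
    match state {
        JsonState::ExpectOpenBracket => token_text.starts_with('['),
        JsonState::ExpectStringOrClose => {
            token_text.starts_with('"') || token_text.starts_with(']')
        }
        // Simplified: anything goes inside a string until a closing quote.
        JsonState::InString => true,
        JsonState::ExpectCommaOrClose => token_text.starts_with(',') || token_text.starts_with(']'),
        JsonState::Done => false,
    }
}

fn mask_invalid_tokens(state: JsonState, vocab: &[&str], logits: &mut [f32]) {
    for (id, token_text) in vocab.iter().copied().enumerate() {
        if !token_is_valid(state, token_text) {
            logits[id] = f32::NEG_INFINITY;
        }
    }
}

fn main() {
    let vocab = ["[", "]", "\"step", " one\"", ",", "{", "7"];
    let mut logits = vec![0.0_f32; vocab.len()];
    mask_invalid_tokens(JsonState::ExpectOpenBracket, &vocab, &mut logits);
    // Only "[" keeps a finite logit in this state.
    println!("{logits:?}");
}

Whatever survives the mask can then go through the normal temperature/top-k/top-p samplers unchanged, which is why this should sit comfortably on top of the sampler abstraction.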

In the long run I would like to have a port of Guidance or similar as it's much more general, but I'm not sure how much work would be involved there.

@philpax philpax added the topic:api-design API design considerations, including new functionality and changes label Jun 19, 2023
@michael-dm

Hello @philpax, have you seen ggerganov/llama.cpp#1773 on this topic?

@philpax
Collaborator Author

philpax commented Aug 6, 2023

Hi there! Yes, I have - it's quite impressive, but it's very specific to llama.cpp's needs. With #359 landing soon, we'll have a modular sampling solution where these kinds of constraints can hopefully be defined in a reusable fashion.

@Reichenbachian

Has there been any recent movement on this? I'm hoping to do some constrained JSON generation using this crate. Is this the right package to push for this or should I be looking at the llm-samplers crate?

@philpax
Collaborator Author

philpax commented Nov 13, 2023

Hi there! Unfortunately, it's not a priority; our current focus is on catching up to llama.cpp and the rest of the ecosystem. You may be able to implement this yourself; @KerfuffleV2 may also have some ideas as to how to implement this with llm-samplers.

@KerfuffleV2
Contributor

This might help you: KerfuffleV2/llm-samplers#7 (comment) (See the third item.)

Note that I didn't really look at it closely, so I can't explain it or anything. I do hope to have something like that in llm-samplers eventually, but it doesn't currently exist. One thing that kind of has to happen first is a resource system overhaul.

If you want to try to implement it yourself (as a standalone thing or as a Sampler in llm-samplers), probably the simplest way is to have some kind of parser and then just ban every token that doesn't match the parser's current state; I believe this is basically how llama.cpp's grammar sampler works as well. Once you've banned everything that doesn't conform to the grammar, you can let the normal samplers run.
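
For what it's worth, here's a rough standalone sketch of that banning step. This is not llm-samplers' actual Sampler trait; the grammar check, token-text lookup, and greedy "normal sampler" below are all placeholder assumptions.

// Sketch of grammar-constrained sampling as a pre-pass: ban every token the
// parser would reject by setting its logit to -inf, then hand the surviving
// logits to whatever sampler chain you normally run.
fn constrain_logits<F>(logits: &mut [f32], token_text: impl Fn(usize) -> String, accepts: F)
where
    F: Fn(&str) -> bool,
{
    for (token_id, logit) in logits.iter_mut().enumerate() {
        if !accepts(&token_text(token_id)) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

// Stand-in for the "normal samplers": greedy argmax over the surviving tokens.
fn greedy(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, l)| l.is_finite())
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    let vocab = ["{", "}", "\"name\"", ":", "banana"];
    let mut logits = vec![1.0, 0.5, 2.0, 0.3, 3.0];

    // Pretend the parser currently only accepts "{" (start of a JSON object).
    constrain_logits(&mut logits, |id| vocab[id].to_string(), |t: &str| t == "{");

    assert_eq!(greedy(&logits), Some(0));
    println!("picked token: {:?}", greedy(&logits).map(|id| vocab[id]));
}

The nice part of doing it this way is that the constraint never has to know anything about temperature/top-k/etc.; it just shrinks the candidate set before the normal chain runs.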
