Comparison with LookAhead #2
Thank you for the compliment!
TBH I don't yet understand lookahead decoding completely, so I can't comment on that here.
Are you suggesting this for the draft model or the main model? This might help make draft tokens faster, but I feel it won't give good results, since the previous token is probably very important when predicting the next one. Medusa requires some training to be able to do this: https://github.com/FasterDecoding/Medusa
Yeah, I'm not sure it would work, but it may be worth a try. I think guessing the previous token randomly is pretty bad, because token prediction depends so much on the previous one. However, if a null embedding (and/or attention mask) is placed on token i, there may be some way of getting a reasonable estimate of token i+1. But yeah, the prediction may still be too bad. Medusa is a cool concept, but it's really annoying to have to train the in-built draft model.
If someone can figure out a 'training-free Medusa', that's probably a million-dollar idea 😸
AttributeError: 'MistralForCausalLM' object has no attribute '_extend_attention_mask'
This is a cool project.
I guess you're using the prompt for lookahead, but you could also pull some future guess tokens into the ngram lookup table, maybe as LookaheadDecoding does?
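As a rough illustration of that ngram-lookup idea, here is a minimal sketch: build a table of continuations from the prompt (and optionally from previously guessed or accepted tokens), then use it to propose a short draft for the main model to verify. The helper names and parameters are made up for illustration and are not this repo's actual implementation.

```python
# Hypothetical sketch of a prompt ngram lookup table, optionally extended
# with guessed tokens -- illustrative only, not this repo's implementation.
from collections import defaultdict

def build_ngram_table(token_ids, n=3):
    # Map each (n-1)-token prefix to the tokens that followed it in the text.
    table = defaultdict(list)
    for i in range(len(token_ids) - n + 1):
        prefix = tuple(token_ids[i : i + n - 1])
        table[prefix].append(token_ids[i + n - 1])
    return table

def propose_draft(token_ids, table, n=3, max_draft=5):
    # Greedily extend the sequence using the table; the main model would then
    # verify these draft tokens in a single forward pass.
    draft, context = [], list(token_ids)
    for _ in range(max_draft):
        prefix = tuple(context[-(n - 1):])
        if prefix not in table:
            break
        nxt = table[prefix][-1]  # take the most recent continuation seen
        draft.append(nxt)
        context.append(nxt)
    return draft

prompt_ids = [5, 9, 2, 7, 4, 9, 2]          # toy token ids
table = build_ngram_table(prompt_ids, n=3)
print(propose_draft(prompt_ids, table))      # -> [7, 4, 9, 2, 7]
```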
I was also thinking that it should be possible to use an LLM to predict forward tokens just by passing blank (zero) embedding vectors for a few positions ahead. See more here
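A rough, untested sketch of that "blank embedding" probe with Hugging Face transformers (the model name is just an example): append a zero vector after the real token embeddings and read the logits at that placeholder position as a guess for the token two steps ahead.

```python
# Untested sketch of the zero-embedding probe suggested above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"   # placeholder; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tok("The capital of France is", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)   # shape [1, T, d]
blank = torch.zeros_like(embeds[:, :1, :])         # one zero "future" slot
probed = torch.cat([embeds, blank], dim=1)         # shape [1, T+1, d]

with torch.no_grad():
    logits = model(inputs_embeds=probed).logits    # shape [1, T+1, vocab]

next_tok = logits[0, -2].argmax().item()  # ordinary prediction for position T
skip_tok = logits[0, -1].argmax().item()  # guess for position T+1, with a blank at T
print(tok.decode([next_tok]), "|", tok.decode([skip_tok]))
```

Whether the guess at the blank position is accurate enough to be useful as a draft token is exactly the open question discussed in this thread.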