
Comparison with LookAhead #2

Open
RonanKMcGovern opened this issue Dec 20, 2023 · 4 comments

@RonanKMcGovern

This is a cool project.

I guess you're using the prompt for lookahead, but you could also pull some future guess tokens into the n-gram lookup table, maybe the way LookaheadDecoding does?
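
Roughly the kind of thing I mean, as a quick untested sketch (the helper name and details are just illustrative, not your actual code): search the prompt plus whatever has already been generated or guessed for the trailing n-gram, and return the tokens that followed it as draft candidates.

```python
def find_draft_candidates(input_ids, max_ngram_size=3, num_draft_tokens=10):
    """input_ids: flat list of token ids (prompt + anything generated or guessed so far)."""
    for ngram_size in range(max_ngram_size, 0, -1):
        ngram = tuple(input_ids[-ngram_size:])                      # trailing n-gram to match
        # scan earlier positions (most recent first) for the same n-gram
        for start in range(len(input_ids) - ngram_size - 1, -1, -1):
            if tuple(input_ids[start:start + ngram_size]) == ngram:
                follow = input_ids[start + ngram_size:start + ngram_size + num_draft_tokens]
                if follow:
                    return follow                                   # draft tokens to verify in one forward pass
    return []                                                       # no match: fall back to plain decoding
```

The only change from pure prompt lookup would be that input_ids keeps growing with accepted (and even speculative) tokens, so later matches can come from generated text too.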

I was also thinking that it should be possible to use an LLM to predict forward tokens just by passing blank (zero) embedding vectors for a few positions ahead. See more here

@apoorvumang
Owner

Thank you for the compliment!

I guess you're using the prompt for lookahead, but you could also pull some future guess tokens into the n-gram lookup table, maybe the way LookaheadDecoding does?

TBH I don't yet understand lookahead decoding completely, so I can't comment on that.

I was also thinking that it should be possible to use an LLM to predict forward tokens just by passing blank (zero) embedding vectors for a few positions ahead. hao-ai-lab/LookaheadDecoding#37

Are you suggesting this for the draft model or the main model? It might help in making draft tokens faster, but I feel it won't give good results, since the previous token is probably very important when predicting the next one. Medusa requires some training to be able to do this: https://github.com/FasterDecoding/Medusa

@RonanKMcGovern
Author

It might help in making draft tokens faster, but I feel it won't give good results, since the previous token is probably very important when predicting the next one.

Yeah, I'm not sure it would work, but it may be worth a try. I think guessing the previous token randomly is pretty bad because next-token prediction depends so much on the previous one. However, if a null embedding (and/or attention mask) is placed at position i, there may be some way of getting a reasonable estimate of token i+1. But yeah, the prediction may still be too bad.
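
For example, something like this rough, untested sketch (the model name and details are just placeholders): feed the prefix embeddings plus one zero vector via inputs_embeds, then read the logits at the blank position as a guess for token i+1.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # any causal LM should do for the experiment
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

ids = tok("The capital of France is", return_tensors="pt").input_ids.to(model.device)
embeds = model.get_input_embeddings()(ids)            # (1, seq_len, hidden_size)

# append one "blank" position: a zero embedding vector standing in for the unknown token i
# (one could also try attention-masking that position out instead of zeroing its embedding)
blank = torch.zeros_like(embeds[:, :1, :])
embeds_padded = torch.cat([embeds, blank], dim=1)

with torch.no_grad():
    out = model(inputs_embeds=embeds_padded)

# in a causal LM the logits at position i predict token i+1, so the logits at the
# blank position are the model's guess for the token *after* the one we never provided
guess = out.logits[0, -1].argmax()
print(tok.decode(guess))
```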

Medusa is a cool concept, but it's really annoying to have to train the in-built draft model.

@apoorvumang
Owner

If someone can figure out a 'training free Medusa', that's probably a million dollar idea 😸

@riyaj8888

AttributeError: 'MistralForCausalLM' object has no attribute '_extend_attention_mask'
