Medusa performance degrades with batch size larger than 1 #2482
Labels
Performance
Issue about performance number
Comments
@SoundProvider could you tell me the method of your performance evaluations?
@hello-11 hello.
I just measured the
I'm trying to use Medusa with TensorRT-LLM, following this page.
It works fine with Vicuna 7B and its Medusa heads, as referenced in the example page.
The example states:
Note: Increasing the batch size may have a negative impact on performance
My understanding is that, as the batch size increases, each sequence has to wait for the other sequences to reach its position, which degrades performance.
But when I tested with Vicuna 7B, performance still dropped at batch size 4, even though every sequence used the same input. This contradicts my understanding.
I tested different batch sizes with identical inputs (for example, a batch of 4 with the same input in every slot).
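For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness that runs the same prompt at several batch sizes. It is not the script used for the numbers in this issue. `measure_throughput`, `generate_fn`, and `fake_generate` are hypothetical names introduced for illustration; `generate_fn` is assumed to wrap whatever runner call you actually use and to return one list of generated token IDs per sequence. Nothing here is a TensorRT-LLM API.

```python
import time
from typing import Callable, Dict, List, Sequence


def measure_throughput(
    generate_fn: Callable[[List[List[int]], int], List[List[int]]],
    prompt: List[int],
    batch_sizes: Sequence[int],
    max_new_tokens: int = 64,
    repeats: int = 3,
) -> Dict[int, Dict[str, float]]:
    """Time generate_fn on batches made of identical copies of one prompt."""
    results: Dict[int, Dict[str, float]] = {}
    for bs in batch_sizes:
        batch = [list(prompt) for _ in range(bs)]  # same input in every slot
        generate_fn(batch, max_new_tokens)  # warm-up run, not timed
        best = float("inf")
        outputs: List[List[int]] = []
        for _ in range(repeats):
            start = time.perf_counter()
            outputs = generate_fn(batch, max_new_tokens)
            best = min(best, time.perf_counter() - start)
        best = max(best, 1e-9)  # guard against a zero reading from the timer
        total_tokens = sum(len(seq) for seq in outputs)
        results[bs] = {
            "latency_s": best,
            "total_tokens": total_tokens,
            "tokens_per_s": total_tokens / best,
            "tokens_per_s_per_seq": total_tokens / best / bs,
        }
    return results


def fake_generate(batch: List[List[int]], max_new_tokens: int) -> List[List[int]]:
    """Hypothetical stand-in for a real model call; returns dummy token IDs."""
    return [[0] * max_new_tokens for _ in batch]


if __name__ == "__main__":
    stats = measure_throughput(fake_generate, [1, 2, 3], [1, 2, 4], max_new_tokens=16)
    for bs, row in stats.items():
        print(bs, row)
```

Reporting both total tokens per second and tokens per second per sequence makes it easier to tell whether a larger batch lowers overall throughput or only slows each individual sequence. With a GPU-backed `generate_fn`, the wrapper should also synchronize the device before the timer is read.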
What could be the reason? It would be really nice if someone could explain.
Thank you