Mixtral 8x7-v0.1 Hangs after serving a few requests #457
Comments
@aaditya-srivathsan We are reviewing this ticket and will get back to you with updates.
@ganeshku1 any update on this?
@aaditya-srivathsan We are working on resolving this issue. Will update this thread once this issue is resolved. cc: @dyastremsky
Hi @aaditya-srivathsan, I've seen some similar issues reported that were solved by setting Can you try this to see if it helps?
Sure, let me try this and I'll let you know whether it works.
This did help, thank you very much!
System Info
2x A100 80GB (160GB total)
Who can help?
@byshiue @kaiyux
Reproduction
Build from source by cloning the main branch of tensorrtllm_backend
Download the weights from HF
Set the directories and generate the engines
Then start your Triton server like so
Finally, in a separate terminal
Expected behavior
The expected behavior is to get throughput and latency numbers.
actual behavior
The command just hangs and doesn't return anything.
additional notes
I wrote a custom script which uses gRPC via tritonclient to send synchronous requests. Initially it completes a request in 8 seconds, but after 40 such requests it just hangs.
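For anyone trying to reproduce this, here is a rough sketch of what such a synchronous client could look like. It is not the original script. The endpoint `localhost:8001`, the model name `ensemble`, the tensor names `text_input` and `max_tokens`, and the 64-token limit are all assumptions based on the usual tensorrtllm_backend ensemble layout and may need adjusting. The `client_timeout` argument makes a stuck request fail instead of blocking forever, so the loop can report which request hung.

```python
import time
from typing import Callable, Iterable, List, Optional


def make_triton_sender(url: str = "localhost:8001",
                       model_name: str = "ensemble",
                       timeout_s: float = 60.0) -> Callable[[str], None]:
    """Return a function that sends one blocking gRPC inference request.

    Assumed, not taken from the issue: the URL, the model name, and the
    input tensor names/shapes ("text_input", "max_tokens").
    """
    # Imported lazily so the timing loop below can be used without Triton.
    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url=url)

    def send(prompt: str) -> None:
        text = grpcclient.InferInput("text_input", [1, 1], "BYTES")
        text.set_data_from_numpy(
            np.array([[prompt.encode("utf-8")]], dtype=object))
        max_tokens = grpcclient.InferInput("max_tokens", [1, 1], "INT32")
        max_tokens.set_data_from_numpy(np.array([[64]], dtype=np.int32))
        # client_timeout (seconds) raises an exception if no reply arrives.
        client.infer(model_name, [text, max_tokens], client_timeout=timeout_s)

    return send


def run_sequential(send: Callable[[str], None],
                   prompts: Iterable[str]) -> List[Optional[float]]:
    """Send prompts one at a time and record per-request latency in seconds.

    Stops at the first failure (for example a timeout) and records None
    for that request, so the index of the hanging request is visible.
    """
    latencies: List[Optional[float]] = []
    for i, prompt in enumerate(prompts, start=1):
        start = time.perf_counter()
        try:
            send(prompt)
        except Exception as exc:
            print(f"request {i} failed: {exc}")
            latencies.append(None)
            break
        elapsed = time.perf_counter() - start
        print(f"request {i} completed in {elapsed:.2f}s")
        latencies.append(elapsed)
    return latencies
```

With a running server, `run_sequential(make_triton_sender(), ["Hello"] * 50)` would send 50 requests back to back and stop at the first one that times out.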
The tritonserver logs with verbose logging enabled look like this
It never returns a response and just hangs.
Quantizing to INT4 doesn't help either.