[BUG] Issue serving Mixtral 8x7B on H100 #443
Comments
I ran into the same problem on V100s with the exact same error output. It was fixed when I switched to A100s.
Yeah, I can confirm this works on A100, but not on H100.
Thanks for reporting this. It seems there was a bug introduced in the latest release when we added FP6 quantization support. I will investigate and fix the bug. Thank you!
@JamesTheZ may know about this.
Seems to be because the current implementation only compiles the FP6 kernels for Ampere architectures, so the symbols are left undefined on other architectures.
I'm encountering the same issue. I've attempted using multiple versions of deepspeed-mii (0.2.1, 0.2.2, and 0.2.3), as well as different versions of PyTorch (2.2.1, 2.1.2, and 2.1.0), but none of these combinations work. I even went as far as compiling directly from source, but unfortunately I haven't had any success. Is anyone else experiencing the same issue, or does anyone have suggestions on how to resolve it?
Downgrading to this will work: |
Referenced fix in DeepSpeed (deepspeedai#5333): Refine the guards of FP6 kernel compilation. Fix the `undefined symbol` problem of FP6 kernels on non-Ampere architectures. Related issue: deepspeedai/DeepSpeed-MII#443. Co-authored-by: Logan Adams <[email protected]> Co-authored-by: Michael Wyatt <[email protected]>
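As a rough illustration of the kind of guard that fix describes (this is not DeepSpeed's actual code; the op-loading function below is a placeholder), the idea is to check the GPU's compute capability before ever touching the FP6 op, so that sm_90 (Hopper) devices never try to resolve symbols that were only compiled for Ampere:

```python
# Illustrative sketch only: gate use of an Ampere-only kernel on the detected
# compute capability. load_fp6_ops() is a hypothetical stand-in for whatever
# actually builds/loads the FP6 quantization op.
import torch


def load_fp6_ops():
    """Placeholder for loading the FP6 quantization kernels (Ampere only)."""
    print("Loading FP6 kernels")


def fp6_kernels_supported() -> bool:
    """Return True only on compute capability 8.x, where the FP6 kernels are built."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major == 8  # the FP6 kernels in this thread target Ampere (8.x)


if fp6_kernels_supported():
    load_fp6_ops()
else:
    # Skip the import entirely instead of hitting an `undefined symbol` error,
    # and fall back to the unquantized (FP16/BF16) path.
    print("Skipping FP6 kernels: unsupported architecture")
```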
Any update on this issue?
I found that it's an issue inherited from the upstream FasterTransformer; check these lines. But FasterTransformer has already been migrated to TensorRT-LLM, which does have an implementation for sm_90. Do you have a plan to solve it, or would a PR be welcome?
Running into issues when serving Mixtral 8x7B on 4 x H100 (TP=4) with deepspeed-mii v0.2.3, with all other arguments left at their defaults, in the NVIDIA base image `nvidia/cuda:12.3.1-devel-ubuntu22.04`.
The traces showed
There's also a warning:

FP6 quantization kernel is only supported on Ampere architectures

but I did not specify quantization when launching the server. It seems an unused kernel is getting imported, but it is not registered on Hopper devices. When I downgraded to v0.2.2, I ran into the following error
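For context, a minimal sketch of the serving setup being described, using the deepspeed-mii persistent-deployment API. The exact model checkpoint and launch arguments used by the reporter are not shown in the issue, so the model ID and generation parameters below are assumptions:

```python
# Sketch of the reported setup: deepspeed-mii v0.2.3 on 4 x H100 with TP=4,
# all other arguments left at their defaults. The Hugging Face model ID is an
# assumed checkpoint; the original report does not name the exact one.
import mii

client = mii.serve(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed Mixtral 8x7B checkpoint
    deployment_name="mixtral-8x7b",
    tensor_parallel=4,  # TP=4 across the four H100s
)

# Simple smoke test against the deployment.
response = client.generate("DeepSpeed is", max_new_tokens=64)
print(response)
```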