DeepSeek-V3 int4 weight-only inference outputs garbage words with TP 8 on NVIDIA H20 GPU #2683

I built and installed TRT-LLM from the deepseek branch. Following the doc, I got an int4 weight-only engine. However, the example run outputs garbage words.
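A minimal sketch of that flow, assuming the DeepSeek-V3 example follows the usual TensorRT-LLM layout (a per-model convert_checkpoint.py plus trtllm-build); the exact script paths and flags in the deepseek branch may differ:

```bash
# Convert the HF checkpoint to an int4 weight-only TRT-LLM checkpoint with TP 8
# (flags assumed from the standard TRT-LLM weight-only examples).
python3 convert_checkpoint.py \
    --model_dir ./DeepSeek-V3 \
    --output_dir ./ckpt_int4_wo_tp8 \
    --dtype bfloat16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --tp_size 8

# Build the engines from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./ckpt_int4_wo_tp8 \
    --output_dir ./engine_int4_wo_tp8 \
    --gemm_plugin bfloat16

# Run the bundled example across 8 GPUs; with this recipe the output is garbled.
mpirun -n 8 --allow-run-as-root \
    python3 examples/run.py \
        --engine_dir ./engine_int4_wo_tp8 \
        --tokenizer_dir ./DeepSeek-V3 \
        --input_text "What is the capital of France?" \
        --max_output_len 64
```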
Comments
Hi @handoku, thanks for reporting this issue, we'll take a look first.
I encountered the same issue when using the 4-bit weight-only version of DeepSeek-V3.
Hi @handoku, this is a known issue for DeepSeek-V3 int4/int8 quantization. Since DeepSeek hasn't published int4/int8 metrics for the model yet, we don't recommend quantizing DeepSeek-V3 with a non-fp8 recipe at this moment.
Too bad. TRT-LLM currently doesn't support DeepSeek-V3's fp8 inference either, so we can only do inference in bf16/fp16 precision? This model is too large; we can't afford to run it in fp16. And trtllm_backend multi-node serving is not that convenient...
I came across this while following the DeepSeek-V3 documentation. Based on the description there, it appears to support INT8/INT4 quantized inference, but in practice it turns out to be completely unusable. The memory consumption in FP16 is entirely unaffordable, which makes the whole situation quite tricky.
We're going to support fp8 inference soon.
I am using SGLang to serve DeepSeek-V3 for now. Though it supports fp8 and the MLA optimization, it still needs two 8-GPU nodes for inference. I was hoping trtllm int4 could save resources and improve throughput.
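For reference, that two-node SGLang setup looks roughly like this (the address is a placeholder; the flags follow SGLang's multi-node serving docs as I understand them):

```bash
# Node 0 (rank 0, assumed reachable at 10.0.0.1); TP spans both nodes.
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
    --tp 16 --nnodes 2 --node-rank 0 --dist-init-addr 10.0.0.1:5000 \
    --trust-remote-code

# Node 1 (rank 1), the other 8 GPUs.
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
    --tp 16 --nnodes 2 --node-rank 1 --dist-init-addr 10.0.0.1:5000 \
    --trust-remote-code
```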
Exactly! INT4 is particularly appealing since it allows running on a single node, and A100/A800 GPUs don't support the FP8 data format. At roughly 671B parameters, the weights alone are about 1.3 TB in fp16 but only around 340 GB at int4, which is why INT4 is such a great choice, especially for MoE models.
Hi, how did you build and install the deepseek branch of TRT-LLM?
@Harley-ZP Find a computer connected to the Internet, try pulling the Docker images from NGC, then build the trtllm wheel with this cmd. Otherwise, installing the dependencies can really be tricky...
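Something along these lines (the NGC image tag, branch name, and TensorRT path are assumptions; adjust them to whatever the branch's docs require):

```bash
# Clone the branch discussed in this thread.
git clone -b deepseek https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM && git submodule update --init --recursive

# Pull a devel image from NGC and build the wheel inside it
# (image tag and --trt_root path are assumptions).
docker pull nvcr.io/nvidia/pytorch:24.05-py3
docker run --rm -it --gpus all -v "$PWD":/src -w /src \
    nvcr.io/nvidia/pytorch:24.05-py3 \
    python3 ./scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt

# Install the resulting wheel.
pip install ./build/tensorrt_llm-*.whl
```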