
[BUG] Deploying a 4-bit weight-only DeepSeek-V3 with the TensorRT-LLM DeepSeek branch produces garbled responses #272

Open
Songyanfei opened this issue Jan 14, 2025 · 5 comments

Comments

@Songyanfei

Describe the bug
When deploying a 4-bit weight-only DeepSeek-V3 using the TensorRT-LLM DeepSeek branch, the model's responses are garbled.

To Reproduce
Following the description in the DeepSeek-V3 README, I used the guide below to build a 4-bit weight-only engine (converting to bf16 first, then quantizing):
https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3
However, the converted model produces garbled output. I have also seen similar reports in the TensorRT-LLM issues. Has anyone tried deploying along this route?
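For reference, the bf16-then-int4 route above can be sketched as the two commands below. This is not copied from the README: the checkpoint paths are placeholders, and the flag names are assumptions based on the general TensorRT-LLM example workflow (`convert_checkpoint.py` plus `trtllm-build`); the deepseek branch README is authoritative if it differs.

```shell
# Sketch of the bf16 + 4-bit weight-only build route.
# Paths and flags are assumptions; check the deepseek branch README.

# 1. Convert the HF checkpoint to a bf16 TensorRT-LLM checkpoint with
#    int4 weight-only quantization, sharded for 8-way tensor parallelism.
python3 convert_checkpoint.py \
    --model_dir ./DeepSeek-V3 \
    --output_dir ./trtllm_ckpt_8gpu_W4A16 \
    --dtype bfloat16 \
    --tp_size 8 \
    --use_weight_only \
    --weight_only_precision int4

# 2. Build the serving engines from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./trtllm_ckpt_8gpu_W4A16 \
    --output_dir ./DeepSeekV3-trtllm_engine_8gpu_W4A16 \
    --gemm_plugin bfloat16
```

Note that per NVIDIA's later comment in this thread, this int4 recipe is exactly the path that currently produces garbled output, so the sketch documents the reproduction route rather than a recommended configuration.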

Expected behavior
Normal output, with only a small accuracy drop.

Screenshots
(attached images show the garbled model responses)

@Songyanfei (Author)

Adding the result of running the test script:
mpirun --allow-run-as-root -np 8 python3 ../run.py \
    --input_text "Today is a nice day." \
    --max_output_len 30 \
    --tokenizer_dir /data-123/syf/DeepSeekV3-trtllm_engine_8gpu_W4A16 \
    --engine_dir /data-123/syf/DeepSeekV3-trtllm_engine_8gpu_W4A16 \
    --top_p 0.95 \
    --temperature 0.3

@mowentian (Contributor)

Thanks, but I'm afraid this issue will have to be fixed on the TensorRT-LLM side.

@Songyanfei (Author)

@mowentian I saw NVIDIA's reply yesterday: this is a known issue with INT4/INT8 quantization of DeepSeek-V3. The TensorRT-LLM example is probably not suitable for this, and following it wastes a lot of time.

Hi @handoku, it's a known issue for DeepSeek-V3 int4/int8 quantization. Since DeepSeek-V3 hasn't published int4/int8 metrics yet, we don't recommend quantizing DeepSeek-V3 with a non-fp8 recipe at this moment.

Originally posted by @nv-guomingz in #2683

@Harley-ZP

Could I ask how you installed the TensorRT-LLM deepseek branch? Building from source looks like a very long chain of steps. How did you set it up?

@Songyanfei (Author)

> Could I ask how you installed the TensorRT-LLM deepseek branch? Building from source looks like a very long chain of steps. How did you set it up?

It really is long. I wasted a lot of time on it, and ending up with garbled output was demoralizing, haha.
You could try the Docker build route instead; it's somewhat easier.
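The Docker route mentioned above can be sketched roughly as follows. The `make -C docker` targets are the ones the main TensorRT-LLM repository documents; whether the deepseek branch keeps them unchanged is an assumption.

```shell
# Clone the deepseek branch and build a dev container
# (make targets assumed from the main TensorRT-LLM docker/Makefile).
git clone -b deepseek https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM

# Build the release image (compiles TensorRT-LLM inside the container,
# which avoids assembling the long toolchain on the host).
make -C docker release_build

# Start an interactive container with the GPUs mounted.
make -C docker release_run
```

The main advantage is that the CUDA, TensorRT, and Python dependency versions are pinned inside the image, so the host only needs a working NVIDIA driver and Docker with the NVIDIA container toolkit.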


3 participants