[BUG] Garbled output when deploying a 4-bit weight-only DeepSeek-V3 with the TensorRT-LLM deepseek branch #272
Comments
Thanks, but I'm afraid this issue will have to be fixed on the trtllm side.
@mowentian I saw NVIDIA's reply yesterday; they said this is a known problem with INT4/INT8 quantization of DeepSeek-V3. The example in trtllm may not be a good fit, and following it can waste a lot of time.
Originally posted by @nv-guomingz in #2683
Could I ask how you installed the trt deepseek branch? Following the build-from-source route looks like an extremely long chain of steps. How did you set it up?
It really is extremely long. I wasted a lot of time on it, and getting garbled output at the end was pretty demoralizing, haha.
Describe the bug
Deploying a 4-bit weight-only DeepSeek-V3 with the TensorRT-LLM deepseek branch produces garbled output.
To Reproduce
Following the DeepSeek-V3 README below, I built a 4-bit weight-only engine (first converting the checkpoint to bf16, then quantizing):
https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3
However, the converted model produces garbled output. I have seen similar reports in the TensorRT-LLM issue tracker. Has anyone tried deploying along this route?
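For reference, a minimal sketch of the route described above (convert to bf16, then build an INT4 weight-only engine). All paths, parallelism settings, and the exact flag names are assumptions based on the general TensorRT-LLM example workflow; the linked deepseek branch README is the authoritative source and may differ.

```shell
# Sketch only: paths and tp_size are placeholders, and flag names follow the
# common TensorRT-LLM example convention; check the deepseek branch README.

# 1. Convert the bf16 HF checkpoint to a TensorRT-LLM checkpoint with
#    INT4 weight-only quantization.
python convert_checkpoint.py \
    --model_dir ./DeepSeek-V3-bf16 \
    --output_dir ./trtllm_ckpt_int4_wo \
    --dtype bfloat16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --tp_size 8

# 2. Build the serving engine from the quantized checkpoint.
trtllm-build \
    --checkpoint_dir ./trtllm_ckpt_int4_wo \
    --output_dir ./engine_int4_wo \
    --gemm_plugin bfloat16
```

Note that per NVIDIA's reply quoted above, INT4/INT8 weight-only quantization of DeepSeek-V3 is a known accuracy problem, so even a correct build along this route may still emit garbage.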
Expected behavior
Normal output, with only a small drop in accuracy.
Screenshots