Is there an int4 quantized version, and what GPU configuration does int4 inference require? #244

Open
bbeyondllove opened this issue Jan 8, 2025 · 8 comments

Comments

@bbeyondllove

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@YMMF007

YMMF007 commented Jan 9, 2025

https://modelscope.cn/models/OPEA/DeepSeek-V3-int4-sym-gptq-inc/summary
7 × 80 GB
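
For context, here is a rough back-of-the-envelope check of that figure. It is only a sketch under stated assumptions: the 671B total parameter count is taken from the DeepSeek-V3 model card, and the group size of 128 with one fp16 scale per group is an assumed, commonly used setting that this thread does not confirm. It counts weight storage only, treats every weight as int4, and ignores KV cache and activations.

```python
# Rough weight-memory estimate for an int4 checkpoint (a sketch, not a measurement).
TOTAL_PARAMS = 671e9   # assumption: DeepSeek-V3 total parameters, per the model card
BITS_PER_WEIGHT = 4    # int4
GROUP_SIZE = 128       # assumption: one scale per group of 128 weights
SCALE_BITS = 16        # assumption: scales stored in fp16


def weight_memory_gb(params, bits, group_size=None, scale_bits=16):
    """Approximate weight storage in GB (1 GB = 10**9 bytes)."""
    total_bits = params * bits
    if group_size:
        # Per-group scales add a small overhead on top of the packed weights.
        total_bits += params / group_size * scale_bits
    return total_bits / 8 / 1e9


weights_gb = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT, GROUP_SIZE, SCALE_BITS)
cluster_gb = 7 * 80  # the 7 x 80 GB configuration quoted above
headroom_gb = cluster_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, cluster: {cluster_gb} GB, headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone take well over 300 GB, so the remaining capacity on 7 × 80 GB is what is left for KV cache, activations, and any layers kept in higher precision.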

@bbeyondllove
Author

> https://modelscope.cn/models/OPEA/DeepSeek-V3-int4-sym-gptq-inc/summary
> 7 × 80 GB

Thanks. So that would take seven A100 GPUs, which is fairly expensive.

@intelyoungway

GPTQ results may depend on the calibration dataset. Wouldn't AWQ be the better choice?
https://modelscope.cn/models/OPEA/DeepSeek-V3-int4-sym-awq-inc-cpu
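
As background for the `int4-sym` part of these model names, below is a minimal sketch of symmetric int4 quantization using plain round-to-nearest on one group of weights. This is neither GPTQ nor AWQ: both go beyond simple rounding and use calibration data to reduce the resulting error, which is where the dataset dependence comes from. The function names and sample values are hypothetical and chosen only for illustration.

```python
# Toy symmetric int4 quantization with round-to-nearest (not GPTQ, not AWQ).
def quantize_sym_int4(weights):
    """Map floats to integers in [-8, 7] using a single scale and no zero point."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int4 codes."""
    return [v * scale for v in q]


group = [0.7, -0.3, 0.12, 0.0]  # hypothetical weight group
q, scale = quantize_sym_int4(group)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(q, scale, max_error)
```

The rounding error per weight is bounded by half the scale, so a single large outlier in a group inflates the error for every other weight in it. Calibration-based methods try to limit that damage.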

@yishibakaien

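> > https://modelscope.cn/models/OPEA/DeepSeek-V3-int4-sym-gptq-inc/summary 7 × 80 GB
>
> Thanks. So that would take seven A100 GPUs, which is fairly expensive.

NVIDIA's newly announced Project DIGITS has 128 GB of unified memory usable as VRAM. Four units would be enough to deploy it, at $3,000 per unit.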

@bbeyondllove
Author

> GPTQ results may depend on the calibration dataset. Wouldn't AWQ be the better choice? https://modelscope.cn/models/OPEA/DeepSeek-V3-int4-sym-awq-inc-cpu

OK, thanks.

@bbeyondllove
Author

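> > > https://modelscope.cn/models/OPEA/DeepSeek-V3-int4-sym-gptq-inc/summary 7 × 80 GB
> >
> > Thanks. So that would take seven A100 GPUs, which is fairly expensive.
>
> NVIDIA's newly announced Project DIGITS has 128 GB of unified memory usable as VRAM. Four units would be enough to deploy it, at $3,000 per unit.

Would a quantized version of V2 have lower requirements?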
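
A rough way to compare the two models is to scale the int4 weight size by total parameter count. The sketch below assumes 671B total parameters for DeepSeek-V3 and 236B for DeepSeek-V2 (figures from the respective model cards, not from this thread). The 1.5× overhead factor for KV cache, activations, and unquantized layers is a hypothetical value chosen for illustration, and `min_devices` is a made-up helper that ignores tensor-parallel and interconnect constraints.

```python
import math

# Assumptions: total parameter counts as published on the model cards.
MODELS = {"DeepSeek-V3": 671e9, "DeepSeek-V2": 236e9}
# Device memory sizes mentioned in this thread.
DEVICES_GB = {"80 GB GPU": 80, "128 GB unit": 128}
# Hypothetical multiplier covering KV cache, activations, unquantized layers.
OVERHEAD = 1.5


def min_devices(params, device_gb, bits=4, overhead=OVERHEAD):
    """Smallest device count whose combined memory covers the padded weight size."""
    need_gb = params * bits / 8 / 1e9 * overhead
    return math.ceil(need_gb / device_gb)


for model, params in MODELS.items():
    for device, size_gb in DEVICES_GB.items():
        print(f"{model} on {device}: at least {min_devices(params, size_gb)}")
```

By this estimate an int4 V2 would need roughly a third of the memory of an int4 V3, since the weight footprint scales with parameter count. Actual requirements depend on context length, batch size, and the inference framework.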

@chenatu

chenatu commented Jan 14, 2025

@YMMF007 How is the inference speed with this configuration? Roughly how long does each token take?

@YMMF007

YMMF007 commented Jan 14, 2025

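> @YMMF007 How is the inference speed with this configuration? Roughly how long does each token take?

I don't have enough GPUs here, so I never ran it.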


5 participants