
CUDA out of memory when loading Qwen2-72B-Instruct-AWQ with llamafactory-cli api #4326

Closed
1 task done
ToviHe opened this issue Jun 17, 2024 · 1 comment
Labels
solved This problem has been already solved

Comments

ToviHe commented Jun 17, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

(screenshot)

This runs in an intranet environment, so for now I can only share information via screenshots.

Reproduction

(screenshot)

I checked the GPU memory figures published on the Qwen2 website, and the model does indeed require more than 40 GB. Does the AWQ-quantized version not support multi-GPU inference?

(screenshot)

The machine has four A100 40 GB GPUs, and the container was started with only two of them assigned. From the current runs, only one of those two is actually used (the program aborts immediately because it runs out of memory), and the other GPU is never touched at all.
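The numbers above are consistent with a back-of-the-envelope estimate. A rough sketch (the parameter count and the assumption that only the 4-bit weights are counted are mine, not from the issue; KV cache and CUDA context overhead are ignored):

```python
# Back-of-the-envelope GPU memory estimate for Qwen2-72B-Instruct-AWQ.
# Assumptions (not from this issue): ~72.7e9 parameters, 4-bit AWQ weights;
# KV cache, activations, and CUDA context overhead are ignored.
PARAMS = 72.7e9        # approximate parameter count of Qwen2-72B
BITS_PER_WEIGHT = 4    # AWQ 4-bit quantization

weight_gib = PARAMS * BITS_PER_WEIGHT / 8 / 1024**3
print(f"weights alone: ~{weight_gib:.1f} GiB")
```

Roughly 34 GiB for the weights alone fits on a single A100 40 GB only barely, and once the KV cache and runtime overhead are added it overflows, which matches the observed OOM; splitting the model across two cards would leave headroom on each.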

(screenshot)

Expected behavior

I hope a parameter can be provided so that the AWQ-quantized model (or the unquantized model) can run inference successfully. I have searched the issues and the documentation but could not find how to specify multi-GPU inference.
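For reference, one common way to spread inference across two cards is to expose both GPUs via `CUDA_VISIBLE_DEVICES` and use a backend that tensor-parallelizes over all visible devices. A minimal sketch; the exact flag names (`--infer_backend`, `--template`) are assumptions based on the LLaMA-Factory documentation and should be checked against the version in use:

```shell
# Sketch only: expose two GPUs so the inference backend can shard the
# model across both (tensor parallelism). Flag names are assumptions;
# verify against `llamafactory-cli api --help` for your version.
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli api \
    --model_name_or_path Qwen/Qwen2-72B-Instruct-AWQ \
    --template qwen \
    --infer_backend vllm
```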

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 17, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 17, 2024
Owner

hiyouga commented Jun 17, 2024

fixed
