
CUDA out of memory when loading Qwen2-72B-Instruct-AWQ with llamafactory-cli api #4326

Closed
1 task done
ToviHe opened this issue Jun 17, 2024 · 1 comment
Labels
solved This problem has been already solved

Comments

ToviHe commented Jun 17, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

(screenshot)

This runs in an intranet environment, so for now I can only share information via screenshots.

Reproduction

(screenshot)

I checked the GPU memory figures published on the Qwen2 website, and the model does indeed require more than 40 GB. Does the AWQ-quantized version not support multi-GPU inference?

(screenshot)

The machine has four A100 40 GB GPUs, and the container was started with only two of them assigned. From the current runs, only one of those two is actually used (the program aborts immediately because it runs out of memory), and the other GPU is never touched at all.
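The numbers above are consistent with a back-of-the-envelope estimate. A rough sketch (the parameter count and the assumption that only the 4-bit weights are counted are mine, not from the issue; KV cache and CUDA context overhead are ignored):

```python
# Back-of-the-envelope GPU memory estimate for Qwen2-72B-Instruct-AWQ.
# Assumptions (not from this issue): ~72.7e9 parameters, 4-bit AWQ weights;
# KV cache, activations, and CUDA context overhead are ignored.
PARAMS = 72.7e9        # approximate parameter count of Qwen2-72B
BITS_PER_WEIGHT = 4    # AWQ 4-bit quantization

weight_gib = PARAMS * BITS_PER_WEIGHT / 8 / 1024**3
print(f"weights alone: ~{weight_gib:.1f} GiB")
```

Roughly 34 GiB for the weights alone fits on a single A100 40 GB only barely, and once the KV cache and runtime overhead are added it overflows, which matches the observed OOM; splitting the model across two cards would leave headroom on each.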

(screenshot)

Expected behavior

I hope a parameter can be provided so that the AWQ-quantized model (or the unquantized model) can run inference successfully. I have searched the issues and the documentation but could not find how to specify multi-GPU inference.
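For reference, one common way to spread inference across two cards is to expose both GPUs via `CUDA_VISIBLE_DEVICES` and use a backend that tensor-parallelizes over all visible devices. A minimal sketch; the exact flag names (`--infer_backend`, `--template`) are assumptions based on the LLaMA-Factory documentation and should be checked against the version in use:

```shell
# Sketch only: expose two GPUs so the inference backend can shard the
# model across both (tensor parallelism). Flag names are assumptions;
# verify against `llamafactory-cli api --help` for your version.
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli api \
    --model_name_or_path Qwen/Qwen2-72B-Instruct-AWQ \
    --template qwen \
    --infer_backend vllm
```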

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jun 17, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 17, 2024
Owner

hiyouga commented Jun 17, 2024

fixed
