Multi-GPU inference throws an error #294
Comments
I am using src/cli_demo.py
I ran into this problem as well.
Please update the code and try again.
I updated the code today. When running multi-GPU inference with accelerate launch, the machine freezes right after the model finishes loading... It is the same after multiple attempts.
@Benstime Please do not use Accelerate for multi-machine inference.
This is multi-GPU, not multi-machine.
@Benstime For multi-GPU inference, also launch with plain python; 24G should in theory be enough to run 13B.
Single-GPU inference with llama2-7b works fine; with llama2-13b on a single GPU, loading the model reports an out-of-memory error. In that setup the program can start, but inference fails with this error:
=================== Problem 2 ================================
Whether I run it directly with
CUDA_VISIBLE_DEVICES=x,x python xxx
or launch it with accelerate launch,
multi-GPU inference does not work either way. The error is: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1
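This device-mismatch error typically appears when the input tensors sit on one GPU while the model layers have been sharded across several. Below is a minimal sketch (not the repository's own src/cli_demo.py) of sharded multi-GPU inference using the standard transformers/accelerate path; the model ID and prompt are assumptions, adjust them to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Llama-2-13b-hf"  # hypothetical model path/ID

tokenizer = AutoTokenizer.from_pretrained(model_path)

# device_map="auto" lets accelerate split the layers across all visible
# GPUs, so a 13B model in fp16 can fit on two 24G cards.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Hello"
# Move the inputs to the device holding the first model shard; leaving
# them on a different GPU than the embedding layer is what triggers
# "Expected all tensors to be on the same device".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Run with a plain python invocation (e.g. CUDA_VISIBLE_DEVICES=0,1 python script.py) rather than accelerate launch, since accelerate launch spawns one process per GPU and each process would try to load the full model.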