
Multi-GPU inference raises an error #294

Closed
MattMaBX opened this issue Aug 1, 2023 · 8 comments
Labels
solved This problem has been already solved

Comments

@MattMaBX

MattMaBX commented Aug 1, 2023

Whether I run it directly with CUDA_VISIBLE_DEVICES=x,x python xxx or start it with accelerate launch, multi-GPU inference fails with the following error:
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1
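
This mismatch typically appears when the model is sharded across GPUs (e.g. with device_map="auto") but the input tensors are left on a different device than the model's first shard. A minimal sketch of the pattern, assuming a Hugging Face transformers model; this is illustrative, not the repository's cli_demo.py, and the model path is hypothetical:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "meta-llama/Llama-2-7b-hf"  # hypothetical model path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,
    device_map="auto",  # shards layers over cuda:0, cuda:1, ...
)

inputs = tokenizer("Hello", return_tensors="pt")
# Move the inputs to the device of the first shard; leaving them on the CPU or
# the wrong GPU yields "Expected all tensors to be on the same device".
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))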

@MattMaBX
Author

MattMaBX commented Aug 1, 2023

I am using src/cli_demo.py

@hiyouga hiyouga added the pending This problem is yet to be addressed label Aug 1, 2023
@laputa-cici

I ran into this problem as well.

@hiyouga hiyouga closed this as completed in e6a3894 Aug 1, 2023
@hiyouga
Owner

hiyouga commented Aug 1, 2023

Please update the code and try again.

@hiyouga hiyouga reopened this Aug 1, 2023
@microbenh

I updated the code today. When running multi-GPU inference with accelerate launch, the machine freezes right after the model finishes loading... I tried several times with the same result.

@hiyouga
Owner

hiyouga commented Aug 2, 2023

@Benstime Please do not use Accelerate for multi-machine inference.

@microbenh

@Benstime Please do not use Accelerate for multi-machine inference.

This is multiple GPUs, not multiple machines.
A single GPU (4090, 24 GB VRAM) cannot run the llama2-13b model, so does that mean multiple GPUs cannot solve this problem either?

@hiyouga
Owner

hiyouga commented Aug 2, 2023

@Benstime Multi-GPU inference is also launched with python; 24 GB should in theory be enough to run 13B.
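
A minimal sketch of what "launch with python" plus automatic sharding can look like, assuming transformers with accelerate installed; the model path and per-card memory caps are illustrative and not taken from this repository:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # hypothetical model path
    torch_dtype=torch.float16,
    device_map="auto",  # spread ~26 GB of fp16 weights over the visible 24 GB cards
    max_memory={i: "20GiB" for i in range(torch.cuda.device_count())},  # leave headroom per card
)

Per the maintainer's advice above, this would be started with a plain interpreter, e.g. CUDA_VISIBLE_DEVICES=0,1 python src/cli_demo.py, rather than with accelerate launch.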

@microbenh

Single-GPU inference works with llama2-7b; llama2-13b on a single GPU runs out of VRAM while loading the model.
Since llama2-13b cannot be loaded, I first tested multi-GPU inference with llama2-7b.
======================= Issue 1 ============================
OS: Ubuntu
Model: llama2-7b
Machine: 4×4090, 128 GB RAM
Launch command: python web_demo.py
Visible GPUs: CUDA_VISIBLE_DEVICES=0,1,2,3

With this setup the demo starts, but inference fails with the following error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

=================== Issue 2 ================================
With the same configuration, loading llama2-13b across multiple GPUs freezes the machine outright.
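
Generic PyTorch debugging advice rather than a fix confirmed in this thread: the assert message itself suggests re-running with synchronous kernel launches so the stack trace points at the real failing op, and an out-of-range token id hitting the embedding table is one common trigger of this particular assert. A hedged sketch, with a hypothetical model path:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before torch initializes CUDA

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "meta-llama/Llama-2-7b-hf"  # hypothetical model path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map="auto")

ids = tokenizer("test prompt", return_tensors="pt").input_ids
# Sanity check: token ids outside the embedding table are a frequent cause of
# "device-side assert triggered" during generation.
assert int(ids.max()) < model.config.vocab_size, "token id exceeds the vocab size"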

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Aug 11, 2023
@hiyouga hiyouga closed this as completed Aug 11, 2023