System Info

System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.2.dev0
vllm: 0.5.1
Reproduction

Dockerfile: https://github.com/hiyouga/LLaMA-Factory/blob/67040f149c0b3fbae443ba656ed0dcab0ebaf730/docker/docker-cuda/Dockerfile
Build Command:
docker build -f ./Dockerfile \
    --build-arg INSTALL_BNB=true \
    --build-arg INSTALL_VLLM=true \
    --build-arg INSTALL_DEEPSPEED=true \
    --build-arg INSTALL_FLASHATTN=true \
    --build-arg PIP_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple \
    -t llamafactory:latest .
Launch Command:
docker run -dit --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface \
    -v ./ms_cache:/root/.cache/modelscope \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -p 7860:7860 \
    -p 8000:8000 \
    --shm-size 16G \
    --name llamafactory \
    llamafactory:latest

docker exec -it llamafactory bash

llamafactory-cli webui
The error below occurs when loading Qwen2-7B-Instruct in the Chat tab of the WebUI using the vLLM backend with multiple GPUs:
(VllmWorkerProcess pid=263) Process VllmWorkerProcess:
(VllmWorkerProcess pid=263) Traceback (most recent call last):
(VllmWorkerProcess pid=263)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=263)     self.run()
(VllmWorkerProcess pid=263)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=263)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=263)     worker = worker_factory()
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=263)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=263)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=263)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=263)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=263)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=263)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=263)     prop = get_device_properties(device)
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=263)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=263)   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 279, in _lazy_init
(VllmWorkerProcess pid=263)     raise RuntimeError(
(VllmWorkerProcess pid=263) RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
ERROR 07-11 13:53:53 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 263 died, exit code: 1
INFO 07-11 13:53:53 multiproc_worker_utils.py:123] Killing local vLLM worker processes
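A quick way to confirm the failure is tied to the multi-GPU (forked worker) path is to restrict the WebUI to a single GPU and retry; with one visible device vLLM runs the model in-process instead of forking VllmWorkerProcess workers, so this code path is not reached. This check is not part of the original report and assumes the WebUI sizes vLLM's tensor parallelism to the number of visible GPUs:

# Hypothetical single-GPU check (not from the issue): with one visible GPU,
# vLLM does not fork worker processes, so the error above should not appear.
docker exec -it llamafactory bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui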
Expected behavior

The model loads successfully with vLLM on multiple GPUs.
Others

No response
I have this problem too.
Same problem while loading a model for inference; fixed by adding VLLM_WORKER_MULTIPROC_METHOD=spawn before the command:
VLLM_WORKER_MULTIPROC_METHOD=spawn
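For example, with the launch flow from the reproduction above, the variable can be exported inside the container before starting the WebUI (a sketch reusing the same commands; it could also be passed to docker run with -e VLLM_WORKER_MULTIPROC_METHOD=spawn):

# Workaround sketch: force vLLM to start its worker processes with 'spawn'.
docker exec -it llamafactory bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn
llamafactory-cli webui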
This issue was referenced by commits 642c6d6 and 193f235 ("fix hiyouga#4780").