Failed to launch using a Docker image with vllm #4145

hzhaoy · 2024-06-07T06:14:14Z

Reminder

I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.7.2.dev0

Reproduction

Dockerfile:

FROM nvcr.io/nvidia/pytorch:24.01-py3

WORKDIR /app

COPY requirements.txt /app/
RUN pip install -r requirements.txt

COPY . /app/
RUN pip install -e .[metrics,bitsandbytes,qwen,vllm,deepspeed]

VOLUME [ "/root/.cache/huggingface/", "/app/data", "/app/output" ]
EXPOSE 7860

CMD [ "llamafactory-cli", "webui" ]

Build Command:
docker build -f ./Dockerfile -t llama-factory:latest .

docker-compose.yml:

name: llm-fct-dev

services:
  webui-dev:
    image: llama-factory:latest
    volumes:
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
      - /nfs/llmckpt:/models
    environment:
      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "24529:7860"
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: "all"
            capabilities: [gpu]
    restart: unless-stopped

Startup Command:
docker compose -f docker-compose.yml up -d

Error:

llm-fct-dev-webui-dev-1  | Traceback (most recent call last):
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1535, in _get_module
llm-fct-dev-webui-dev-1  |     return importlib.import_module("." + module_name, self.__name__)
llm-fct-dev-webui-dev-1  |   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
llm-fct-dev-webui-dev-1  |     return _bootstrap._gcd_import(name[level:], package, level)
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 28, in <module>
llm-fct-dev-webui-dev-1  |     from ..integrations.deepspeed import is_deepspeed_zero3_enabled
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/deepspeed.py", line 51, in <module>
llm-fct-dev-webui-dev-1  |     from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 16, in <module>
llm-fct-dev-webui-dev-1  |     from .accelerator import Accelerator
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 35, in <module>
llm-fct-dev-webui-dev-1  |     from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
llm-fct-dev-webui-dev-1  |     from .utils import (
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 181, in <module>
llm-fct-dev-webui-dev-1  |     from .bnb import has_4bit_bnb_layers, load_and_quantize_model
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/bnb.py", line 29, in <module>
llm-fct-dev-webui-dev-1  |     from ..big_modeling import dispatch_model, init_empty_weights
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 24, in <module>
llm-fct-dev-webui-dev-1  |     from .hooks import (
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 30, in <module>
llm-fct-dev-webui-dev-1  |     from .utils.other import recursive_getattr
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 36, in <module>
llm-fct-dev-webui-dev-1  |     from .transformer_engine import convert_model
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
llm-fct-dev-webui-dev-1  |     import transformer_engine.pytorch as te
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
llm-fct-dev-webui-dev-1  |     from .module import LayerNormLinear
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
llm-fct-dev-webui-dev-1  |     from .layernorm_linear import LayerNormLinear
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 15, in <module>
llm-fct-dev-webui-dev-1  |     from .. import cpp_extensions as tex
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
llm-fct-dev-webui-dev-1  |     from transformer_engine_extensions import *
llm-fct-dev-webui-dev-1  | ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | The above exception was the direct cause of the following exception:
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | Traceback (most recent call last):
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1535, in _get_module
llm-fct-dev-webui-dev-1  |     return importlib.import_module("." + module_name, self.__name__)
llm-fct-dev-webui-dev-1  |   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
llm-fct-dev-webui-dev-1  |     return _bootstrap._gcd_import(name[level:], package, level)
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 45, in <module>
llm-fct-dev-webui-dev-1  |     from .generation import GenerationConfig, GenerationMixin
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1525, in __getattr__
llm-fct-dev-webui-dev-1  |     module = self._get_module(self._class_to_module[name])
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1537, in _get_module
llm-fct-dev-webui-dev-1  |     raise RuntimeError(
llm-fct-dev-webui-dev-1  | RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
llm-fct-dev-webui-dev-1  | /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | The above exception was the direct cause of the following exception:
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | Traceback (most recent call last):
llm-fct-dev-webui-dev-1  |   File "/usr/local/bin/llamafactory-cli", line 5, in <module>
llm-fct-dev-webui-dev-1  |     from llamafactory.cli import main
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/__init__.py", line 3, in <module>
llm-fct-dev-webui-dev-1  |     from .cli import VERSION
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/cli.py", line 7, in <module>
llm-fct-dev-webui-dev-1  |     from . import launcher
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/launcher.py", line 1, in <module>
llm-fct-dev-webui-dev-1  |     from llamafactory.train.tuner import run_exp
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/train/tuner.py", line 4, in <module>
llm-fct-dev-webui-dev-1  |     from transformers import PreTrainedModel
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1525, in __getattr__
llm-fct-dev-webui-dev-1  |     module = self._get_module(self._class_to_module[name])
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1537, in _get_module
llm-fct-dev-webui-dev-1  |     raise RuntimeError(
llm-fct-dev-webui-dev-1  | RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
llm-fct-dev-webui-dev-1  | Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
llm-fct-dev-webui-dev-1  | /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

Expected behavior

The container starts and the web UI launches successfully.

Others

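This seems to be the cause: chenfei-wu/TaskMatrix#116 (comment)

vllm==0.4.3 requires torch==2.3.0, but the torch version in nvcr.io/nvidia/pytorch:24.01-py3 is 2.2.0a0+81ea7a4, which does not satisfy that requirement. As a result, pip installs a different torch over the one shipped in the image. The transformer_engine extension already present in the image was presumably built against the original torch, so it fails to load with the undefined symbol error shown above.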
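
The mismatch described above can be illustrated with a rough sketch. The helper names `public_version` and `satisfies_exact_pin` are hypothetical, and the comparison is a deliberate simplification: it only drops the local suffix after `+` and compares strings, so it is not a full PEP 440 implementation.

```python
def public_version(version: str) -> str:
    """Drop the local version label, e.g. '2.2.0a0+81ea7a4' -> '2.2.0a0'."""
    return version.split("+", 1)[0]


def satisfies_exact_pin(installed: str, pinned: str) -> bool:
    """Rough check of an '==' pin (hypothetical helper, not pip's resolver).

    Only the local suffix is ignored; version normalization is not handled.
    """
    return public_version(installed) == pinned


# Version strings taken from the report above.
image_torch = "2.2.0a0+81ea7a4"  # torch in nvcr.io/nvidia/pytorch:24.01-py3
vllm_pin = "2.3.0"               # torch version required by vllm==0.4.3

if not satisfies_exact_pin(image_torch, vllm_pin):
    print(f"torch {image_torch} does not satisfy torch=={vllm_pin}; pip will replace it")
```

Inside a container, the installed version could be read with `importlib.metadata.version("torch")` instead of being hard-coded.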


hzhaoy · 2024-06-08T06:34:52Z

@hiyouga
This Dockerfile works:

FROM nvcr.io/nvidia/pytorch:24.01-py3

ARG BUNDLE_VLLM=true

WORKDIR /app

COPY requirements.txt /app/
RUN pip install -r requirements.txt

COPY . /app/
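RUN <<EOF
# Abort the build if any command in this script fails.
set -e
if [ "$BUNDLE_VLLM" = "true" ]; then
  pip install -e .[metrics,bitsandbytes,qwen,deepspeed,vllm]
  # Installing vllm replaces the image's torch, so remove transformer-engine, which no longer loads.
  pip uninstall transformer-engine -y
else
  pip install -e .[metrics,bitsandbytes,qwen,deepspeed]
fi
EOF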

VOLUME [ "/root/.cache/huggingface/", "/app/data", "/app/output" ]
EXPOSE 7860

CMD [ "llamafactory-cli", "webui" ]
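
One way to confirm the effect of the `pip uninstall transformer-engine` step is to check, from a Python shell inside the built container, whether the module can still be located. This is only a sketch: `is_importable` is a hypothetical helper, and it assumes that the failing import in the traceback is skipped once the package is absent.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# With BUNDLE_VLLM=true this is expected to print False after the uninstall step.
print("transformer_engine present:", is_importable("transformer_engine"))
```

Because `find_spec` only locates the module and does not execute it, the check does not trigger the undefined symbol error even when the broken extension is still installed.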

hiyouga · 2024-06-10T16:20:31Z

fixed

hiyouga added the "pending" label (this problem is yet to be addressed) on Jun 7, 2024

hiyouga closed this as completed in 949e990 Jun 10, 2024

hiyouga added the "solved" label (this problem has already been solved) and removed the "pending" label on Jun 10, 2024

camposs1979 mentioned this issue Jun 13, 2024

Running vllm (0.4.3) with the latest code fails: undefined symbol: _ZN2at4_ops5zeros4ca... #4264

Closed
