Failed to launch using a Docker image with vllm #4145

Closed
1 task done
hzhaoy opened this issue Jun 7, 2024 · 2 comments
Labels
solved This problem has been already solved

Comments

@hzhaoy
Contributor

hzhaoy commented Jun 7, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.7.2.dev0

Reproduction

Dockerfile:

FROM nvcr.io/nvidia/pytorch:24.01-py3

WORKDIR /app

COPY requirements.txt /app/
RUN pip install -r requirements.txt

COPY . /app/
RUN pip install -e .[metrics,bitsandbytes,qwen,vllm,deepspeed]

VOLUME [ "/root/.cache/huggingface/", "/app/data", "/app/output" ]
EXPOSE 7860

CMD [ "llamafactory-cli", "webui" ]

Build Command:
docker build -f ./Dockerfile -t llama-factory:latest .

docker-compose.yml:

name: llm-fct-dev

services:
  webui-dev:
    image: llama-factory:latest
    volumes:
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
      - /nfs/llmckpt:/models
    environment:
      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "24529:7860"
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: "all"
            capabilities: [gpu]
    restart: unless-stopped

Startup Command:
docker compose -f docker-compose.yml up -d

Error:

llm-fct-dev-webui-dev-1  | Traceback (most recent call last):
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1535, in _get_module
llm-fct-dev-webui-dev-1  |     return importlib.import_module("." + module_name, self.__name__)
llm-fct-dev-webui-dev-1  |   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
llm-fct-dev-webui-dev-1  |     return _bootstrap._gcd_import(name[level:], package, level)
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 28, in <module>
llm-fct-dev-webui-dev-1  |     from ..integrations.deepspeed import is_deepspeed_zero3_enabled
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/deepspeed.py", line 51, in <module>
llm-fct-dev-webui-dev-1  |     from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 16, in <module>
llm-fct-dev-webui-dev-1  |     from .accelerator import Accelerator
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 35, in <module>
llm-fct-dev-webui-dev-1  |     from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
llm-fct-dev-webui-dev-1  |     from .utils import (
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 181, in <module>
llm-fct-dev-webui-dev-1  |     from .bnb import has_4bit_bnb_layers, load_and_quantize_model
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/bnb.py", line 29, in <module>
llm-fct-dev-webui-dev-1  |     from ..big_modeling import dispatch_model, init_empty_weights
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 24, in <module>
llm-fct-dev-webui-dev-1  |     from .hooks import (
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 30, in <module>
llm-fct-dev-webui-dev-1  |     from .utils.other import recursive_getattr
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 36, in <module>
llm-fct-dev-webui-dev-1  |     from .transformer_engine import convert_model
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
llm-fct-dev-webui-dev-1  |     import transformer_engine.pytorch as te
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
llm-fct-dev-webui-dev-1  |     from .module import LayerNormLinear
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
llm-fct-dev-webui-dev-1  |     from .layernorm_linear import LayerNormLinear
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 15, in <module>
llm-fct-dev-webui-dev-1  |     from .. import cpp_extensions as tex
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
llm-fct-dev-webui-dev-1  |     from transformer_engine_extensions import *
llm-fct-dev-webui-dev-1  | ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | The above exception was the direct cause of the following exception:
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | Traceback (most recent call last):
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1535, in _get_module
llm-fct-dev-webui-dev-1  |     return importlib.import_module("." + module_name, self.__name__)
llm-fct-dev-webui-dev-1  |   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
llm-fct-dev-webui-dev-1  |     return _bootstrap._gcd_import(name[level:], package, level)
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 45, in <module>
llm-fct-dev-webui-dev-1  |     from .generation import GenerationConfig, GenerationMixin
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1525, in __getattr__
llm-fct-dev-webui-dev-1  |     module = self._get_module(self._class_to_module[name])
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1537, in _get_module
llm-fct-dev-webui-dev-1  |     raise RuntimeError(
llm-fct-dev-webui-dev-1  | RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
llm-fct-dev-webui-dev-1  | /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | The above exception was the direct cause of the following exception:
llm-fct-dev-webui-dev-1  |
llm-fct-dev-webui-dev-1  | Traceback (most recent call last):
llm-fct-dev-webui-dev-1  |   File "/usr/local/bin/llamafactory-cli", line 5, in <module>
llm-fct-dev-webui-dev-1  |     from llamafactory.cli import main
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/__init__.py", line 3, in <module>
llm-fct-dev-webui-dev-1  |     from .cli import VERSION
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/cli.py", line 7, in <module>
llm-fct-dev-webui-dev-1  |     from . import launcher
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/launcher.py", line 1, in <module>
llm-fct-dev-webui-dev-1  |     from llamafactory.train.tuner import run_exp
llm-fct-dev-webui-dev-1  |   File "/app/src/llamafactory/train/tuner.py", line 4, in <module>
llm-fct-dev-webui-dev-1  |     from transformers import PreTrainedModel
llm-fct-dev-webui-dev-1  |   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1525, in __getattr__
llm-fct-dev-webui-dev-1  |     module = self._get_module(self._class_to_module[name])
llm-fct-dev-webui-dev-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1537, in _get_module
llm-fct-dev-webui-dev-1  |     raise RuntimeError(
llm-fct-dev-webui-dev-1  | RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
llm-fct-dev-webui-dev-1  | Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
llm-fct-dev-webui-dev-1  | /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

Expected behavior

The container starts and the web UI launches successfully.

Others

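This seems to be the cause: chenfei-wu/TaskMatrix#116 (comment)

vllm==0.4.3 requires torch==2.3.0, but the torch version shipped in nvcr.io/nvidia/pytorch:24.01-py3 is 2.2.0a0+81ea7a4, which does not satisfy that requirement. pip therefore installs a different torch, and the transformer_engine extension preinstalled in the image, which was built against the original torch, then fails to load with the undefined symbol error shown above.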
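
To confirm the version mismatch inside the image, the installed versions can be compared against the pin. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `satisfies_pin` are hypothetical, and the comparison is a plain string match on the public version (the part before any `+local` suffix), not full PEP 440 handling:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def satisfies_pin(installed: Optional[str], pinned: str) -> bool:
    """Check an exact `==` pin, ignoring a local suffix such as `+81ea7a4`."""
    if installed is None:
        return False
    public = installed.split("+", 1)[0]
    return public == pinned


if __name__ == "__main__":
    torch_version = installed_version("torch")
    print("torch:", torch_version)
    print("satisfies torch==2.3.0:", satisfies_pin(torch_version, "2.3.0"))
    # Distribution name as used by `pip uninstall` in this thread.
    print("transformer-engine:", installed_version("transformer-engine"))
```

With the base image's version string, `satisfies_pin("2.2.0a0+81ea7a4", "2.3.0")` is false, which is why pip replaces torch when vllm is installed.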

hiyouga added the "pending" label (This problem is yet to be addressed) on Jun 7, 2024
@hzhaoy
Contributor Author

hzhaoy commented Jun 8, 2024

@hiyouga
This Dockerfile works:

FROM nvcr.io/nvidia/pytorch:24.01-py3

ARG BUNDLE_VLLM=true

WORKDIR /app

COPY requirements.txt /app/
RUN pip install -r requirements.txt

COPY . /app/
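RUN <<EOF
# Stop at the first failing command so a failed install fails the build.
set -e
if [ "$BUNDLE_VLLM" = "true" ]; then
  pip install -e .[metrics,bitsandbytes,qwen,deepspeed,vllm]
  # Installing vllm replaces the image's torch; remove the preinstalled
  # transformer-engine, whose extension was built against the original torch.
  pip uninstall transformer-engine -y
else
  pip install -e .[metrics,bitsandbytes,qwen,deepspeed]
fi
EOF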

VOLUME [ "/root/.cache/huggingface/", "/app/data", "/app/output" ]
EXPOSE 7860

CMD [ "llamafactory-cli", "webui" ]
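
One way to check a rebuilt image before starting the web UI is a small import smoke test. This is only a sketch: the helper `try_import` is hypothetical, and the module names are the ones that appear in the traceback in this issue.

```python
import importlib
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Import `module_name`; return (True, None) or (False, error summary)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or RuntimeError from lazy imports
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


if __name__ == "__main__":
    # Modules involved in the failing import chain reported above.
    for name in ("torch", "accelerate", "transformers.modeling_utils"):
        ok, error = try_import(name)
        print(f"{name}: {'ok' if ok else error}")
```

Run with the image's Python interpreter, this should print `ok` for all three modules once transformer-engine has been removed; on the original image, the last two are expected to report the undefined symbol error.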

hiyouga added the "solved" label (This problem has been already solved) and removed the "pending" label (This problem is yet to be addressed) on Jun 10, 2024
@hiyouga
Owner

hiyouga commented Jun 10, 2024

fixed
