[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is not defined #5009

Merged
merged 2 commits into from
May 23, 2024

Conversation

LetianLee
Contributor

The current directory /workspace/vllm has issues importing Python packages, such as from vllm import vllm_ops. Setting the outer directory /workspace/ as WORKDIR resolves these import errors.

FIX #4275 ([Bug]: CPU Inference vllm_ops not defined)
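One plausible reading of the import problem described above is package shadowing: when the working directory is a source checkout, the checkout's package directory is found first on `sys.path` and hides an installed copy that contains the compiled extension. The sketch below reproduces that effect with a toy package. The names `demo_pkg`, `_C`, `checkout` and `site` are hypothetical stand-ins, not vLLM's actual layout, and the mechanism itself is an assumption rather than something this PR states.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg"  # hypothetical package name, standing in for a real package


def make_pkg(root: Path, with_ext: bool) -> None:
    """Create a toy package under `root`, optionally with an `_C` submodule."""
    pkg = root / PKG
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    if with_ext:
        # A plain module standing in for a compiled extension.
        (pkg / "_C.py").write_text("def rotary_embedding():\n    return 'ok'\n")


def _forget_pkg() -> None:
    """Drop cached copies of the toy package so each check imports afresh."""
    for name in list(sys.modules):
        if name == PKG or name.startswith(PKG + "."):
            del sys.modules[name]


def ext_importable(search_path: list[str]) -> bool:
    """Return True if `demo_pkg._C` imports with `search_path` prepended to sys.path."""
    saved = list(sys.path)
    _forget_pkg()
    sys.path[:0] = search_path
    importlib.invalidate_caches()
    try:
        importlib.import_module(PKG + "._C")
        return True
    except ImportError:
        return False
    finally:
        sys.path[:] = saved
        _forget_pkg()


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "checkout"  # source tree without the extension
        site = Path(tmp) / "site"          # "installed" copy with the extension
        make_pkg(checkout, with_ext=False)
        make_pkg(site, with_ext=True)
        # Checkout first: its package wins and the extension is not found.
        shadowed = ext_importable([str(checkout), str(site)])
        # Installed copy only: the extension imports normally.
        clean = ext_importable([str(site)])
        return shadowed, clean


if __name__ == "__main__":
    print(demo())  # (False, True)
```

Under this reading, moving WORKDIR one level up takes the checkout's package directory out of the first `sys.path` entry, so the installed copy is the one that gets imported.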


PR Checklist

Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Model] for adding a new model or improving an existing model. Model name should appear in the title.
  • [Frontend] for changes to the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
  • [Kernel] for changes affecting CUDA kernels or other compute kernels.
  • [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
  • [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • We adhere to Google Python style guide and Google C++ style guide.
  • Pass all linter checks. Please use format.sh to format your code.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Include sufficient tests to ensure the project stays correct and robust. This includes both unit tests and integration tests.
  • Please add documentation to docs/source/ if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not go through the PR.

What to Expect for the Reviews

The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

  • After the PR is submitted, the PR will be assigned to a reviewer. Every reviewer will pick up the PRs based on their expertise and availability.
  • After the PR is assigned, the reviewer will provide a status update every 2-3 days. If the PR is not reviewed within 7 days, please feel free to ping the reviewer or the vLLM team.
  • After the review, the reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
  • Please respond to all comments within a reasonable time frame. If a comment isn't clear or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion.

Thank You

Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!

LetianLee added 2 commits May 23, 2024 14:56
The current directory `/workspace/vllm` has issues importing Python packages, such as `from vllm import vllm_ops`. Setting the outer directory `/workspace/` as WORKDIR resolves these import errors.
Modified run-cpu-test.sh to update the file path for the corresponding test file. This change ensures that the script points to the correct location of the test file, which is necessary for accurate and successful test execution.
@simon-mo simon-mo merged commit 2ba80be into vllm-project:main May 23, 2024
55 of 63 checks passed

s-smits commented May 28, 2024

Will this fix:

[email protected]:/vllm-workspace/vllm$ python3 -m vllm.entrypoints.openai.api_server --model CohereForAI/aya-23-8B --chat-template ./examples/template_chatml.jinja --tensor-parallel-size 2
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
2024-05-28 19:32:20,366 WARNING utils.py:580 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set RAY_USE_MULTIPROCESSING_CPU_COUNT=1 as an env var before starting Ray. Set the env var: RAY_DISABLE_DOCKER_CPU_WARNING=1 to mute this warning.
2024-05-28 19:32:20,367 WARNING utils.py:592 -- Ray currently does not support initializing Ray with fractional cpus. Your num_cpus will be truncated from 20.48 to 20.
2024-05-28 19:32:20,530 INFO worker.py:1749 -- Started a local Ray instance.
INFO 05-28 19:32:21 llm_engine.py:103] Initializing an LLM engine (v0.4.2) with config: model='CohereForAI/aya-23-8B', speculative_config=None, tokenizer='CohereForAI/aya-23-8B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=CohereForAI/aya-23-8B)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
INFO 05-28 19:32:27 selector.py:158] Cannot use FlashAttention-2 backend because the vllm_flash_attn package is not found. pip install vllm-flash-attn for better performance.
INFO 05-28 19:32:27 selector.py:51] Using XFormers backend.
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:27 selector.py:158] Cannot use FlashAttention-2 backend because the vllm_flash_attn package is not found. pip install vllm-flash-attn for better performance.
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:27 selector.py:51] Using XFormers backend.
INFO 05-28 19:32:28 utils.py:643] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 05-28 19:32:28 pynccl.py:65] vLLM is using nccl==2.18.1
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:28 utils.py:643] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:28 pynccl.py:65] vLLM is using nccl==2.18.1
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:30 selector.py:158] Cannot use FlashAttention-2 backend because the vllm_flash_attn package is not found. pip install vllm-flash-attn for better performance.
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:30 selector.py:51] Using XFormers backend.
INFO 05-28 19:32:30 selector.py:158] Cannot use FlashAttention-2 backend because the vllm_flash_attn package is not found. pip install vllm-flash-attn for better performance.
INFO 05-28 19:32:30 selector.py:51] Using XFormers backend.
INFO 05-28 19:32:31 weight_utils.py:207] Using model weights format ['.safetensors']
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:31 weight_utils.py:207] Using model weights format ['.safetensors']
INFO 05-28 19:32:34 model_runner.py:146] Loading model weights took 7.4788 GB
(RayWorkerWrapper pid=13066) INFO 05-28 19:32:36 model_runner.py:146] Loading model weights took 7.4788 GB
ERROR 05-28 19:32:37 worker_base.py:146] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 05-28 19:32:37 worker_base.py:146] Traceback (most recent call last):
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 138, in execute_method
ERROR 05-28 19:32:37 worker_base.py:146] return executor(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/worker.py", line 154, in determine_num_available_blocks
ERROR 05-28 19:32:37 worker_base.py:146] self.model_runner.profile_run()
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/model_runner.py", line 812, in profile_run
ERROR 05-28 19:32:37 worker_base.py:146] self.execute_model(seqs, kv_caches)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/model_runner.py", line 731, in execute_model
ERROR 05-28 19:32:37 worker_base.py:146] hidden_states = model_executable(**execute_model_kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 327, in forward
ERROR 05-28 19:32:37 worker_base.py:146] hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 292, in forward
ERROR 05-28 19:32:37 worker_base.py:146] hidden_states, residual = layer(
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 248, in forward
ERROR 05-28 19:32:37 worker_base.py:146] hidden_states_attention = self.self_attn(
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 214, in forward
ERROR 05-28 19:32:37 worker_base.py:146] q, k = self.rotary_emb(positions, q, k)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/layers/rotary_embedding.py", line 158, in forward
ERROR 05-28 19:32:37 worker_base.py:146] ops.rotary_embedding(positions, query, key, self.head_size,
ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/_custom_ops.py", line 101, in rotary_embedding
ERROR 05-28 19:32:37 worker_base.py:146] vllm_ops.rotary_embedding(positions, query, key, head_size, cos_sin_cache,
ERROR 05-28 19:32:37 worker_base.py:146] NameError: name 'vllm_ops' is not defined
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 186, in <module>
[rank0]: engine = AsyncLLMEngine.from_engine_args(
[rank0]: File "/vllm-workspace/vllm/vllm/engine/async_llm_engine.py", line 382, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/vllm-workspace/vllm/vllm/engine/async_llm_engine.py", line 336, in __init__
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/engine/async_llm_engine.py", line 458, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 178, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 255, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/vllm-workspace/vllm/vllm/executor/distributed_gpu_executor.py", line 38, in determine_num_available_blocks
[rank0]: num_blocks = self._run_workers("determine_num_available_blocks", )
[rank0]: File "/vllm-workspace/vllm/vllm/executor/ray_gpu_executor.py", line 246, in _run_workers
[rank0]: driver_worker_output = self.driver_worker.execute_method(
[rank0]: File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 147, in execute_method
[rank0]: raise e
[rank0]: File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 138, in execute_method
[rank0]: return executor(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/worker/worker.py", line 154, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/worker/model_runner.py", line 812, in profile_run
[rank0]: self.execute_model(seqs, kv_caches)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/worker/model_runner.py", line 731, in execute_model
[rank0]: hidden_states = model_executable(**execute_model_kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 327, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 292, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 248, in forward
[rank0]: hidden_states_attention = self.self_attn(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 214, in forward
[rank0]: q, k = self.rotary_emb(positions, q, k)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/vllm-workspace/vllm/vllm/model_executor/layers/rotary_embedding.py", line 158, in forward
[rank0]: ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]: File "/vllm-workspace/vllm/vllm/_custom_ops.py", line 101, in rotary_embedding
[rank0]: vllm_ops.rotary_embedding(positions, query, key, head_size, cos_sin_cache,
[rank0]: NameError: name 'vllm_ops' is not defined
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] Traceback (most recent call last):
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 138, in execute_method
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return executor(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/worker.py", line 154, in determine_num_available_blocks
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] self.model_runner.profile_run()
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/model_runner.py", line 812, in profile_run
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] self.execute_model(seqs, kv_caches)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/worker/model_runner.py", line 731, in execute_model
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] hidden_states = model_executable(**execute_model_kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return func(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 327, in forward
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] hidden_states = self.model(input_ids, positions, kv_caches,
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 292, in forward
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] hidden_states, residual = layer(
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 248, in forward
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] hidden_states_attention = self.self_attn(
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/models/commandr.py", line 214, in forward
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] q, k = self.rotary_emb(positions, q, k)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/model_executor/layers/rotary_embedding.py", line 158, in forward
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] ops.rotary_embedding(positions, query, key, self.head_size,
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] File "/vllm-workspace/vllm/vllm/_custom_ops.py", line 101, in rotary_embedding
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] vllm_ops.rotary_embedding(positions, query, key, head_size, cos_sin_cache,
(RayWorkerWrapper pid=13066) ERROR 05-28 19:32:37 worker_base.py:146] NameError: name 'vllm_ops' is not defined

with
python3 -m vllm.entrypoints.openai.api_server --model CohereForAI/aya-23-8B --tokenizer CohereForAI/aya-23-8B --max-model-len 4096 -e HUGGING_FACE_HUB_TOKEN=HF_TOKEN
and
--runtime nvidia --gpus all -v ./workspace:/root/.cache/huggingface -p 8000:8000 -e HUGGING_FACE_HUB_TOKEN=HF_TOKEN
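
The log above ends in a NameError at call time rather than an ImportError at startup. That is consistent with a guarded import whose failure is silently ignored, leaving the alias unbound until first use. A minimal sketch of that pattern, assuming this is how the name goes missing. `_no_such_compiled_ext` and `ext_ops` are hypothetical names, not vLLM's code.

```python
# Hypothetical stand-in for a module that guards an optional compiled import.
try:
    import _no_such_compiled_ext as ext_ops  # hypothetical name; this import fails
except ImportError:
    pass  # failure is swallowed, so `ext_ops` is never bound


def rotary_embedding() -> str:
    # The global name is looked up only when the function runs,
    # so the missing import surfaces here as a NameError.
    return ext_ops.rotary_embedding()


def call_and_report() -> str:
    """Call the wrapper and return the error text instead of raising."""
    try:
        return rotary_embedding()
    except NameError as exc:
        return f"NameError: {exc}"


if __name__ == "__main__":
    print(call_and_report())  # NameError: name 'ext_ops' is not defined
```

If this is the cause, importing the compiled module directly in the same environment should surface the underlying ImportError and show why the extension failed to load.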


3 participants