[Bug]: Multi-Node Tensor-Parallel broken by #11256 when TP > cuda_device_count per node #12132
Closed
Labels: bug
Your current environment
Running in a Docker container (Kubernetes) on (4) GH200 nodes, 1 GPU per node.
Model Input Dumps
```bash
python3 -m vllm.entrypoints.openai.api_server --model /models/my-model \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95 \
    --served-model-name my-model \
    --trust-remote-code \
    --api-key "NONE" \
    --rope-scaling '{"rope_type":"dynamic","factor":4.0}' \
    --enable-prefix-caching \
    --max-model-len 131072
```
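For reference, the same parallel settings expressed through the offline Python entrypoint should exercise the same platform check (a sketch only; the rope-scaling and server-specific flags are omitted, and the model path is the placeholder from the command above):

```python
from vllm import LLM

# Same multi-node layout as above: 4-way tensor parallelism spread over
# 4 nodes with a single GH200 (1 GPU) each.
llm = LLM(
    model="/models/my-model",
    tensor_parallel_size=4,
    gpu_memory_utilization=0.95,
    trust_remote_code=True,
    enable_prefix_caching=True,
    max_model_len=131072,
)
```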
🐛 Describe the bug
@youkaichao, it looks like #11256 forces --tensor-parallel-size to be <= the per-node GPU count:
https://github.com/vllm-project/vllm/blob/main/vllm/platforms/cuda.py#L156
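For context, my reading of that line is roughly the following (a paraphrased sketch, not the literal vLLM source; `validate_tp_size` is a made-up name and `torch.cuda.device_count()` stands in for vLLM's stateless device-count helper): the configured world size is compared against the GPUs visible on the local node only.

```python
import torch


def validate_tp_size(tensor_parallel_size: int) -> None:
    # Paraphrase of the check linked above: the requested world size is
    # compared against the GPU count of the *local* node.
    local_gpus = torch.cuda.device_count()
    if tensor_parallel_size > local_gpus:
        # With (4) nodes x (1) GH200 and --tensor-parallel-size 4, this
        # always trips even though 4 GPUs exist across the cluster.
        raise RuntimeError(
            f"tensor_parallel_size={tensor_parallel_size} exceeds the "
            f"{local_gpus} CUDA device(s) visible on this node.")
```

v0.6.4.post1 accepted this cross-node layout, so the per-node comparison looks like the regression.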
Currently testing `main`: with (4) nodes and (1) GPU per node it fails, while the same model/code/execution works perfectly in v0.6.4.post1.