
[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch #1480

Merged: 44 commits into sgl-project:main on Oct 12, 2024

Conversation

@liangan1 (Contributor) commented Sep 20, 2024

PyTorch has supported the XPU device since the 2.4 release, and XPU is also supported in OpenAI Triton, so it should work with the Triton attention backend in SGLang. In this PR, we add the 'xpu' device to SGLang.
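For illustration, a minimal sketch (not the exact SGLang code) of the kind of device check this enables, using the torch.xpu API that ships with PyTorch >= 2.4:

    import torch

    def check_device(device: str) -> torch.device:
        # PyTorch 2.4+ builds with XPU support expose torch.xpu.
        if device == "xpu":
            if not (hasattr(torch, "xpu") and torch.xpu.is_available()):
                raise RuntimeError("XPU requested but torch.xpu is unavailable")
        elif device == "cuda" and not torch.cuda.is_available():
            raise RuntimeError("CUDA requested but unavailable")
        return torch.device(device)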

Blocking issue: vLLM is currently only compatible with torch 2.3, and binary wheel support for torch XPU should land with torch 2.5 (expected in Oct 2024), so we need to wait for vLLM to become compatible with torch 2.5.

Status

  • Both XPU & CUDA work with the latency benchmark (Llama-2-7b):
    VLLM_TEST_COMPILE_NO_CUSTOM_OPS=1 python -m sglang.bench_latency --model-path ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/ --device <xpu|cuda>

  • Both XPU & CUDA generate the same outputs with launch_server (see the spot-check below):
    python -m sglang.launch_server --model-path ~/models/llama7b/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/ --port 30000 --device <xpu|cuda>
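One quick way to verify that both devices return identical completions (a sketch, assuming the server's native /generate endpoint and greedy sampling; localhost:30000 matches the command above):

    curl http://localhost:30000/generate \
        -H "Content-Type: application/json" \
        -d '{"text": "The capital of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 16}}'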

To do in other PRs:

Functionality

  • Add the BKC (best-known configuration) doc describing how to prepare XPU-enabled PyTorch, and resolve the software compatibility issues with vLLM when using XPU-enabled PyTorch.
  • Enable other benchmarks.
  • Add UTs for the XPU device.

Performance

  • Customized ops support.

@liangan1 marked this pull request as draft September 20, 2024 07:58
@liangan1 changed the title from "Enable XPU device" to "[Hardware|Feature] Enable XPU device" Sep 20, 2024
@liangan1 changed the title from "[Hardware|Feature] Enable XPU device" to "[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch" Sep 20, 2024
Review thread on python/pyproject.toml (outdated, resolved)
@Ying1123 mentioned this pull request Sep 21, 2024
Review threads (outdated, resolved) on:
python/sglang/srt/model_executor/model_runner.py (4 threads)
python/sglang/srt/sampling/sampling_batch_info.py
python/sglang/srt/server_args.py
@merrymercy (Contributor) commented Sep 24, 2024

@liangan1 Thanks for the contribution. Could you fix the unit tests https://github.com/sgl-project/sglang/tree/main/test?

@liangan1 (Contributor, Author) replied

> @liangan1 Thanks for the contribution. Could you fix the unit tests https://github.com/sgl-project/sglang/tree/main/test?

Sure. I will work on it and let you know when all UTs pass.

@merrymercy (Contributor)

It is almost there! There are only a few remaining issues in the multi-GPU test cases.

@merrymercy (Contributor)

We will push a big refactor soon, starting from #1534. To prevent too many conflicts, it would be better to merge this PR soon or split it into multiple smaller ones.

@liangan1 (Contributor, Author) commented Sep 30, 2024

> We will push a big refactor soon, starting from #1534. To prevent too many conflicts, it would be better to merge this PR soon or split it into multiple smaller ones.

Sorry, I don't have enough GPUs to reproduce these tensor-parallel-related UTs; do you have any comments about this issue? According to the UT logs, the distributed backend and model init have finished and the timeout occurs during CUDA graph initialization, even though I didn't change anything in that part.
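One way to narrow this down (a debugging sketch, assuming SGLang's --disable-cuda-graph and --tp-size server flags; <model-path> is a placeholder) would be to rerun the failing case with graph capture skipped:

    python -m sglang.launch_server --model-path <model-path> --tp-size 2 --disable-cuda-graph

If the timeout disappears, the hang is isolated to CUDA graph capture rather than the distributed init.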

@liangan1 force-pushed the liangan1/xpu-support branch from d28c647 to 3bddad3 October 8, 2024 06:57
@liangan1 mentioned this pull request Oct 8, 2024
@liangan1 (Contributor, Author) commented Oct 8, 2024

> I took a look at the code and could not find the issue. Maybe you can split the PR into several smaller ones (e.g., change is_hip() to not flashinfer_is_available()) and we can iterate on that together.
>
> Now I also need to manually trigger the CI run for you because you have never contributed to this repo. Once you have contributed one commit, the CI will be triggered for you automatically.

Splitting this PR into smaller ones.
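For reference, an illustrative sketch (the actual SGLang helper may differ) of the flashinfer_is_available() idea mentioned above, which gates FlashInfer code paths on availability instead of a hardware check like is_hip():

    import importlib.util

    import torch

    def flashinfer_is_available() -> bool:
        # FlashInfer ships CUDA-only kernels, so require a visible CUDA
        # device and an importable flashinfer package.
        return torch.cuda.is_available() and importlib.util.find_spec("flashinfer") is not None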

@liangan1 marked this pull request as ready for review October 11, 2024 00:58
Review thread on python/pyproject.toml (resolved)
@liangan1 requested a review from merrymercy October 12, 2024 11:33
@merrymercy enabled auto-merge (squash) October 12, 2024 17:54
@merrymercy merged commit 5d638c9 into sgl-project:main Oct 12, 2024
11 checks passed