[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch #1480
Conversation
Co-authored-by: Lianmin Zheng <[email protected]>
@liangan1 Thanks for the contribution. Could you fix the unit tests https://github.com/sgl-project/sglang/tree/main/test?
Sure. I will work on it and let you know when all UTs pass.
It is almost there! There are only a few remaining issues for multi-GPU test cases.
We will push a big refactor soon, starting from #1534. To prevent too many conflicts, it is better to merge this PR soon or split it into multiple smaller ones.
Sorry, I don't have enough GPUs to reproduce these tensor-parallel-related UTs. Do you have any comments about this issue? According to the UT logs, the distributed backend and model init have finished, and the timeout occurs during CUDA graph initialization, but I did not change anything in that part.
Force-pushed from d28c647 to 3bddad3.
Splitting this PR into smaller ones.
PyTorch has supported the XPU device since the 2.4 release, and XPU is also supported in OpenAI Triton, so it should work with the Triton attention backend in SGLang. In this PR, we add the 'xpu' device to SGLang.
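For context, the snippet below is a minimal, hypothetical sketch of how a device string such as 'xpu' can be resolved with plain PyTorch APIs; the helper name `resolve_device` and the fallback order are illustrative assumptions, not SGLang's actual device-handling code. PyTorch exposes `torch.xpu.is_available()` starting with the 2.4 release.

```python
import torch


def resolve_device(device: str = "auto") -> torch.device:
    """Hypothetical helper: pick the torch device for model execution."""
    if device == "auto":
        if torch.cuda.is_available():
            return torch.device("cuda")
        # torch.xpu exists in PyTorch >= 2.4 when built with XPU support.
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.device("xpu")
        return torch.device("cpu")
    # Explicit request, e.g. "cuda", "xpu", or "cpu".
    return torch.device(device)
```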
Blocking issue: vLLM is only compatible with torch 2.3 for now, and binary wheel support for torch XPU should be available starting with torch 2.5 (expected in Oct 2024), so we need to wait for vLLM to become compatible with torch 2.5.
Status
Both XPU & CUDA work with the latency benchmark.
Llama-2-7b works for the latency benchmark.
VLLM_TEST_COMPILE_NO_CUSTOM_OPS=1 python -m sglang.bench_latency --model-path ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/ --device xxxx
Both XPU & CUDA generate the same outputs with launch_server.
python -m sglang.launch_server --model-path ~/models/llama7b/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/ --port 30000 --device xxx
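To compare outputs between an XPU run and a CUDA run of the server, one option is to send the same deterministic request to each instance. The snippet below is a sketch that assumes the server's `/generate` endpoint on port 30000 from the command above; greedy decoding (temperature 0) keeps the outputs comparable across devices.

```python
import requests

# Send an identical greedy-decoding request to a running SGLang server.
resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"temperature": 0, "max_new_tokens": 16},
    },
)
print(resp.json())
```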
To do in other PRs:
Functionality
Performance