adapt custom allreduce for tensorrt llm #2511

yizhang2077 · 2024-12-18T16:57:08Z

Motivation

Adapt the TensorRT-LLM custom allreduce. For now this still uses vllm distributed.
After this PR is merged and sgl-kernel is stable, we only need to replace vllm.distributed with sglang.srt.distributed and add a monkey patch; then we can remove vllm distributed.

Modifications

Checklist

Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

test/srt/test_custom_allreduce.py

zhyncs · 2024-12-18T17:12:50Z

@yizhang2077 Could you paste the unit test result

yizhang2077 · 2024-12-18T17:25:48Z

> @yizhang2077 Could you paste the unit test result

There is only a correctness test here. Do we need to compare with vllm?

zhyncs · 2024-12-18T17:26:07Z

^ gentle ping cc @merrymercy

zhyncs · 2024-12-18T17:26:46Z

> > @yizhang2077 Could you paste the unit test result
>
> There is only a correctness test here. Do we need to compare with vllm?

ref #2481 (comment)

zhyncs · 2024-12-18T17:27:13Z

The most important case we care about is TP=8 with bs in [1, 1024]. The all-reduce size ranges from about 0 to 32 MB. Can we do a more comprehensive test?
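
To make the size range above concrete, here is a back-of-the-envelope sketch. It assumes each all-reduce operates on a `[bs, hidden_size]` activation tensor, and it assumes a hidden size of 16384 in a 2-byte dtype, chosen only because that puts bs=1024 at 32 MiB. The thread does not name a model, so treat these numbers as illustrative; `allreduce_message_bytes` is a hypothetical helper.

```python
# Bytes per element for a few common dtypes.
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "float32": 4}


def allreduce_message_bytes(batch_tokens, hidden_size, dtype="float16"):
    """Size in bytes of a [batch_tokens, hidden_size] tensor of the given dtype."""
    return batch_tokens * hidden_size * DTYPE_BYTES[dtype]


HIDDEN_SIZE = 16384  # assumption, not stated in the thread

for bs in (1, 16, 128, 1024):
    size = allreduce_message_bytes(bs, HIDDEN_SIZE)
    print(f"bs={bs:5d} -> {size / 2**20:.3f} MiB")
```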

merrymercy · 2024-12-26T18:39:53Z

I think we can merge this one, as it has a correctness test. We can benchmark the performance part in future PRs.

Conditions for switching to this:

As fast as or faster than vllm's custom allreduce in all cases: (TP=2,4,8) x (bs=1,2,4,8, ..., 128, 1024)
Does not break AMD support
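
The first condition above could be checked mechanically once latencies exist for both implementations. The sketch below is one possible way to do that; `slower_cases` and the latency dictionaries are hypothetical, the numbers in the usage lines are made up for illustration (not measurements), and the batch-size list contains only the values written out in the comment, since the elided ones are not specified.

```python
from itertools import product

TP_SIZES = (2, 4, 8)
# Only the batch sizes spelled out in the comment; the elided values are left out.
BATCH_SIZES = (1, 2, 4, 8, 128, 1024)


def slower_cases(new_us, baseline_us, tp_sizes=TP_SIZES, batch_sizes=BATCH_SIZES):
    """Return the (tp, bs) cases where the new kernel is missing or slower.

    Both arguments map (tp, bs) -> latency in microseconds.
    """
    failing = []
    for case in product(tp_sizes, batch_sizes):
        if case not in new_us or case not in baseline_us:
            failing.append(case)  # an unmeasured case cannot satisfy the condition
        elif new_us[case] > baseline_us[case]:
            failing.append(case)
    return failing


# Illustrative, made-up latencies: equal everywhere except one regression.
baseline = {case: 10.0 for case in product(TP_SIZES, BATCH_SIZES)}
candidate = dict(baseline)
candidate[(8, 1024)] = 12.5
print(slower_cases(candidate, baseline))
```

An empty list means the condition holds on the measured grid; in practice a small tolerance for measurement noise may be reasonable, which this sketch deliberately leaves out.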

yizhang2077 · 2025-01-01T18:03:10Z

For the performance part, see #2904. CI may fail temporarily until the new sgl-kernel version is released.

python/pyproject.toml

yizhang2077 assigned zhyncs Dec 18, 2024

yizhang2077 requested review from merrymercy, Ying1123, zhyncs, hnyls2002, ispobock and ByronHsu as code owners December 18, 2024 16:57

github-advanced-security bot found potential problems Dec 18, 2024

test/srt/test_custom_allreduce.py (dismissed)

test/srt/test_custom_allreduce.py (dismissed)

zhyncs added the high priority label Dec 18, 2024

yizhang2077 mentioned this pull request Dec 31, 2024

Support twoshot kernel #2688

Merged

3 tasks

adapt custom allreduce for tensorrt llm

417e60d

yizhang2077 force-pushed the adapt-custom-ops branch from 71b1073 to 417e60d Compare January 15, 2025 15:26

yizhang2077 and others added 2 commits January 15, 2025 23:35

update sgl-kernel version

ee59580

Merge branch 'main' into adapt-custom-ops

135ded4

zhyncs reviewed Jan 15, 2025

python/pyproject.toml (outdated, resolved)

zhyncs added 2 commits January 16, 2025 03:51

upd

f434d9c

Merge branch 'main' into adapt-custom-ops

9a0db43

zhyncs approved these changes Jan 15, 2025

zhyncs merged commit 767c9de into main Jan 15, 2025
1 of 2 checks passed

zhyncs deleted the adapt-custom-ops branch January 15, 2025 20:57
