Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapt custom allreduce for tensorrt llm #2511

Merged
merged 5 commits into from
Jan 15, 2025
Merged

adapt custom allreduce for tensorrt llm #2511

merged 5 commits into from
Jan 15, 2025

Conversation

yizhang2077
Copy link
Collaborator

Motivation

adapt for tensorrt llm custom allreduce, currently still use vllm distributed.
After this pr is merged and sgl-kernel is stable, we only need replace vllm.distribued to sglang.srt.distributed, and add a monkey patch, then we can remove vllm distributed

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

test/srt/test_custom_allreduce.py Dismissed Show dismissed Hide dismissed
test/srt/test_custom_allreduce.py Dismissed Show dismissed Hide dismissed
@zhyncs
Copy link
Member

zhyncs commented Dec 18, 2024

@yizhang2077 Could you paste the unit test result

@yizhang2077
Copy link
Collaborator Author

yizhang2077 commented Dec 18, 2024

@yizhang2077 Could you paste the unit test result

there are only correctness test here, do we need compare with vllm?

@zhyncs
Copy link
Member

zhyncs commented Dec 18, 2024

^ gentle ping cc @merrymercy

@zhyncs
Copy link
Member

zhyncs commented Dec 18, 2024

@yizhang2077 Could you paste the unit test result

there are only correctness test here, do we need compare with vllm?

ref #2481 (comment)

@zhyncs
Copy link
Member

zhyncs commented Dec 18, 2024

The most important case we care about is TP=8 and bs in [1, 1024]. The size is about 0 - 32MB. Can we do a more comprehensive test?

@merrymercy
Copy link
Contributor

I think we can merge this one as it has correctness test. We can benchmark the performance part in future PRs.

Condition for switching to this

  • Faster than or equal to vllm's custom allreduce on all cases (TP=2,4,8) x (bs=1,2,4,8, .. 128, 1024)
  • Does not break AMD support

@yizhang2077 yizhang2077 mentioned this pull request Dec 31, 2024
3 tasks
@yizhang2077
Copy link
Collaborator Author

yizhang2077 commented Jan 1, 2025

performance part ref #2904, ci may fail temperarily until new sgl-kernel version release

python/pyproject.toml Outdated Show resolved Hide resolved
@zhyncs zhyncs merged commit 767c9de into main Jan 15, 2025
1 of 2 checks passed
@zhyncs zhyncs deleted the adapt-custom-ops branch January 15, 2025 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants