
[ROCm] Add support for Punica kernels on AMD GPUs #3140

Merged
26 commits merged into vllm-project:main on May 9, 2024

Conversation

kliuae
Contributor

@kliuae kliuae commented Mar 1, 2024

This PR adds ROCm support for the Punica kernels to enable multi-LoRA serving on AMD GPUs.
Some Punica files are slightly refactored so that the correct C++/hipcc compilers are invoked when building under ROCm.
A custom bgmv shrink kernel is added to account for the difference in warp size between AMD's GPUs and Nvidia's (see the sketch below).
The port has been tested on an MI210, and the unit tests applying LoRA pass.
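
For readers unfamiliar with the warp-size point above, here is a minimal, hedged sketch (not the PR's actual kernel) of a reduction written against the platform warp size. It assumes the standard HIP/CUDA platform macros; AMD's CDNA GPUs such as the MI210 use 64-lane wavefronts, while Nvidia GPUs use 32-lane warps.

#ifdef __HIP_PLATFORM_AMD__
#include <hip/hip_runtime.h>
constexpr int kWarpSize = 64;  // CDNA wavefront (e.g. MI210)
#else
#include <cuda_runtime.h>
constexpr int kWarpSize = 32;  // Nvidia warp
#endif

// Sum one float per lane across a warp/wavefront; lane 0 holds the total.
__device__ float warp_reduce_sum(float v) {
  for (int offset = kWarpSize / 2; offset > 0; offset >>= 1) {
#ifdef __HIP_PLATFORM_AMD__
    v += __shfl_down(v, offset, kWarpSize);        // HIP wavefront shuffle
#else
    v += __shfl_down_sync(0xffffffffu, v, offset); // CUDA warp shuffle
#endif
  }
  return v;
}

A kernel that hard-codes 32 would reduce only within 32-lane sub-groups of a 64-lane CDNA wavefront, which is why a warp-size-aware shrink kernel is needed.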

@WoosukKwon WoosukKwon added the rocm label Mar 1, 2024
@WoosukKwon
Collaborator

@hongxiayang @lcskrishna Could you help review this PR?

@WoosukKwon
Collaborator

@hongxiayang @dllehr-amd Could you review this PR? It's an important one: it enables multi-LoRA serving on AMD GPUs, a key vLLM feature that many users rely on.

@simon-mo
Collaborator

This script can help verify that this works end to end: https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py

@hongxiayang
Collaborator

Will check. Thanks for this effort.

Collaborator

@hongxiayang hongxiayang left a comment


I did some verification and ran the e2e verify script.

At first the Punica C extension was not built by default and the import failed. After I set the env var VLLM_INSTALL_PUNICA_KERNELS and rebuilt the Docker image, it worked. This may be worth mentioning in the documentation if it isn't already.

#else

#include <hip/hip_bf16.h>
#include <hip/hip_fp16.h>
Collaborator


If this file is hipified, this should not be needed.

Contributor Author


Ideally we shouldn't need this, but we may still need it for the time being: in quite a few environments I've tested, including the rocm/pytorch images that Dockerfile.rocm builds from, the hipify tool fails to convert the bf16 header.
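
For context, a minimal sketch of the guard pattern under discussion, assuming a USE_ROCM-style build macro (the exact macro name used in the PR may differ): when hipify leaves the CUDA bf16 header unconverted, the ROCm path includes the HIP headers explicitly.

#ifndef USE_ROCM
#include <cuda_bf16.h>     // CUDA path
#include <cuda_fp16.h>
#else
#include <hip/hip_bf16.h>  // ROCm path: include HIP headers directly,
#include <hip/hip_fp16.h>  // since hipify may miss the bf16 conversion
#endif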


//====== pybind ======

#define DEFINE_pybind(name) m.def(#name, &name, #name);
Collaborator


Is this macro used?

Contributor Author


It's not used, but it ships with the CUDA implementation, so I kept it for consistency when refactoring.
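
For illustration, a hedged sketch of how such a macro would be used (the op name here is made up, not one of the PR's actual bindings): inside a pybind11 module body where m is the module handle, DEFINE_pybind(my_op) expands to m.def("my_op", &my_op, "my_op"), binding the function under its own name with the name reused as the docstring.

#include <pybind11/pybind11.h>

#define DEFINE_pybind(name) m.def(#name, &name, #name);

void my_op() { /* hypothetical stand-in for a real kernel launcher */ }

PYBIND11_MODULE(punica_ops_example, m) {
  DEFINE_pybind(my_op)  // expands to: m.def("my_op", &my_op, "my_op");
}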

Collaborator


@kliuae please ping @simon-mo or @WoosukKwon to further review/approve this PR so that it can be merged.

@jamestwhedbee
Contributor

I was going to try this out soon. Is this in a good spot, or is it still being worked on?

@kliuae
Contributor Author

kliuae commented Apr 25, 2024

I was going to try this out soon. Is this in a good spot, or is it still being worked on?

It's in a good state for testing, though I'll occasionally merge in upstream changes to resolve conflicts until it gets merged.

@WoosukKwon WoosukKwon self-requested a review May 2, 2024 16:27
Collaborator

@WoosukKwon WoosukKwon left a comment


@kliuae Sorry for the late review. The PR looks good. Could you please resolve the merge conflict in CMakeLists.txt so that I can merge it? Thanks!

@Alexei-V-Ivanov-AMD
Copy link
Contributor

@kliuae Sorry for the late review. The PR looks good. Could you please resolve the merge conflict in CMakeLists.txt so that I can merge it? Thanks!

@kliuae Please resolve the latest merge conflict. Your PR is instrumental to our ongoing effort. Thank you very much!

@kliuae
Copy link
Contributor Author

kliuae commented May 8, 2024

@WoosukKwon Merge conflicts are resolved

@WoosukKwon WoosukKwon merged commit ff5abcd into vllm-project:main May 9, 2024
53 of 55 checks passed
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 19, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024