[ROCm] Add support for Punica kernels on AMD GPUs #3140
Conversation
@hongxiayang @lcskrishna Could you help review this PR?
@hongxiayang @dllehr-amd Could you review this PR? This is an important PR that enables AMD GPUs to support multi-LoRA serving, a key vLLM feature that many users rely on.
This script can help verify that this works end to end: https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py
Will check. Thanks for this effort.
I did some verification and ran the e2e verify script.
At first the Punica C extension was not built by default and the import failed. I then set the env var VLLM_INSTALL_PUNICA_KERNELS and rebuilt the Docker image, and it worked afterwards. This may be worth a mention in the documentation if it isn't there already.
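For reference, a minimal sketch of the build step described above, assuming a source checkout of vLLM (the exact pip invocation is an assumption, not taken from this thread):

# Opt in to building the optional Punica extension, then build from source.
VLLM_INSTALL_PUNICA_KERNELS=1 pip install -e .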
#else

#include <hip/hip_bf16.h>
#include <hip/hip_fp16.h>
If this file is hipified, this should not be needed.
Ideally we should not need this, but we may still need it for the time being: in quite a few environments I've tested, including the rocm/pytorch images that Dockerfile.rocm builds from, the hipify tool fails to convert the bf16 header.
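For context, a minimal sketch of the include guard being discussed (the USE_ROCM symbol follows vLLM's convention elsewhere; its use in this particular file is an assumption):

#ifndef USE_ROCM
  #include <cuda_bf16.h>
  #include <cuda_fp16.h>
#else
  // hipify may fail to rewrite the bf16 header, so include the HIP
  // equivalents explicitly when building for ROCm.
  #include <hip/hip_bf16.h>
  #include <hip/hip_fp16.h>
#endif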
//====== pybind ======

#define DEFINE_pybind(name) m.def(#name, &name, #name);
Is this macro used?
This is not used, but it is shipped with the CUDA implementation, so I kept it for consistency when refactoring.
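To illustrate what the macro does, here is a hedged, self-contained sketch (the function and module names are hypothetical, chosen only for this example):

#include <pybind11/pybind11.h>

// Hypothetical binding used only for illustration.
int dispatch_example(int x) { return x; }

#define DEFINE_pybind(name) m.def(#name, &name, #name);

PYBIND11_MODULE(example_ops, m) {
  // Expands, via the preprocessor's stringizing operator, to:
  //   m.def("dispatch_example", &dispatch_example, "dispatch_example");
  // i.e. it registers a C++ function with pybind11 under its own name.
  DEFINE_pybind(dispatch_example)
}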
@kliuae please ping @simon-mo or @WoosukKwon to further review/approve this PR so that it can be merged.
I was going to try this out soon; is this in a good spot, or is it still being worked on?
It's in a good state for testing, though occasionally I'll be merging in upstream changes to fix conflicts before it gets merged.
@kliuae Sorry for the late review. The PR looks good. Could you please resolve the merge conflict in CMakeLists.txt so that I can merge it? Thanks!
@WoosukKwon Merge conflicts are resolved.
Co-authored-by: miloice <[email protected]>
This PR adds ROCm support for the Punica kernels to enable multi-LoRA serving on AMD GPUs.
Some Punica files are slightly refactored so that the correct C++/hipcc compilers are invoked when building under ROCm.
A custom bgmv shrink kernel is added to account for the difference in warp size between AMD's GPUs and NVIDIA's (see the sketch below).
The port has been tested on an MI210, and the unit tests applying LoRA pass.
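As a hedged illustration of the warp-size point above (the constant and guard names are assumptions, not taken from this PR): AMD CDNA GPUs such as the MI210 execute 64-lane wavefronts, while NVIDIA GPUs use 32-lane warps, so a shrink kernel that reduces across a warp must size its tiles accordingly.

// Compile-time warp/wavefront width, selected per platform.
#ifndef USE_ROCM
constexpr int kWarpSize = 32;  // NVIDIA warps are 32 lanes wide
#else
constexpr int kWarpSize = 64;  // AMD wavefronts (e.g. MI210) are 64 lanes wide
#endif
// A bgmv shrink kernel would partition its per-row reduction over
// kWarpSize lanes and adjust its shuffle-based reduction depth to match.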