[Bugfix] Restore support for larger block sizes #11259
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
This reverts commit 69ba344. Signed-off-by: Konrad Zawora <[email protected]>
This is reasonable, and thanks for the clear error message for CUDA.
Thanks for catching this
Found this PR because I got an error saying the block size can only be up to 32 on CUDA. However, I was using larger block sizes with FlashAttention without any issue. Just want to confirm: do we really have this constraint for all use cases on CUDA?
This PR reverts #10938, expands the description of block size, and adds an assertion for CUDA-supported block sizes in CacheConfig.
While GPU kernels might not support block sizes greater than 32, other accelerators do. On HPU, going below a block size of 128 is very detrimental to performance, and 128 is used there by default.
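The validation the PR describes could be sketched roughly as below. This is a hypothetical illustration, not vLLM's actual CacheConfig code: the function name, the device-type check, and the exact set of supported CUDA block sizes are assumptions made for the example.

```python
# Hypothetical sketch of device-aware block-size validation.
# The supported-size set below is illustrative; consult the actual
# vLLM CacheConfig for the real constraint.
CUDA_SUPPORTED_BLOCK_SIZES = (8, 16, 32)


def validate_block_size(block_size: int, device: str) -> None:
    """Reject block sizes the CUDA paged-attention kernels cannot handle,
    while leaving other accelerators (e.g. HPU, which defaults to a block
    size of 128 for performance) free to use larger blocks."""
    if device == "cuda" and block_size not in CUDA_SUPPORTED_BLOCK_SIZES:
        raise ValueError(
            f"CUDA paged attention supports block sizes "
            f"{CUDA_SUPPORTED_BLOCK_SIZES}, got {block_size}."
        )
```

The key design point from the discussion: the check is scoped to CUDA rather than applied globally, so HPU and other backends keep their larger defaults.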