
Add POD-Attention to FlashInfer #858

Open
AKKamath wants to merge 21 commits into main from new_branch
Conversation

@AKKamath commented Feb 17, 2025

Adds the POD-Attention kernel (https://arxiv.org/abs/2410.18038) along with all necessary files.
Both AOT and JIT compilation are supported.

NOTE: prefill.cuh and decode.cuh have been changed.
All helper functions that use threadIdx and blockIdx now receive these values through function arguments instead. Since these device functions are inlined, this should not impact performance. The prefill kernels (SinglePrefillWithKVCacheKernel and BatchPrefillWithPagedKVCacheKernel) have been converted into device functions (called by POD-Attention), and new wrapper kernels around these device functions have been created.
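
A minimal CUDA sketch of the refactoring pattern described above (the function names, signatures, and copy-through body are illustrative placeholders, not the actual FlashInfer code): the kernel body becomes an inline `__device__` function that receives the thread/block indices as arguments, a thin wrapper `__global__` kernel forwards the built-in values so the standalone launch path is preserved, and a fused kernel can call the same device function with remapped indices.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Former kernel body, now an inline device function that takes the thread
// and block indices as explicit arguments instead of reading the built-in
// threadIdx/blockIdx directly. Because it is force-inlined, the extra
// arguments should not add overhead.
__device__ __forceinline__ void PrefillDevice(const float* q, float* o,
                                              int num_elems,
                                              uint32_t tid, uint32_t bid) {
  uint32_t idx = bid * blockDim.x + tid;
  if (idx < static_cast<uint32_t>(num_elems)) {
    // Placeholder for the real attention math; copy-through for brevity.
    o[idx] = q[idx];
  }
}

// Thin wrapper kernel: preserves the original standalone launch path by
// forwarding the hardware-provided indices into the device function.
__global__ void PrefillKernel(const float* q, float* o, int num_elems) {
  PrefillDevice(q, o, num_elems, threadIdx.x, blockIdx.x);
}

// A fused launch in the spirit of POD-Attention: one grid is partitioned
// between prefill and decode work, and each part calls a shared device
// function with remapped block indices.
__global__ void FusedPrefillDecodeKernel(const float* q_p, float* o_p,
                                         int prefill_elems,
                                         const float* q_d, float* o_d,
                                         int decode_elems,
                                         int num_prefill_blocks) {
  if (blockIdx.x < static_cast<uint32_t>(num_prefill_blocks)) {
    PrefillDevice(q_p, o_p, prefill_elems, threadIdx.x, blockIdx.x);
  } else {
    // The decode path would call its own device function; the same helper is
    // reused here only to illustrate the index remapping.
    PrefillDevice(q_d, o_d, decode_elems, threadIdx.x,
                  blockIdx.x - num_prefill_blocks);
  }
}
```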

@yzh119 requested review from yzh119 and nandor February 17, 2025 04:24
@yzh119 mentioned this pull request Feb 19, 2025
@AKKamath closed this Feb 19, 2025
@AKKamath deleted the new_branch branch February 19, 2025 21:44
@AKKamath restored the new_branch branch February 20, 2025 03:10
@AKKamath reopened this Feb 20, 2025
@AKKamath (Author) commented:

Sorry, I renamed the branch on my fork, which accidentally closed this PR. Reopened now.
