
Add POD-Attention to FlashInfer #858

Open
AKKamath wants to merge 21 commits into main from new_branch
Conversation

@AKKamath commented Feb 17, 2025

Adds the POD-Attention kernel (https://arxiv.org/abs/2410.18038) along with all necessary files.
Both AOT and JIT compilation are supported.

NOTE: prefill.cuh and decode.cuh have been changed.
All helper functions that use threadIdx and blockIdx now receive these values through function arguments instead. Since these device functions are inlined, this should not impact performance. The prefill kernels (SinglePrefillWithKVCacheKernel and BatchPrefillWithPagedKVCacheKernel) have been converted into device functions (called by POD-Attention), and new wrapper kernels around these device functions have been created.
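
A minimal CUDA sketch of the refactoring pattern described above (the function names, signatures, and copy-through body are illustrative placeholders, not the actual FlashInfer code): the kernel body becomes an inline `__device__` function that receives the thread/block indices as arguments, a thin wrapper `__global__` kernel forwards the built-in values so the standalone launch path is preserved, and a fused kernel can call the same device function with remapped indices.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Former kernel body, now an inline device function that takes the thread
// and block indices as explicit arguments instead of reading the built-in
// threadIdx/blockIdx directly. Because it is force-inlined, the extra
// arguments should not add overhead.
__device__ __forceinline__ void PrefillDevice(const float* q, float* o,
                                              int num_elems,
                                              uint32_t tid, uint32_t bid) {
  uint32_t idx = bid * blockDim.x + tid;
  if (idx < static_cast<uint32_t>(num_elems)) {
    // Placeholder for the real attention math; copy-through for brevity.
    o[idx] = q[idx];
  }
}

// Thin wrapper kernel: preserves the original standalone launch path by
// forwarding the hardware-provided indices into the device function.
__global__ void PrefillKernel(const float* q, float* o, int num_elems) {
  PrefillDevice(q, o, num_elems, threadIdx.x, blockIdx.x);
}

// A fused launch in the spirit of POD-Attention: one grid is partitioned
// between prefill and decode work, and each part calls a shared device
// function with remapped block indices.
__global__ void FusedPrefillDecodeKernel(const float* q_p, float* o_p,
                                         int prefill_elems,
                                         const float* q_d, float* o_d,
                                         int decode_elems,
                                         int num_prefill_blocks) {
  if (blockIdx.x < static_cast<uint32_t>(num_prefill_blocks)) {
    PrefillDevice(q_p, o_p, prefill_elems, threadIdx.x, blockIdx.x);
  } else {
    // The decode path would call its own device function; the same helper is
    // reused here only to illustrate the index remapping.
    PrefillDevice(q_d, o_d, decode_elems, threadIdx.x,
                  blockIdx.x - num_prefill_blocks);
  }
}
```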

@yzh119 requested review from yzh119 and nandor February 17, 2025 04:24
@yzh119 mentioned this pull request Feb 19, 2025
@AKKamath closed this Feb 19, 2025
@AKKamath deleted the new_branch branch February 19, 2025 21:44
@AKKamath restored the new_branch branch February 20, 2025 03:10
@AKKamath reopened this Feb 20, 2025
@AKKamath (Author) commented:

Sorry, I renamed the branch on my fork, which accidentally closed this PR. Reopened now.
