[Misc] Use vllm-flash-attn instead of flash-attn #4686
Conversation
LGTM!
Thank you very much. By the way, does vllm-flash-attn support Turing architecture GPUs like the 2080 Ti? I recall that Turing GPUs support flash-attn 1.
I'm confused. What is the difference between vllm-flash-attn and flash-attn, and why use vllm-flash-attn instead of flash-attn?
Currently, vllm-flash-attn only supports CUDA 12.1. Should I recompile it from source for other CUDA or torch versions?
This PR is to use the pre-built vllm-flash-attn wheel instead of the original flash-attn.
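For context, a minimal sketch of the kind of import fallback a project might use to prefer the pre-built vllm-flash-attn wheel while keeping the original flash-attn as a backup. The shared function name used here (flash_attn_varlen_func) is an assumption that both packages expose the same API; vLLM's actual integration may differ.

```python
# Sketch only: try the pre-built vllm-flash-attn wheel first, and fall back
# to the original flash-attn package if it is not installed. Assumes
# flash_attn_varlen_func is exported by both packages under the same name.
try:
    from vllm_flash_attn import flash_attn_varlen_func
    BACKEND = "vllm-flash-attn"
except ImportError:
    from flash_attn import flash_attn_varlen_func  # original upstream package
    BACKEND = "flash-attn"

print(f"Using FlashAttention backend: {BACKEND}")
```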