[Kernel] Revert the API change of Attention.forward #12038

heheda12345 · 2025-01-14T12:25:48Z

#11967 renamed kv_cache to _kv_cache and attn_metadata to _attn_metadata in Attention.forward, which breaks models that pass these two arguments as keyword arguments, e.g., phi3_small:

attn_output = self.attn(q, k, v, kv_cache, attn_metadata=attn_metadata)
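
To illustrate the failure mode, here is a minimal sketch using stand-in classes rather than vLLM's real Attention layer. The class names `OldAttention` and `RenamedAttention`, the helper functions, and the placeholder string arguments are all hypothetical, and the sketch calls `forward` directly instead of going through `self.attn(...)`. Python binds keyword arguments by parameter name, so a rename leaves positional callers working while a caller that passes `attn_metadata=` raises a `TypeError`.

```python
class OldAttention:
    """Stand-in for the original interface (hypothetical, not vLLM code)."""

    def forward(self, query, key, value, kv_cache, attn_metadata):
        return "ok"


class RenamedAttention:
    """Stand-in for the interface after the two parameters were renamed."""

    def forward(self, query, key, value, _kv_cache, _attn_metadata):
        return "ok"


def call_like_phi3_small(attn):
    # Mirrors the call pattern quoted above: kv_cache is passed
    # positionally, attn_metadata is passed by keyword.
    return attn.forward("q", "k", "v", "cache", attn_metadata="meta")


def try_call(attn):
    # Return the call result, or a description of the TypeError it raised.
    try:
        return call_like_phi3_small(attn)
    except TypeError as exc:
        return f"TypeError: {exc}"


old_result = try_call(OldAttention())
renamed_result = try_call(RenamedAttention())
print(old_result)
print(renamed_result)
```

The first call succeeds; the second fails because no parameter named `attn_metadata` exists any more.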

As a result, the following script crashes:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="microsoft/Phi-3-small-8k-instruct", trust_remote_code=True)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

This PR reverts the API change to fix this problem.
CC @youkaichao
NOTE: the above script still crashes due to other problems. I'm investigating and fixing them.
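
A regression of this kind could be caught by a small check on parameter names. The sketch below is only an illustration under stated assumptions: `Attention` is a hypothetical stand-in class carrying the reverted parameter names, and `missing_forward_params` is a helper invented for this example, not something that exists in vLLM.

```python
import inspect


class Attention:
    """Hypothetical stand-in with the reverted (original) parameter names."""

    def forward(self, query, key, value, kv_cache, attn_metadata):
        return query


def missing_forward_params(cls, required=("kv_cache", "attn_metadata")):
    """Return the required parameter names that cls.forward does not accept."""
    params = inspect.signature(cls.forward).parameters
    return [name for name in required if name not in params]


# An empty list means keyword callers of both arguments keep working.
missing = missing_forward_params(Attention)
print(missing)
```

Checking names rather than positions is deliberate here: the breakage described above only affects callers that pass these arguments by keyword.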

Signed-off-by: Chen Zhang <[email protected]>

github-actions · 2025-01-14T12:25:59Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

youkaichao

LGTM, thanks for the fix!

heheda12345 · 2025-01-14T13:11:54Z

I'm fixing the phi3small model in #12040.

revert attn interface

ca0564b

Signed-off-by: Chen Zhang <[email protected]>

youkaichao approved these changes Jan 14, 2025

youkaichao merged commit 1f18adb into vllm-project:main Jan 14, 2025
15 of 17 checks passed

ice-tong pushed a commit to ice-tong/vllm that referenced this pull request Jan 18, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

9e9ca03

Signed-off-by: Chen Zhang <[email protected]> Signed-off-by: ice-tong <[email protected]>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

bf90b68

Signed-off-by: Chen Zhang <[email protected]>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

50ec549

Signed-off-by: Chen Zhang <[email protected]>

jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Jan 21, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

850c5d4

Signed-off-by: Chen Zhang <[email protected]>

HwwwwwwwH pushed a commit to HwwwwwwwH/vllm that referenced this pull request Jan 22, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

941a5d5

Signed-off-by: Chen Zhang <[email protected]> Signed-off-by: hzh <[email protected]>

abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

5f1749a

Signed-off-by: Chen Zhang <[email protected]> Signed-off-by: Bowen Wang <[email protected]>

abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025

[Kernel] Revert the API change of Attention.forward (vllm-project#12038)

ca5d5d1

Signed-off-by: Chen Zhang <[email protected]>
