[Roadmap] vLLM Roadmap Q1 2025 #11862

Open · 37 tasks
simon-mo opened this issue Jan 8, 2025 · 3 comments

@simon-mo (Collaborator) commented Jan 8, 2025

This page is accessible via roadmap.vllm.ai

This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.

vLLM Core

These projects will deliver performance enhancements to the majority of workloads running on vLLM, and the core team has assigned priorities to signal what must get done. Help is also wanted here, especially from people who want to get more involved in the core of vLLM.

Ship a performant and modular V1 architecture (#8779, #sig-v1)

Support large and long context models

  • (P0) MoE optimizations: Data Parallel for Attention + Expert Parallel for MoE
  • (P1) Productionize Prefill Disaggregation
  • (P1) Productionize KV Cache offloading to CPU and disk
  • (Help Wanted) Investigate context parallelism

Improved performance in batch mode

  • (P0) Optimized vLLM in post training workflow (#sig-post-training)
  • (P1) Efficiency in batch inference and long generations

Others

  • (P0) Blackwell Support
  • (P1) Track vLLM Performance
  • (Help Wanted) Extensible sampler

Model Support

Hardware Support

  • PagedAttention and Chunked Prefill on Trainium and Inferentia
  • Productionize and support large scale deployment of vLLM on TPU
  • Progress in Gaudi Support
  • Out of tree support for IBM Spyre, Ascend, and Tenstorrent ([RFC]: Hardware pluggable #11162)

Optimizations

CI and Developer Productivity

  • Wheel server
  • Multi-platform wheels and docker
  • Better performance tracker
  • Easier installation (optional dependencies, separate kernel packages)

Ecosystem Projects

These are independent projects that we would love to have native collaboration and integration with!


If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

Historical Roadmap: #9006, #5805, #3861, #2681, #244

@Zachary-ai-engineer

Will vLLM consider optimizing communication operations such as all-gather/all-reduce through 4-bit or 8-bit quantization?
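
For context, the idea in the question is to quantize activations before a collective and dequantize afterwards, trading a little precision for less network traffic. Below is a minimal, hypothetical sketch using torch.distributed (not vLLM's implementation); the function name and the per-tensor int8 scheme are illustrative assumptions.

```python
import torch
import torch.distributed as dist

def allgather_int8(x: torch.Tensor) -> torch.Tensor:
    """All-gather `x` across ranks using int8 payloads (per-tensor symmetric scale)."""
    # Quantize: one scale per tensor, clamped to avoid division by zero.
    scale = (x.abs().max() / 127.0).clamp(min=1e-8).reshape(1)
    q = (x / scale).round().clamp(-127, 127).to(torch.int8)

    world_size = dist.get_world_size()
    q_parts = [torch.empty_like(q) for _ in range(world_size)]
    s_parts = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(q_parts, q)       # int8 payload: 2x/4x less traffic than fp16/fp32
    dist.all_gather(s_parts, scale)

    # Dequantize each rank's shard with its own scale and concatenate.
    return torch.cat([qi.to(x.dtype) * si for qi, si in zip(q_parts, s_parts)], dim=0)
```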

@JaheimLee commented Jan 23, 2025

When will V1 support FP8 KV cache?
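
For reference, FP8 KV cache can already be requested on the existing engine via the kv_cache_dtype option; whether and when the same path works under V1 is exactly what this comment asks. A minimal sketch (the model name is only an example):

```python
from vllm import LLM, SamplingParams

# kv_cache_dtype="fp8" stores the KV cache in 8-bit float to cut memory use.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, swap for your own
    kv_cache_dtype="fp8",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```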

@youyc22 commented Jan 25, 2025

Will vLLM consider supporting sparse attention methods like StreamingLLM and H2O?
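
To make the request concrete, a minimal sketch of the StreamingLLM-style eviction policy the comment refers to: keep a few "attention sink" tokens at the start plus a sliding window of recent tokens, dropping everything in between. Illustrative only, with assumed defaults; not vLLM code.

```python
def streaming_llm_keep_indices(seq_len: int, num_sinks: int = 4, window: int = 1024) -> list[int]:
    """Return the token positions whose KV entries are retained."""
    if seq_len <= num_sinks + window:
        return list(range(seq_len))                       # nothing to evict yet
    sinks = list(range(num_sinks))                        # always keep the first tokens
    recent = list(range(seq_len - window, seq_len))       # plus the most recent window
    return sinks + recent

# e.g. with a 6000-token sequence, only 4 + 1024 KV entries are retained:
print(len(streaming_llm_keep_indices(6000)))  # 1028
```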
