[Roadmap] vLLM Roadmap Q1 2025 #11862

Open · 37 tasks
simon-mo opened this issue Jan 8, 2025 · 3 comments

@simon-mo (Collaborator) commented Jan 8, 2025

This page is accessible via roadmap.vllm.ai

This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.

vLLM Core

These projects will deliver performance enhancements to the majority of workloads running on vLLM, and the core team has assigned priorities to signal what must get done. Help is also wanted here, especially from people who want to get more involved in the core of vLLM.

Ship a performant and modular V1 architecture (#8779, #sig-v1)

Support large and long context models

  • (P0) MoE optimizations: Data Parallel for Attention + Expert Parallel for MoE
  • (P1) Productionize Prefill Disaggregation
  • (P1) Productionize KV Cache offloading to CPU and disk
  • (Help Wanted) Investigate context parallelism

Improved performance in batch mode

  • (P0) Optimized vLLM in post training workflow (#sig-post-training)
  • (P1) Efficiency in batch inference and long generations

Others

  • (P0) Blackwell Support
  • (P1) Track vLLM Performance
  • (Help Wanted) Extensible sampler

Model Support

Hardware Support

  • PagedAttention and Chunked Prefill on Trainium and Inferentia
  • Productionize and support large scale deployment of vLLM on TPU
  • Progress in Gaudi Support
  • Out of tree support for IBM Spyre, Ascend, and Tenstorrent ([RFC]: Hardware pluggable #11162)

Optimizations

CI and Developer Productivity

  • Wheel server
  • Multi-platform wheels and docker
  • Better performance tracker
  • Easier installation (optional dependencies, separate kernel packages)

Ecosystem Projects

These are independent projects that we would love to have native collaboration and integration with!


If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

Historical Roadmap: #9006, #5805, #3861, #2681, #244

@Zachary-ai-engineer

Will vLLM consider optimizing communication operations such as all-gather/all-reduce through 4-bit or 8-bit quantization?
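
For context, the idea in the question is to quantize activations before a collective and dequantize afterwards, trading a little precision for less network traffic. Below is a minimal, hypothetical sketch using torch.distributed (not vLLM's implementation); the function name and the per-tensor int8 scheme are illustrative assumptions.

```python
import torch
import torch.distributed as dist

def allgather_int8(x: torch.Tensor) -> torch.Tensor:
    """All-gather `x` across ranks using int8 payloads (per-tensor symmetric scale)."""
    # Quantize: one scale per tensor, clamped to avoid division by zero.
    scale = (x.abs().max() / 127.0).clamp(min=1e-8).reshape(1)
    q = (x / scale).round().clamp(-127, 127).to(torch.int8)

    world_size = dist.get_world_size()
    q_parts = [torch.empty_like(q) for _ in range(world_size)]
    s_parts = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(q_parts, q)       # int8 payload: 2x/4x less traffic than fp16/fp32
    dist.all_gather(s_parts, scale)

    # Dequantize each rank's shard with its own scale and concatenate.
    return torch.cat([qi.to(x.dtype) * si for qi, si in zip(q_parts, s_parts)], dim=0)
```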

@JaheimLee commented Jan 23, 2025

When will V1 support FP8 KV cache?
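
For reference, FP8 KV cache can already be requested on the existing engine via the kv_cache_dtype option; whether and when the same path works under V1 is exactly what this comment asks. A minimal sketch (the model name is only an example):

```python
from vllm import LLM, SamplingParams

# kv_cache_dtype="fp8" stores the KV cache in 8-bit float to cut memory use.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, swap for your own
    kv_cache_dtype="fp8",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```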

@youyc22 commented Jan 25, 2025

Will vLLM consider supporting sparse attention methods like StreamingLLM and H2O?
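
To make the request concrete, a minimal sketch of the StreamingLLM-style eviction policy the comment refers to: keep a few "attention sink" tokens at the start plus a sliding window of recent tokens, dropping everything in between. Illustrative only, with assumed defaults; not vLLM code.

```python
def streaming_llm_keep_indices(seq_len: int, num_sinks: int = 4, window: int = 1024) -> list[int]:
    """Return the token positions whose KV entries are retained."""
    if seq_len <= num_sinks + window:
        return list(range(seq_len))                       # nothing to evict yet
    sinks = list(range(num_sinks))                        # always keep the first tokens
    recent = list(range(seq_len - window, seq_len))       # plus the most recent window
    return sinks + recent

# e.g. with a 6000-token sequence, only 4 + 1024 KV entries are retained:
print(len(streaming_llm_keep_indices(6000)))  # 1028
```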
