Pull requests: vllm-project/vllm
#12185 [Kernel] add triton fused moe kernel for gptq/awq
  opened Jan 18, 2025 by jinzhen-lin

#12184 [Misc] Add BNB support to GLM4-V model
  labels: ready (ONLY add when PR is ready to merge/full CI is needed)
  opened Jan 18, 2025 by Isotr0py

#12182 [torch.compile] store inductor compiled Python file
  labels: ready
  opened Jan 18, 2025 by youkaichao

#12167 [Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor
  opened Jan 17, 2025 by kzawora-intel

#12158 [Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution
  opened Jan 17, 2025 by cennn

#12156 [Core] Optimize topp/topk calculation in sampler
  opened Jan 17, 2025 by afierka-intel (Draft)

#12141 [WIP][Hardware][CPU] testing branch for mlperf
  labels: ci/build, documentation (Improvements or additions to documentation), needs-rebase
  opened Jan 17, 2025 by bigPYJ1151 (Draft)

#12128 [V1] Add V1 support of Qwen2-VL
  labels: documentation, ready
  opened Jan 16, 2025 by ywang96

#12120 [Misc] Update to Transformers 4.48
  labels: ci/build, ready
  opened Jan 16, 2025 by tlrmchlsmth

#12116 [BUILD] Add VLLM_BUILD_EXT to control custom op build
  labels: ci/build
  opened Jan 16, 2025 by MengqingCao

#12103 [Misc] add modules_to_not_convert attribute to gptq series
  opened Jan 16, 2025 by 1096125073

#12098 Use CUDA 12.4 as default for release and nightly wheels
  labels: ci/build, documentation
  opened Jan 15, 2025 by mgoin

#12097 Add: Support for Sparse24Bitmask Compressed Models
  opened Jan 15, 2025 by rahul-tuli (Draft)

#12094 [V1][Perf] Reduce scheduling overhead in model runner after cuda sync
  opened Jan 15, 2025 by youngkent

#12093 [WIP][Kernel] Flash Attention 3 Support
  labels: ci/build
  opened Jan 15, 2025 by LucasWilkinson (Draft)

#12086 [V1][WIP] Add KV cache group dimension to block table
  opened Jan 15, 2025 by heheda12345 (Draft)

#12081 [V1] Add notes on test_async_engine.py::test_abort
  opened Jan 15, 2025 by heheda12345

#12078 [V1] Optimize block table copy from CPU to GPU (take 2)
  labels: ci/build
  opened Jan 15, 2025 by WoosukKwon (Draft)

#12074 [Bugfix] Fix num_heads value for simple connector when tp enabled
  labels: ready
  opened Jan 15, 2025 by ShangmingCai