[MFM-20250115] Merge from ROCm/main to llama_fp8 #360
Merged
hongxiayang merged 537 commits into ROCm:llama_fp8_12062024 from EmbeddedLLM:main-to-llama-fp8 on Jan 15, 2025
+59,614 −29,878
Commits
This pull request is big! Only the most recent 250 commits are shown.
Commits on Dec 24, 2024
Commits on Dec 25, 2024
Commits on Dec 26, 2024
Commits on Dec 27, 2024
- [Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (vllm-project#11566)
Commits on Dec 28, 2024
Commits on Dec 29, 2024
Commits on Dec 30, 2024
Commits on Dec 31, 2024
- [Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (vllm-project#11565)
Commits on Jan 1, 2025
Commits on Jan 2, 2025
- [Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (vllm-project#10013)
Commits on Jan 3, 2025
Commits on Jan 4, 2025
- [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (vllm-project#11233)
Commits on Jan 5, 2025
Commits on Jan 6, 2025
- [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (vllm-project#11685)
Commits on Jan 7, 2025
Commits on Jan 8, 2025
- [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (vllm-project#11698)
Commits on Jan 9, 2025
Commits on Jan 10, 2025
- [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (vllm-project#11939)
Commits on Jan 11, 2025
- [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (vllm-project#11672)
Commits on Jan 12, 2025
Commits on Jan 13, 2025
Commits on Jan 14, 2025
Commits on Jan 15, 2025