
Fix gptq for moe layers #2300

Merged: 5 commits from pr-fix-gptq-moe into main on Dec 3, 2024

Conversation

merrymercy (Contributor) commented Dec 1, 2024

Try to fix #2117 and #2270.

We can run `python3 -m sglang.launch_server --model TheBloke/Mixtral-8x7B-v0.1-GPTQ` with vllm's fused MoE layer, but not with sglang's fused MoE layer. We should probably add this model to the nightly eval.

Test cases

python3 -m sglang.launch_server --model TheBloke/Mixtral-8x7B-v0.1-GPTQ
python3 -m sglang.launch_server --model casperhansen/deepseek-coder-v2-instruct-awq --trust-remote-code --tp 2
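A quick way to confirm the first test case end to end is to send a request to the server's OpenAI-compatible completions endpoint once it is up. This is a minimal sketch, assuming the server runs on the default port 30000 (adjust the URL if `--port` differs); the prompt and generation parameters are arbitrary and not part of this PR.

```python
# Minimal smoke check for the first test case above.
# Assumes the server was launched with:
#   python3 -m sglang.launch_server --model TheBloke/Mixtral-8x7B-v0.1-GPTQ
# and is serving the OpenAI-compatible API on the default port 30000.
import requests

resp = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "TheBloke/Mixtral-8x7B-v0.1-GPTQ",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```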

zhyncs (Member) commented Dec 2, 2024

also cc @HandH1998 @ispobock

zhyncs (Member) commented Dec 3, 2024

Updated test cases:

python3 -m sglang.launch_server --model TheBloke/Mixtral-8x7B-v0.1-GPTQ
python3 -m sglang.launch_server --model casperhansen/deepseek-coder-v2-instruct-awq --trust-remote-code --tp 2 --disable-mla
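Following the earlier suggestion to add the GPTQ Mixtral model to the nightly eval, below is a sketch of what such a smoke test could look like, built only around the first launch command from this PR. It uses the standard library plus `requests`; the default port 30000, the `/health` endpoint, the timeouts, and the `wait_until_ready` helper are assumptions for illustration, not sglang test-suite utilities.

```python
# Sketch of a nightly-eval style smoke test for the GPTQ Mixtral test case.
# The port, /health endpoint, timeouts, and wait_until_ready helper are
# assumptions for illustration; they are not part of this PR.
import subprocess
import time

import requests

BASE_URL = "http://localhost:30000"


def wait_until_ready(base_url: str, timeout_s: int = 1800) -> None:
    """Poll the health endpoint until the server is ready or time runs out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass
        time.sleep(10)
    raise TimeoutError("server did not become ready in time")


# Launch the server with the same command as the test case above.
proc = subprocess.Popen(
    [
        "python3", "-m", "sglang.launch_server",
        "--model", "TheBloke/Mixtral-8x7B-v0.1-GPTQ",
    ]
)
try:
    wait_until_ready(BASE_URL)
    resp = requests.post(
        f"{BASE_URL}/v1/completions",
        json={
            "model": "TheBloke/Mixtral-8x7B-v0.1-GPTQ",
            "prompt": "1 + 1 =",
            "max_tokens": 4,
        },
        timeout=60,
    )
    assert resp.status_code == 200, resp.text
finally:
    proc.terminate()
```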

zhyncs (Member) commented Dec 3, 2024

Follow-up PRs (possibly after v0.4)

zhyncs merged commit 1228f7c into main on Dec 3, 2024 (17 of 18 checks passed).
zhyncs deleted the pr-fix-gptq-moe branch on Dec 3, 2024 at 15:12.
Successfully merging this pull request may close these issues:

[Bug] Unable to load GPTQ Mixtral 8x7 v0.1 with SGLang