
Add a unittest for fused_moe #2416

Merged: 9 commits into sgl-project:main on Dec 9, 2024

Conversation

@BBuf (Collaborator) commented on Dec 9, 2024

  • Add a bf16 qwen2-57b-a14b tuning config for tp2/tp4 on A800.
  • Add a fused_moe_triton unittest covering bf16, fp16 and fp8_w8a8.
  • Refine the fused_moe benchmark README.md.

When I wanted to deploy the qwen2-57b-a14b model on A800 with fp8, the following error occurred:

[Screenshot: AssertionError: fp8e4nv data type is not supported on CUDA arch < 89]

The reason is that Triton currently does not support the fp8e4nv dtype on the Ampere architecture. To catch this situation early, I added the fused_moe test mentioned above, which checks the GPU architecture information to verify whether the fused_moe operator can run properly on the current GPU.
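A minimal sketch of that kind of architecture check, assuming PyTorch's `torch.cuda.get_device_capability()` is used to query the GPU; the test and helper names here are illustrative, not the exact code from this PR:

```python
import unittest

import torch


def fp8e4nv_supported() -> bool:
    # Triton's fp8e4nv dtype requires CUDA compute capability >= 8.9
    # (Ada/Hopper); Ampere (8.0 / 8.6) raises the AssertionError above.
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 9)


class TestFusedMoE(unittest.TestCase):
    @unittest.skipUnless(fp8e4nv_supported(), "fp8e4nv needs CUDA arch >= 89")
    def test_fp8_w8a8(self):
        ...  # run the fused_moe_triton kernel with fp8_w8a8 quantization
```

The excerpt from the PR's test below shows where the fp8_w8a8 path branches.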


```python
def _test_case(self, m, n, k, e, topk, dtype, use_fp8_w8a8=False):
    if use_fp8_w8a8:
        # AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
```

@BBuf (Collaborator, Author) commented:
If the GPU is Ampere architecture, we should fall back from fused_moe to either the naive implementation or the torch.compile implementation to prevent errors.
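A rough sketch of that fallback idea, assuming the caller already has both a fused Triton kernel and a naive torch implementation in hand; this is not the dispatch code that sglang actually ships:

```python
import torch


def select_moe_impl(fused_impl, fallback_impl, use_fp8_w8a8: bool):
    # Triton only supports fp8e4nv on CUDA arch >= 89, so an fp8_w8a8
    # fused_moe call on Ampere (8.0 / 8.6) would hit the AssertionError
    # above; route those cases to the naive / torch.compile fallback.
    major, minor = torch.cuda.get_device_capability()
    if use_fp8_w8a8 and (major, minor) < (8, 9):
        return fallback_impl
    return fused_impl
```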

Review thread on test/srt/run_suite.py: outdated, resolved.
@merrymercy merged commit 3844feb into sgl-project:main on Dec 9, 2024
0 of 14 checks passed