Move FP8 to sglang #2366

HaiShaw · 2024-12-05T19:45:08Z

Motivation

Move FP8 layers definition to SGLang

Modifications

The FP8 layer definitions are moved as they are.
Kernels come next.

Checklist

[+] Format your code according to the Contributor Guide.
[+] Add unit tests as outlined in the Contributor Guide.
[+] Update documentation as needed, including docstrings or example tutorials.

Co-authored-by: HAI <[email protected]>

zhyncs

Except for vllm.model_executor.layers.quantization, LinearBase, and _custom_ops, everything else needs to be removed. Thanks!

zhyncs · 2024-12-05T20:09:51Z

python/sglang/srt/layers/quantization/fp8.py

+    per_tensor_dequantize,
+    requantize_with_max_scale,
+)
+from vllm.model_executor.parameter import ModelWeightParameter, PerTensorScaleParameter


remove this

This is still in use; it will be decoupled and migrated later.
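
For readers unfamiliar with the helper names in the diff above, the following is a rough sketch of the idea their names suggest: per-tensor dequantization, followed by requantization of several weight shards under one shared scale (the largest of the per-shard scales). It is an assumption-laden illustration in plain Python, not the vLLM or SGLang code. The function names here are hypothetical stand-ins, and rounding to the actual FP8 grid is deliberately omitted.

```python
# Illustrative sketch only: plain-Python stand-ins, not the vLLM implementations.
FP8_E4M3_MAX = 448.0  # largest finite value of the float8 e4m3fn format


def quantize(values, scale):
    """Divide by the scale and clamp to the FP8 range (FP8 rounding is omitted)."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


def dequantize(qvalues, scale):
    """Recover approximate real values from quantized values and their scale."""
    return [q * scale for q in qvalues]


def requantize_shards_with_max_scale(shards, scales):
    """Re-express every shard under one shared scale, the largest input scale."""
    max_scale = max(scales)
    requantized = [
        quantize(dequantize(shard, scale), max_scale)
        for shard, scale in zip(shards, scales)
    ]
    return requantized, max_scale


# Two hypothetical shards, each quantized with its own per-tensor scale.
shards = [[100.0, -200.0], [50.0, 400.0]]
scales = [0.5, 2.0]
new_shards, shared_scale = requantize_shards_with_max_scale(shards, scales)
```

Using the largest scale means no shard overflows the FP8 range after requantization, at the cost of some precision for shards whose original scale was smaller.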

zhyncs · 2024-12-06T07:16:29Z

Moved to #2370.
All credit goes to @HaiShaw. Thanks!

xiaobochen123 and others added 2 commits December 5, 2024 10:44

MoE Expert Parallel Impl (sgl-project#2203)

f9b7c64

Co-authored-by: HAI <[email protected]>

Move FP8 to sglang

c229398

HaiShaw requested review from merrymercy, Ying1123, zhyncs and ispobock as code owners December 5, 2024 19:45

zhyncs reviewed Dec 5, 2024

merrymercy and others added 5 commits December 5, 2024 13:42

Fix the cuda graph capture range for small #max-running-requests (sgl-project#2359)

337fe53

remove unneccessarty vllm dependencies

47b1e33

Merge branch 'main' into moe_fp8

1049088

[router] use 2-gpu-runner (sgl-project#2368)

fc6387e

Merge branch 'main' into moe_fp8

9677f61

HaiShaw requested a review from zhyncs December 6, 2024 04:04

zhyncs force-pushed the main branch from fc6387e to 64fceab Compare December 6, 2024 06:14

zhyncs requested review from ByronHsu and hnyls2002 as code owners December 6, 2024 06:14

Merge branch 'main' into moe_fp8

1a98996

HaiShaw closed this Dec 6, 2024

zhyncs mentioned this pull request Dec 7, 2024

fix: resolve fp8 moe issue #2387

Merged

3 tasks
