Make torch TP composable with torchao #2436

Merged
1 commit merged into sgl-project:main on Dec 11, 2024

Conversation

@kwen2501 (Contributor) commented on Dec 10, 2024

Motivation

The TP styles in PyTorch trunk perform a scatter from rank 0 to distribute a full tensor into DTensors.
While this is safer for training, it is unnecessary for inference, where every rank's weights already come from the same checkpoint. Therefore, we do a local sharding instead of a scatter.

Modifications

This PR customizes the TP style implementations to use _shard_tensor instead of distribute_tensor.
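
For illustration, a minimal sketch of the idea (hypothetical code, not the exact implementation in this PR). It assumes ColwiseParallel's private _partition_linear_fn hook and replaces the rank-0 scatter (distribute_tensor) with a local slice wrapped via DTensor.from_local:

```python
# Hypothetical sketch: a ColwiseParallel variant that shards the locally
# loaded full weight instead of scattering it from rank 0.
import torch.nn as nn
from torch.distributed.tensor import DTensor, Shard  # torch.distributed._tensor on older releases
from torch.distributed.tensor.parallel import ColwiseParallel


class ColwiseParallelSharded(ColwiseParallel):
    def _partition_linear_fn(self, name, module, device_mesh):
        # Every rank already holds the full weight from the checkpoint,
        # so slice out the local shard directly instead of scattering.
        rank = device_mesh.get_local_rank()
        world_size = device_mesh.size()
        for param_name, param in list(module.named_parameters()):
            local_chunk = param.detach().chunk(world_size, dim=0)[rank].contiguous()
            dtensor = DTensor.from_local(
                local_chunk, device_mesh, [Shard(0)], run_check=False
            )
            module.register_parameter(param_name, nn.Parameter(dtensor))
```

A RowwiseParallel counterpart would slice the weight along dim 1 (Shard(1)) in the same way.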

Worked configs:

  • TP + int8wo
  • TP + fp8wo
$ export ENABLE_INTRA_NODE_COMM=1
$ python3 -m sglang.bench_one_batch --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}' --enable-torch-compile --torchao-config int8wo --tp 4

Output:

Benchmark ...
Prefill. latency: 0.05945 s, throughput:   2153.25 token/s
Decode.  latency: 0.00321 s, throughput:    311.98 token/s
Decode.  latency: 0.00299 s, throughput:    334.26 token/s
Decode.  latency: 0.00298 s, throughput:    335.68 token/s
Decode.  latency: 0.00297 s, throughput:    337.05 token/s
Decode.  latency: 0.00296 s, throughput:    337.33 token/s
Decode.  median latency: 0.00297 s, median throughput:    337.05 token/s
Total. latency:  0.080 s, throughput:   1690.04 token/s
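
For context, a hedged sketch of how the configs above compose on a toy module: TP sharding first, then torchao weight-only quantization on top of the resulting DTensor weights. This is illustrative, not code from this PR; it assumes torch's init_device_mesh / parallelize_module and torchao's quantize_ / int8_weight_only APIs, and the mesh size and layer shape are made up.

```python
# Illustrative composability sketch (hypothetical, not from this PR).
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import ColwiseParallel, parallelize_module
from torchao.quantization import int8_weight_only, quantize_

mesh = init_device_mesh("cuda", (4,))  # 4-way tensor parallelism (illustrative)
model = nn.Sequential(nn.Linear(4096, 4096))

# Shard the linear layer across the mesh; this PR swaps the stock style
# for a locally-sharding variant like the one sketched above.
parallelize_module(model, mesh, {"0": ColwiseParallel()})

# torchao int8 weight-only quantization applied on top of the sharded weights.
quantize_(model, int8_weight_only())
```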

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Commit ece7249: Customize parallel style to perform local sharding instead of scatter

Worked configs: TP + int8wo, TP + fp8wo
@merrymercy merged commit ece7249 into sgl-project:main on Dec 11, 2024 (14 of 15 checks passed).