Improve mixed dtype GEMM #1972

IwakuraRein · 2024-12-04T01:49:08Z

Remove the MixedInput suffix from the mixed dtype GEMM's collective schedules. See Eliminate MixedInput kernel schedule tags. #1956
Refactor the mixed dtype GEMM's mainloop and examples, moving some logic into new headers.
Add interleaved converters. Users can specify the value shuffle pattern with a CuTe layout to activate these converters. See examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu and include/cutlass/detail/collective/mixed_input_utils.hpp for more details.
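To make the idea of "a layout that specifies a shuffle pattern" concrete, here is a minimal plain C++ sketch that does not use CuTe itself. It mimics how a rank-2 (shape, stride) layout maps a linear index to an offset, with the first mode varying fastest. The helper name `layout_offsets` and the shape (2,4) / stride (4,1) values are assumptions chosen for illustration; the pattern the example actually uses is defined in the files referenced above.

```cpp
#include <array>
#include <cstddef>

// Plain C++ stand-in for a rank-2 (shape, stride) layout (hypothetical helper,
// not part of CUTLASS). The linear index is split into (i0, i1) with the first
// mode varying fastest, and the offset is i0 * D0 + i1 * D1.
template <std::size_t S0, std::size_t S1, std::size_t D0, std::size_t D1>
std::array<std::size_t, S0 * S1> layout_offsets() {
  std::array<std::size_t, S0 * S1> out{};
  for (std::size_t idx = 0; idx < S0 * S1; ++idx) {
    const std::size_t i0 = idx % S0;  // coordinate in the first mode
    const std::size_t i1 = idx / S0;  // coordinate in the second mode
    out[idx] = i0 * D0 + i1 * D1;
  }
  return out;
}
```

With shape (2,4) and stride (4,1), `layout_offsets<2, 4, 4, 1>()` maps indices 0..7 to offsets 0, 4, 1, 5, 2, 6, 3, 7, which is a permutation of eight values. A small layout can therefore describe a value shuffle compactly.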

Algy · 2024-12-04T08:06:44Z

Just out of curiosity, how much does it improve latency on which cases?

IwakuraRein · 2024-12-04T21:24:02Z

> Just out of curiosity, how much does it improve latency on which cases?

@Algy Hi. The new functionality eliminates some instructions in the dequantization phase for the 4-bit x 16-bit and int8 x 16-bit cases. We expect about a 3% improvement when the dequantization cost is not negligible, e.g., when the problem size K is very small. You can turn this feature on or off in the int4xbf16 example by changing the ValueShuffle type.
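As a rough illustration of why a value shuffle can remove instructions from dequantization, the host-side C++ sketch below packs eight unsigned 4-bit values into one 32-bit word in an interleaved order. This is a general technique and is not taken from the CUTLASS converters. The permutation `kSlot` and the function names are assumptions made for this example, and sign handling and the actual conversion to bf16 are left out.

```cpp
#include <array>
#include <cstdint>

// Hypothetical shuffle pattern (an assumption for illustration):
// logical element i is stored in nibble slot kSlot[i] of the 32-bit word.
const std::array<int, 8> kSlot = {0, 2, 4, 6, 1, 3, 5, 7};

// Offline step: pack eight unsigned 4-bit values using the shuffle.
inline std::uint32_t pack_shuffled(const std::array<std::uint8_t, 8>& v) {
  std::uint32_t word = 0;
  for (int i = 0; i < 8; ++i) {
    word |= static_cast<std::uint32_t>(v[i] & 0xF) << (4 * kSlot[i]);
  }
  return word;
}

// Runtime step: two masks and one shift place every value in its own byte
// lane, already in logical order, with no per-element shifting of the word.
inline std::array<std::uint8_t, 8> unpack_shuffled(std::uint32_t word) {
  const std::uint32_t lo = word & 0x0F0F0F0Fu;         // slots 0,2,4,6 -> logical 0..3
  const std::uint32_t hi = (word >> 4) & 0x0F0F0F0Fu;  // slots 1,3,5,7 -> logical 4..7
  std::array<std::uint8_t, 8> out{};
  for (int b = 0; b < 4; ++b) {  // read out the byte lanes for checking
    out[b]     = static_cast<std::uint8_t>((lo >> (8 * b)) & 0xFF);
    out[b + 4] = static_cast<std::uint8_t>((hi >> (8 * b)) & 0xFF);
  }
  return out;
}
```

For example, packing the values 1 through 8 gives the word 0x84736251, and `unpack_shuffled` returns them in their original order. Without the offline shuffle, the same two masks would yield the even-indexed values followed by the odd-indexed ones, so extra work would be needed to restore the order.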

There are additional improvements in all cases from refactoring the dequantization code.

The remaining changes are refactoring and robustness improvements. E.g., you can now use the --shuffle argument to turn the offline layout shuffle feature on or off in the int4xbf16 and int4xfp8 examples to see the perf gain.

IwakuraRein added 2 commits December 3, 2024 15:03

update

afbdbc8

fix a typo

f00d092

hwu36 approved these changes Dec 6, 2024

hwu36 merged commit 4c42f73 into NVIDIA:main Dec 6, 2024

tlrmchlsmth mentioned this pull request Dec 30, 2024

[Build][Kernel] Update CUTLASS to v3.6.0 vllm-project/vllm#11607

Merged
