Improve mixed dtype GEMM #1972

Merged: 2 commits, Dec 6, 2024


IwakuraRein
Contributor

  • Remove the MixedInput suffix from the mixed dtype GEMM's collective schedules. See "Eliminate MixedInput kernel schedule tags" (#1956).
  • Refactor the mixed dtype GEMM's mainloop and examples, moving some logic into new headers.
  • Add interleaved converters. Users can specify the value shuffle pattern with a CuTe layout to activate these converters. See examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu and include/cutlass/detail/collective/mixed_input_utils.hpp for more details.
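
As a rough illustration of how a shape/stride layout can describe a value shuffle pattern, the sketch below mimics the index-to-offset mapping of a rank-2 CuTe layout in plain C++ (it does not use the CuTe headers). The `Layout2D` type, the `shuffle_values` helper, the gather convention, and the example layout (4,2):(2,1) are all hypothetical choices made for illustration; the pattern and convention actually used by the example may differ.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a rank-2 layout: maps a linear index to an
// offset by splitting it into coordinates (mode 0 varies fastest) and
// taking the inner product with the strides.
struct Layout2D {
    int shape0, shape1;
    int stride0, stride1;

    constexpr int operator()(int k) const {
        return (k % shape0) * stride0 + (k / shape0) * stride1;
    }
    constexpr int size() const { return shape0 * shape1; }
};

// Assumed convention: position k of the shuffled array holds the value
// that sat at offset layout(k) in the original array.
template <typename T, std::size_t N>
std::array<T, N> shuffle_values(const std::array<T, N>& values, Layout2D layout) {
    std::array<T, N> out{};
    for (std::size_t k = 0; k < N; ++k) {
        out[k] = values[static_cast<std::size_t>(layout(static_cast<int>(k)))];
    }
    return out;
}
```

With the layout (4,2):(2,1), indices 0 through 7 map to 0, 2, 4, 6, 1, 3, 5, 7, so the even-indexed values are grouped first and the odd-indexed values second.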

Contributor

Algy commented Dec 4, 2024

Just out of curiosity, how much does it improve latency on which cases?

Contributor Author

IwakuraRein commented Dec 4, 2024

> Just out of curiosity, how much does it improve latency on which cases?

@Algy Hi. The new functionality eliminates some instructions in the dequantization phase for the 4-bit x 16-bit and int8 x 16-bit cases. It is expected to give a ~3% improvement when the dequantization cost is not negligible, e.g., when the problem's K dimension is very small. You can turn this feature on or off in the int4xbf16 example by changing the ValueShuffle type.

There are additional improvements for all cases from refactoring the dequantization code.

The other changes are refactoring and robustness improvements. E.g., you can now use the --shuffle argument to turn the offline layout shuffle feature on or off in the int4xbf16 and int4xfp8 examples to see the perf gain.
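
To make the "eliminates some instructions" point concrete, here is a minimal host-side C++ sketch of one way an interleaved value order can reduce unpacking work for 4-bit values. It is an assumption-laden illustration, not the CUTLASS converter: the function names, the order 0, 2, 4, 6, 1, 3, 5, 7, and the 0x000F000F mask trick are chosen for this example only. With natural order, each of the eight values needs its own shift and mask. With the interleaved order, one shift and one mask place two consecutive values into the low and high 16-bit halves of a 32-bit word, which is the shape a 2 x 16-bit conversion would consume.

```cpp
#include <array>
#include <cstdint>

// Pack eight unsigned 4-bit values into one 32-bit word, storing
// values[order[p]] at nibble position p (bits 4p .. 4p+3).
std::uint32_t pack_nibbles(const std::array<std::uint8_t, 8>& values,
                           const std::array<int, 8>& order) {
    std::uint32_t word = 0;
    for (int p = 0; p < 8; ++p) {
        word |= static_cast<std::uint32_t>(values[order[p]] & 0xF) << (4 * p);
    }
    return word;
}

// Natural order: one shift and one mask per value (eight extractions).
std::array<std::uint16_t, 8> unpack_natural(std::uint32_t word) {
    std::array<std::uint16_t, 8> out{};
    for (int e = 0; e < 8; ++e) {
        out[e] = static_cast<std::uint16_t>((word >> (4 * e)) & 0xFu);
    }
    return out;
}

// Interleaved order: one shift and one mask per PAIR of values (four
// extractions). Each masked word already holds value 2i in its low
// 16 bits and value 2i+1 in its high 16 bits.
std::array<std::uint16_t, 8> unpack_interleaved(std::uint32_t word) {
    std::array<std::uint16_t, 8> out{};
    for (int i = 0; i < 4; ++i) {
        std::uint32_t pair = (word >> (4 * i)) & 0x000F000Fu;
        // Splitting the pair is only needed here to check the result.
        out[2 * i]     = static_cast<std::uint16_t>(pair & 0xFFFFu);
        out[2 * i + 1] = static_cast<std::uint16_t>(pair >> 16);
    }
    return out;
}
```

Packing with the order {0, 2, 4, 6, 1, 3, 5, 7} and then calling `unpack_interleaved` recovers the values in their original order, as does packing with {0, 1, ..., 7} and calling `unpack_natural`. The reordering cost is paid once offline, when the weights are packed, rather than in the mainloop.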
