Rowwise F8F8BF16 GEMMs - Auto-generate kernel library, auto-generated heuristics cache, add to FBGEMM quantize_bench #3210

manishucsd · 2024-10-02T15:29:39Z

Summary

Auto-generated F8F8BF16 Rowwise Scaled Kernels.
Auto-generated heuristics cache.
Added to FBGEMM quantize_bench.
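
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a rowwise-scaled F8F8BF16 GEMM computes. It rests on assumptions not stated in this PR: both operands are FP8 (e4m3) with one scale per row, the weight is stored as N rows of length K, and the output is rescaled by the product of the two row scales. The helper names `quantize_rowwise` and `gemm_rowwise_reference` are hypothetical and are not FBGEMM APIs. Rounding onto the FP8 grid and the final BF16 cast are deliberately left out so the arithmetic stays easy to follow.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 e4m3


def quantize_rowwise(mat):
    """Scale each row so its largest magnitude maps to FP8_E4M3_MAX.

    Returns (scaled_rows, scales). A real kernel would also round the
    scaled values to FP8; that step is omitted in this sketch.
    """
    scaled, scales = [], []
    for row in mat:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scaled.append([v / scale for v in row])
        scales.append(scale)
    return scaled, scales


def gemm_rowwise_reference(xq, wq, x_scale, w_scale):
    """out[i][j] = x_scale[i] * w_scale[j] * sum_k xq[i][k] * wq[j][k]."""
    return [
        [
            x_scale[i] * w_scale[j] * sum(a * b for a, b in zip(xq[i], wq[j]))
            for j in range(len(wq))
        ]
        for i in range(len(xq))
    ]


# Toy problem: M=2, N=2, K=3 (assumed layout: x is M x K, w is N x K).
x = [[1.0, -2.0, 0.5], [4.0, 0.0, -1.0]]
w = [[0.5, 0.25, -1.0], [2.0, 1.0, 3.0]]

xq, xs = quantize_rowwise(x)
wq, ws = quantize_rowwise(w)
out = gemm_rowwise_reference(xq, wq, xs, ws)
```

Because the sketch skips FP8 rounding, `out` matches the unquantized product of `x` and the transpose of `w` up to floating-point error. In a real kernel the difference between the two is the quantization error.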

Performance Improvements

DisaggBench

CUTLASS
Prefill B=1 T=2048: Elapsed: 109.13ms FLOPs: 333.74TF/s
Prefill B=1 T=4928: Elapsed: 272.55ms FLOPs: 338.62TF/s
Prefill B=1 T=6336: Elapsed: 354.93ms FLOPs: 342.55TF/s
Prefill B=1 T=8192: Elapsed: 468.64ms FLOPs: 346.06TF/s

CUTLASS extensions
Prefill B=1 T=2048: Elapsed: 108.83ms FLOPs: 334.66TF/s
Prefill B=1 T=4928: Elapsed: 260.46ms FLOPs: 354.34TF/s
Prefill B=1 T=6336: Elapsed: 336.39ms FLOPs: 361.43TF/s
Prefill B=1 T=8192: Elapsed: 442.64ms FLOPs: 366.39TF/s
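
The relative improvement between the two result groups can be read off with a short calculation. This sketch only restates the elapsed times listed above. The dictionary names `baseline_ms` and `extensions_ms` and the `speedup` helper are illustrative, not part of the benchmark.

```python
# Elapsed prefill times in milliseconds, keyed by T, copied from the
# first (baseline) and second (extensions) result groups above.
baseline_ms = {2048: 109.13, 4928: 272.55, 6336: 354.93, 8192: 468.64}
extensions_ms = {2048: 108.83, 4928: 260.46, 6336: 336.39, 8192: 442.64}


def speedup(before_ms, after_ms):
    """Ratio of the old elapsed time to the new one (values > 1.0 are faster)."""
    return before_ms / after_ms


speedups = {t: speedup(baseline_ms[t], extensions_ms[t]) for t in baseline_ms}

for t, s in speedups.items():
    print(f"T={t}: {s:.3f}x ({(s - 1) * 100:.1f}% higher throughput)")
```

On these numbers the gain grows with sequence length, from roughly 0.3% at T=2048 to about 5.9% at T=8192, which agrees with the ratio of the reported TF/s figures.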

Differential Revision: D63744054

netlify · 2024-10-02T15:29:58Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`33855a3`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66fe2b1455dd240008cd38c1
😎 Deploy Preview	https://deploy-preview-3210--pytorch-fbgemm-docs.netlify.app

facebook-github-bot · 2024-10-02T15:29:58Z

This pull request was exported from Phabricator. Differential Revision: D63744054

facebook-github-bot · 2024-10-03T05:26:49Z

This pull request was exported from Phabricator. Differential Revision: D63744054

facebook-github-bot · 2024-12-28T03:13:52Z

Hi @manishucsd!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot added the cla signed label Oct 2, 2024

facebook-github-bot added the fb-exported label Oct 2, 2024

manishucsd force-pushed the export-D63744054 branch from 8d65a52 to 33855a3 Compare October 3, 2024 05:26
