Tune FP8 rowwise bmm tile heuristic (#3256)
Summary:
X-link: facebookresearch/FBGEMM#357

Pull Request resolved: #3256

Refactor the general-shape heuristic used to tune FP8 rowwise bmm, based on Luca's [PR](pytorch/pytorch#134773). It provides a **10% speedup on average (up to 36%)**. Overall, the speedup from the new heuristic is similar to that of offline profiling tuning (D64482604), with a 1% difference on average:
- some shapes in the new heuristic are 24% faster than offline profiling tuning
- some shapes are slower

The new heuristic targets general shapes. As a follow-up, we could add customized heuristics for specific shapes to further improve performance (e.g., the shapes that are currently slower). Additionally:
- Compared to offline profiling tuning, this heuristic applies to all shapes and **does not require running or storing an offline profile for each new shape. Its performance is close to optimal, and customized heuristics can be added on top if needed**
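
To illustrate the trade-off described above, here is a minimal Python sketch of a shape-based tile selector with optional per-shape overrides. It is not the code in this commit: the names `TileConfig`, `SHAPE_OVERRIDES`, `heuristic_tile`, and `select_tile`, as well as every threshold and tile size, are hypothetical and chosen only to show the structure (a general rule that covers every shape, plus a small table for shapes that need custom handling).

```python
from typing import Dict, NamedTuple, Tuple


class TileConfig(NamedTuple):
    """Hypothetical tile shape (not the real kernel configuration type)."""
    tile_m: int
    tile_n: int
    tile_k: int


Shape = Tuple[int, int, int, int]  # (B, M, N, K)

# Hypothetical per-shape overrides, standing in for the "customized
# heuristic for required shapes" follow-up. Entries are illustrative only.
SHAPE_OVERRIDES: Dict[Shape, TileConfig] = {
    (16, 32, 256, 512): TileConfig(64, 64, 128),
}


def heuristic_tile(b: int, m: int, n: int, k: int) -> TileConfig:
    """General-shape rule: works for any shape, needs no stored profile.

    The thresholds below are assumptions for illustration, not tuned values.
    """
    if m <= 64:
        # Small M: use a shorter tile along M.
        return TileConfig(64, 128, 128)
    if n <= 128:
        # Small N: use a narrower tile along N.
        return TileConfig(128, 64, 128)
    return TileConfig(128, 128, 128)


def select_tile(b: int, m: int, n: int, k: int) -> TileConfig:
    """Prefer a per-shape override if one exists, else fall back to the rule."""
    override = SHAPE_OVERRIDES.get((b, m, n, k))
    if override is not None:
        return override
    return heuristic_tile(b, m, n, k)
```

In this sketch, an unseen shape such as `select_tile(8, 512, 512, 512)` is handled by the general rule alone, whereas an offline-profiling approach would need a stored entry for it.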

More results can be found in [this data sheet](https://docs.google.com/spreadsheets/d/1ZUdiKjcl7ATSDoYwivS_W91kSuhiw89RnKiMp17Pjms/edit?gid=0#gid=0)

Reviewed By: jianyuh

Differential Revision: D64563314

fbshipit-source-id: 75c37ca257c8c1420c65db781495151b7cb6e920
jiawenliu64 authored and facebook-github-bot committed Oct 21, 2024
1 parent bc67b37 commit 8294513
Showing 1 changed file with 277 additions and 60 deletions.