Commit
Tune FP8 rowwise bmm tile heuristic (#3256)
Summary:
X-link: facebookresearch/FBGEMM#357

Pull Request resolved: #3256

Refactor the heuristic for general shapes to tune FP8 rowwise bmm, based on Luca's [PR](pytorch/pytorch#134773). It provides **10% speedup on average (up to 36%)**.

In general, the speedup with the new heuristic is similar to that of the offline profiling tuning (D64482604), with a 1% difference on average:
- some shapes are 24% faster with the new heuristic than with offline profiling tuning
- some shapes are slower

The new heuristic above is for general shapes. We could follow up by adding more customized heuristics for required shapes to further improve performance (e.g., for the shapes that are slower).

Additionally:
- compared to offline profiling tuning, this heuristic can be applied to all shapes, and **does not require running or storing offline profiling for each new shape, as the offline profiling tuning approach does. Performance of this new heuristic is close to optimal, with further customized heuristics possible if needed**

More results can be found in [this data sheet](https://docs.google.com/spreadsheets/d/1ZUdiKjcl7ATSDoYwivS_W91kSuhiw89RnKiMp17Pjms/edit?gid=0#gid=0)

Reviewed By: jianyuh

Differential Revision: D64563314

fbshipit-source-id: 75c37ca257c8c1420c65db781495151b7cb6e920
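The trade-off described in the summary, a per-shape table from offline profiling versus a rule that covers every shape, can be illustrated with a minimal Python sketch. This is not the code from this commit: `TileConfig`, `PROFILED_TILES`, `lookup_profiled_tile`, `heuristic_tile`, and every threshold and tile size below are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class TileConfig(NamedTuple):
    """Hypothetical tile sizes along the M, N, and K dimensions."""
    tile_m: int
    tile_n: int
    tile_k: int


# Hypothetical offline-profiled table keyed by (B, M, N, K).
# Only shapes that were benchmarked ahead of time have an entry.
PROFILED_TILES = {
    (16, 128, 4096, 4096): TileConfig(128, 128, 128),
    (16, 2048, 4096, 4096): TileConfig(128, 256, 64),
}


def lookup_profiled_tile(b: int, m: int, n: int, k: int) -> Optional[TileConfig]:
    """Return the stored config, or None for a shape that was never profiled."""
    return PROFILED_TILES.get((b, m, n, k))


def heuristic_tile(b: int, m: int, n: int, k: int) -> TileConfig:
    """Pick a config from the shape alone, so every shape gets an answer.

    The thresholds and tile sizes are illustrative assumptions, not the
    values used by the heuristic in this commit.
    """
    if m <= 64:
        # Small M: use a short tile along M to limit wasted work.
        return TileConfig(64, 128, 128)
    if n >= 4096 and m >= 1024:
        # Large M and N: use a wider tile along N.
        return TileConfig(128, 256, 64)
    return TileConfig(128, 128, 128)


if __name__ == "__main__":
    new_shape = (8, 32, 1024, 2048)
    print(lookup_profiled_tile(*new_shape))  # not in the table
    print(heuristic_tile(*new_shape))        # heuristic still returns a config
```

The table lookup needs a new profiling run and a new stored entry for each unseen shape, while the shape-based rule returns a config for any input, at the cost of possibly missing the best tile for a specific shape.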