Revamp -tritonintelgpu-optimize-reduction-locality
#2752
`-tritonintelgpu-optimize-reduction-locality` is incorrect, as register reordering may lead to incorrect results. It can also be greatly improved so that optimal layouts are propagated instead of suboptimal sliced ones.

A DPAS layout that "covers" the tensor in dimension 0 can be represented as a 7D layout:
In this layout, dimensions 0, 2, 4 and 5 correspond to dimension 1 of the original layout.
A reduction on axis 1 of the original DPAS layout (the fast-changing axis) can be represented as follows in this new layout:
This last step is crucial to get right. It should be split exactly as:
1. `reshape` to original shape
2. `convert_layout` to original layout

As the original layout is suboptimal and `reshape` operations propagate layouts, swapping the reshape and the layout conversion would lead to propagating the suboptimal layout. This is what we are getting wrong in the current pass, in addition to the change in semantics due to register ordering.
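
To make the shape side of this concrete, here is a minimal plain-Python sketch of reducing axis 1 by first splitting it into several sub-dimensions, reducing those one at a time, and ending up back at the original reduced shape. The sizes (`ROWS`, `COLS`, `SPLIT`) and helper names are hypothetical and chosen only for illustration; they are not the actual DPAS-derived 7D shape, and the sketch does not model layouts or `convert_layout` at all. Integers are used so the comparison is exact.

```python
# Hypothetical sizes for illustration only; not the real DPAS-derived shape.
ROWS, COLS = 4, 16
SPLIT = (2, 2, 4)  # hypothetical factorization of axis 1: 2 * 2 * 4 == 16


def reduce_axis1_direct(tensor):
    """Reference: sum every row of a 2D tensor (reduction over axis 1)."""
    return [sum(row) for row in tensor]


def split_axis1(tensor, split):
    """Reshape each row of length a*b*c into nested [a][b][c] lists."""
    a, b, c = split
    return [
        [[row[(i * b + j) * c:(i * b + j) * c + c] for j in range(b)]
         for i in range(a)]
        for row in tensor
    ]


def reduce_axis1_split(tensor, split):
    """Reduce axis 1 as a chain of reductions over its sub-dimensions."""
    nested = split_axis1(tensor, split)  # shape [ROWS][a][b][c]
    # Reduce the innermost sub-dimension: [ROWS][a][b]
    step1 = [[[sum(inner) for inner in mid] for mid in row] for row in nested]
    # Reduce the middle sub-dimension: [ROWS][a]
    step2 = [[sum(mid) for mid in row] for row in step1]
    # Reduce the outer sub-dimension: [ROWS], the original reduced shape
    return [sum(row) for row in step2]


tensor = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
assert reduce_axis1_split(tensor, SPLIT) == reduce_axis1_direct(tensor)
```

The point of the sketch is only that the chained reductions and the direct reduction agree on values and final shape; which layout ends up attached to the result is a separate concern that depends on the order of the final `reshape` and `convert_layout`.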