[RISCV] Vector saturating truncation not using VNCLIP #73424

boomshroom · 2023-11-26T05:14:59Z

Closely related to #68466 and #58266, but for a different platform.

Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width.

Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen.

Example code with current code gen, next to x86 and AArch64 equivalents.

https://rust.godbolt.org/z/bKdsq8GGM

llvmbot · 2023-11-26T05:23:35Z

@llvm/issue-subscribers-backend-risc-v

Author: Angelo Bulfone (boomshroom)

Closely related to #68466 and #58266, but for a different platform.

Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width.

Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen.

Example code with current code gen, next to x86 and AArch64 equivalents.

https://rust.godbolt.org/z/bKdsq8GGM

sun-jacobi · 2023-12-07T08:21:49Z

@boomshroom Hi, can I work on this issue?

boomshroom · 2023-12-07T08:50:36Z

Sure. It probably wouldn't be too hard to copy and paste the logic from one of the several other targets that match this pattern, though the fact that it does seem to be duplicated so many times makes me suspicious that doing so is really the best idea. I personally think adding a target-independent "saturating truncation" node would be better, though I'm less sure how to go about that and it would entail a larger change to the overall infrastructure.

sun-jacobi · 2023-12-09T12:58:38Z

@boomshroom Can you assign this issue to me?

sun-jacobi · 2023-12-11T10:54:57Z

The previous pattern matching around the trunc mostly uses vnsrl.
I think using vnsrl might be a better choice, since vnclip may bring new complexity due to the vxrm.

boomshroom · 2023-12-13T00:19:38Z

If the shift is 0, then it should never have to round and thus vxrm shouldn't pose an issue.

Also vnsrl always floors when inexact and wraps when the value is out of range. Saturating when out of range is what vnclip does, like the other instructions on other architectures do. Worth noting is that it only saturates unsigned to unsigned and signed to signed, so a min or max is still required to reduce the range to what's representable in the desired signedness.

I still wonder why such an operation that seems fairly useful, relatively common in ISAs, and is non-trivial to match, doesn't have a more target-independent representation in LLVM. Could save some work matching the patterns before lowering to individual ISAs and reduce the duplication between them.

Fixes llvm#73424. If the range created by a min and max is precisely the range of trunc target, the min/max could be removed.

This patch closed #73424, which is also a missed-optimization case similar to #68466 on X86. ## Source Code ``` define void @trunc_sat_i8i16(ptr %x, ptr %y) { %1 = load <8 x i16>, ptr %x, align 16 %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>) %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>) %4 = trunc <8 x i16> %3 to <8 x i8> store <8 x i8> %4, ptr %y, align 8 ret void } ``` ## Before this patch: ``` trunc_sat_i8i16: # @trunc_maxmin_id_i8i16 vsetivli zero, 8, e16, m1, ta, ma vle16.v v8, (a0) li a0, -128 vmax.vx v8, v8, a0 li a0, 127 vmin.vx v8, v8, a0 vsetvli zero, zero, e8, mf2, ta, ma vnsrl.wi v8, v8, 0 vse8.v v8, (a1) ret ``` ## After this patch: ``` trunc_sat_i8i16: # @trunc_maxmin_id_i8i16 vsetivli zero, 8, e8, mf2, ta, ma vle16.v v8, (a0) csrwi vxrm, 0 vnclip.wi v8, v8, 0 vse8.v v8, (a1) ret ```

sun-jacobi · 2024-01-18T13:46:51Z

I'm planning to add a new opcode ISD::TRUNCSAT to SelectionDAG, and do a target-independent combine for it.

github-actions bot added the new issue label Nov 26, 2023

EugeneZelenko added backend:RISC-V and removed new issue labels Nov 26, 2023

dtcxzyw added the missed-optimization label Nov 27, 2023

dtcxzyw assigned dtcxzyw and sun-jacobi and unassigned dtcxzyw Dec 9, 2023

sun-jacobi mentioned this issue Dec 12, 2023

[RISCV][ISel] Remove redundant min/max in saturating truncation #75145

Merged

sun-jacobi added a commit to sun-jacobi/llvm-project that referenced this issue Dec 29, 2023

[RISCV][ISel] remove redundant min/max followed by a trunc.

af443b0

Fixes llvm#73424. If the range created by a min and max is precisely the range of trunc target, the min/max could be removed.

ChunyuLiao closed this as completed in #75145 Dec 29, 2023

sun-jacobi mentioned this issue Mar 20, 2024

[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RISCV] Vector saturating truncation not using VNCLIP #73424

[RISCV] Vector saturating truncation not using VNCLIP #73424

boomshroom commented Nov 26, 2023

llvmbot commented Nov 26, 2023

sun-jacobi commented Dec 7, 2023 •

edited

Loading

boomshroom commented Dec 7, 2023 •

edited

Loading

sun-jacobi commented Dec 9, 2023

sun-jacobi commented Dec 11, 2023 •

edited

Loading

boomshroom commented Dec 13, 2023

sun-jacobi commented Jan 18, 2024 •

edited

Loading

[RISCV] Vector saturating truncation not using VNCLIP #73424

[RISCV] Vector saturating truncation not using VNCLIP #73424

Comments

boomshroom commented Nov 26, 2023

llvmbot commented Nov 26, 2023

sun-jacobi commented Dec 7, 2023 • edited Loading

boomshroom commented Dec 7, 2023 • edited Loading

sun-jacobi commented Dec 9, 2023

sun-jacobi commented Dec 11, 2023 • edited Loading

boomshroom commented Dec 13, 2023

sun-jacobi commented Jan 18, 2024 • edited Loading

sun-jacobi commented Dec 7, 2023 •

edited

Loading

boomshroom commented Dec 7, 2023 •

edited

Loading

sun-jacobi commented Dec 11, 2023 •

edited

Loading

sun-jacobi commented Jan 18, 2024 •

edited

Loading