-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] Vector saturating truncation not using VNCLIP #73424
Comments
@llvm/issue-subscribers-backend-risc-v Author: Angelo Bulfone (boomshroom)
Closely related to #68466 and #58266, but for a different platform.
Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width. Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen. Example code with current code gen, next to x86 and AArch64 equivalents. |
@boomshroom Hi, can I work on this issue? |
Sure. It probably wouldn't be too hard to copy and paste the logic from one of the several other targets that match this pattern, though the fact that it does seem to be duplicated so many times makes me suspicious that doing so is really the best idea. I personally think adding a target-independent "saturating truncation" node would be better, though I'm less sure how to go about that and it would entail a larger change to the overall infrastructure. |
@boomshroom Can you assign this issue to me? |
The previous pattern matching around the |
If the shift is 0, then it should never have to round and thus Also I still wonder why such an operation that seems fairly useful, relatively common in ISAs, and is non-trivial to match, doesn't have a more target-independent representation in LLVM. Could save some work matching the patterns before lowering to individual ISAs and reduce the duplication between them. |
Fixes llvm#73424. If the range created by a min and max is precisely the range of trunc target, the min/max could be removed.
This patch closed #73424, which is also a missed-optimization case similar to #68466 on X86. ## Source Code ``` define void @trunc_sat_i8i16(ptr %x, ptr %y) { %1 = load <8 x i16>, ptr %x, align 16 %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>) %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>) %4 = trunc <8 x i16> %3 to <8 x i8> store <8 x i8> %4, ptr %y, align 8 ret void } ``` ## Before this patch: ``` trunc_sat_i8i16: # @trunc_maxmin_id_i8i16 vsetivli zero, 8, e16, m1, ta, ma vle16.v v8, (a0) li a0, -128 vmax.vx v8, v8, a0 li a0, 127 vmin.vx v8, v8, a0 vsetvli zero, zero, e8, mf2, ta, ma vnsrl.wi v8, v8, 0 vse8.v v8, (a1) ret ``` ## After this patch: ``` trunc_sat_i8i16: # @trunc_maxmin_id_i8i16 vsetivli zero, 8, e8, mf2, ta, ma vle16.v v8, (a0) csrwi vxrm, 0 vnclip.wi v8, v8, 0 vse8.v v8, (a1) ret ```
I'm planning to add a new opcode |
Closely related to #68466 and #58266, but for a different platform.
Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width.
Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen.
Example code with current code gen, next to x86 and AArch64 equivalents.
https://rust.godbolt.org/z/bKdsq8GGM
The text was updated successfully, but these errors were encountered: