-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903
Comments
@llvm/issue-subscribers-backend-x86 Author: Chia (sun-jacobi)
Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.
For instance, #68466 for x86, #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could expressed as a sequence of LLVM IR like:
Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as |
@llvm/issue-subscribers-backend-risc-v Author: Chia (sun-jacobi)
Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.
For instance, #68466 for x86, #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could expressed as a sequence of LLVM IR like:
Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as |
@llvm/issue-subscribers-backend-aarch64 Author: Chia (sun-jacobi)
Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.
For instance, #68466 for x86, #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could expressed as a sequence of LLVM IR like:
Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as |
I'm sure this has come up before as a phab review but I can't find it right now (@davemgreen can you remember?). What would be awesome is if there was a way to handle the 2 SSE PACK instruction signed saturations:
|
@RKSimon Thank you for the information. Do you mean we also need to add a new ISD Node for signed sat of unsigned range? |
I don't know yet - I think what you were proposing was we'd have a ISD::TRUNCSSAT node that just used the signed smin/smax clamp limits of the destination type (and ISD::TRUNCUSAT that had a umax limit?). What I'm wondering is how useful it would be to add extra args describing the limits - I just don't want to end up having a target node that has to match the generic nodes, especially if we end up with more complex folds for cases such as sat-trunc-store patterns. |
For MVE this is spelled ARMISD::VQMOVNs/ARMISD::VQMOVNu, but as it uses the top/bottom instructions the new ISD might not be a good fit. It would be better for AArch64/Neon (the SQXTN/UQXTN instructions) - there I think I added tablegen patterns to keep the nodes generic as long as possible, but so long as the new nodes handled known/demanded bits and whatnot then it should be a good replacement. |
Can a beginner take over this issue? if may, I'm very glad to take. |
@topperc Thanks for letting me know. If there are any architectures that have not been patched yet, I will refer to the link you provided. |
I think the RISC-V specific DAGCombine code could be removed if we had a generic saturation node. We'd still need custom lowering to split something like a vXi32->vXi8 saturating truncate into multiple vnclip instructions since we can only truncate vXi32->vXi16 or vXi16->vXi8 in a single instruction. |
@ParkHanbum What I proposed is actually a generic saturation node (just as @topperc said). The proposed node is somewhat difficult to work with x86, but AArch64/Neon and RISCV-V would benefit. If you are interested in this, feel free to try it :) |
Starting with adding ISD::TRUNCSSAT would make sense - cheers. |
I'm on duty for this. as @RKSimon guided, implemented (is it correctly?) truncate_ssat first.
ParkHanbum@f88709c#diff-d7065626b3d269e24241429ce037d51fc91d5ead5896d67fcc038aefc1111fd2 I should start implementing architecture-specific DAGToDAGISel now, right? 😄 |
Hi @ParkHanbum,
I think maybe modifications on |
Hi @inclyc thanks for advice. I'll find what can I do! |
@ParkHanbum You might want to use something like 2887f14 as reference about introducing a new ISD node, although in that case this was a conversion of an existing aarch64 opcode |
@RKSimon thanks!! it is very helpful!! |
can I ask a question? because I found it does not supported in aarch64&x86 but supported in riscv. in this testcase :
result of aarch64:
result of riscv:
link : https://godbolt.org/z/7W6baE83c and as I tested result, that trunc at @vqmovni64_smaxmin is saturated.
|
I expect you will have to add legalizer support as well - not just handle known legal instructions. Search for ISD::TRUNCATE in the llvm\lib\CodeGen\SelectionDAG\Legalize* files - its likely you will just have duplicate some of this code with a suitable smin/smax clamping before truncation. |
Thanks @RKSimon, the commits you shared earlier gave me what I have to do. The reason I asked is that I wanted to make sure that the cases that currently appear to be unsupported are still unsupportable. |
`truncate` is `saturated` if no additional conversion is required between the target and return values. if the target is `saturated` when attempting to crop from a `vector`, there is an opportunity to optimize it. previously, each architecture had an attemping optimization, so there was redundant code. this patch implements common logic by adding `ISD::TRUNCATE_[US]SAT` to indicate saturated truncate. Fixes llvm#85903
I've implemented truncate_[us]at for aarch64, riscv. x86 has a complicated implementation and I think I need to do more work on it, can I send a PR with the implemented one first? |
Please raise a PR, keep it as draft if you want (but people will comment whatever :)) |
`truncate` is `saturated` if no additional conversion is required between the target and return values. if the target is `saturated` when attempting to crop from a `vector`, there is an opportunity to optimize it. previously, each architecture had an attemping optimization, so there was redundant code. this patch implements common logic by adding `ISD::TRUNCATE_[US]SAT` to indicate saturated truncate. Fixes llvm#85903
`truncate` is `saturated` if no additional conversion is required between the target and return values. if the target is `saturated` when attempting to crop from a `vector`, there is an opportunity to optimize it. previously, each architecture had an attemping optimization, so there was redundant code. this patch implements common logic by adding `ISD::TRUNCATE_[US]SAT` to indicate saturated truncate. Fixes llvm#85903
`truncate` is `saturated` if no additional conversion is required between the target and return values. if the target is `saturated` when attempting to crop from a `vector`, there is an opportunity to optimize it. previously, each architecture had an attemping optimization, so there was redundant code. this patch implements common logic by adding `ISD::TRUNCATE_[US]SAT` to indicate saturated truncate. Fixes llvm#85903
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes llvm#85903
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes llvm#85903
Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.
For instance, #68466 for x86 and #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could be expressed as a sequence of LLVM IR like:
Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as
ISD::SADDSAT
. This could remove complex TableGen pattern matching and refactor each platform's CodeGen path for this operation.The text was updated successfully, but these errors were encountered: