[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903

sun-jacobi · 2024-03-20T08:04:30Z

Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.

For instance, #68466 for x86 and #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could be expressed as a sequence of LLVM IR like:

define void @trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load <8 x i16>, ptr %x, align 16
  %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>)
  %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>)
  %4 = trunc <8 x i16> %3 to <8 x i8>
  store <8 x i8> %4, ptr %y, align 8
  ret void
}

Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as ISD::SADDSAT. This could remove complex TableGen pattern matching and refactor each platform's CodeGen path for this operation.

The text was updated successfully, but these errors were encountered:

llvmbot · 2024-03-20T08:04:46Z

@llvm/issue-subscribers-backend-x86

Author: Chia (sun-jacobi)

Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.

For instance, #68466 for x86, #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could expressed as a sequence of LLVM IR like:

define void @<!-- -->trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load &lt;8 x i16&gt;, ptr %x, align 16
  %2 = tail call &lt;8 x i16&gt; @<!-- -->llvm.smax.v8i16(&lt;8 x i16&gt; %1, &lt;8 x i16&gt; &lt;i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128&gt;)
  %3 = tail call &lt;8 x i16&gt; @<!-- -->llvm.smin.v8i16(&lt;8 x i16&gt; %2, &lt;8 x i16&gt; &lt;i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127&gt;)
  %4 = trunc &lt;8 x i16&gt; %3 to &lt;8 x i8&gt;
  store &lt;8 x i8&gt; %4, ptr %y, align 8
  ret void
}

Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as ISD::SADDSAT. This could remove complex TableGen pattern matching for this operation.

llvmbot · 2024-03-20T08:04:47Z

@llvm/issue-subscribers-backend-risc-v

Author: Chia (sun-jacobi)

Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.

For instance, #68466 for x86, #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could expressed as a sequence of LLVM IR like:

define void @<!-- -->trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load &lt;8 x i16&gt;, ptr %x, align 16
  %2 = tail call &lt;8 x i16&gt; @<!-- -->llvm.smax.v8i16(&lt;8 x i16&gt; %1, &lt;8 x i16&gt; &lt;i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128&gt;)
  %3 = tail call &lt;8 x i16&gt; @<!-- -->llvm.smin.v8i16(&lt;8 x i16&gt; %2, &lt;8 x i16&gt; &lt;i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127&gt;)
  %4 = trunc &lt;8 x i16&gt; %3 to &lt;8 x i8&gt;
  store &lt;8 x i8&gt; %4, ptr %y, align 8
  ret void
}

Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as ISD::SADDSAT. This could remove complex TableGen pattern matching for this operation.

llvmbot · 2024-03-20T08:04:49Z

@llvm/issue-subscribers-backend-aarch64

Author: Chia (sun-jacobi)

Based on the #73424. There were several patches and issues on the vector saturating truncation operation on different platform.

For instance, #68466 for x86, #75145 for RISC-V vector extension. I think it is better for us to do a target-independent combine for this operation, which could expressed as a sequence of LLVM IR like:

define void @<!-- -->trunc_sat_i8i16(ptr %x, ptr %y) {
  %1 = load &lt;8 x i16&gt;, ptr %x, align 16
  %2 = tail call &lt;8 x i16&gt; @<!-- -->llvm.smax.v8i16(&lt;8 x i16&gt; %1, &lt;8 x i16&gt; &lt;i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128&gt;)
  %3 = tail call &lt;8 x i16&gt; @<!-- -->llvm.smin.v8i16(&lt;8 x i16&gt; %2, &lt;8 x i16&gt; &lt;i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127&gt;)
  %4 = trunc &lt;8 x i16&gt; %3 to &lt;8 x i8&gt;
  store &lt;8 x i8&gt; %4, ptr %y, align 8
  ret void
}

Especially, I think this combine could be done at the SelectionDAG, like the other saturating operation such as ISD::SADDSAT. This could remove complex TableGen pattern matching for this operation.

dtcxzyw · 2024-03-20T08:12:39Z

cc @topperc @RKSimon @preames @asb

RKSimon · 2024-03-20T09:57:26Z

I'm sure this has come up before as a phab review but I can't find it right now (@davemgreen can you remember?).

What would be awesome is if there was a way to handle the 2 SSE PACK instruction signed saturations:

/// PACKSS (normal signed sat)
/// (truncate (smin ((smax (x, signed_min_of_dest_type)),
///                  signed_max_of_dest_type)) to dest_type)
/// or:
/// (truncate (smax ((smin (x, signed_max_of_dest_type)),
///                  signed_min_of_dest_type)) to dest_type).
///
/// PACKUS (signed sat of unsigned range) - NOT the same as USAT
/// the smax/smin range is [0, unsigned_max_of_dest_type].

sun-jacobi · 2024-03-20T14:19:44Z

@RKSimon Thank you for the information. Do you mean we also need to add a new ISD Node for signed sat of unsigned range?

RKSimon · 2024-03-20T14:34:53Z

I don't know yet - I think what you were proposing was we'd have a ISD::TRUNCSSAT node that just used the signed smin/smax clamp limits of the destination type (and ISD::TRUNCUSAT that had a umax limit?).

What I'm wondering is how useful it would be to add extra args describing the limits - I just don't want to end up having a target node that has to match the generic nodes, especially if we end up with more complex folds for cases such as sat-trunc-store patterns.

davemgreen · 2024-03-21T10:04:01Z

For MVE this is spelled ARMISD::VQMOVNs/ARMISD::VQMOVNu, but as it uses the top/bottom instructions the new ISD might not be a good fit.

It would be better for AArch64/Neon (the SQXTN/UQXTN instructions) - there I think I added tablegen patterns to keep the nodes generic as long as possible, but so long as the new nodes handled known/demanded bits and whatnot then it should be a good replacement.

ParkHanbum · 2024-07-03T23:59:40Z

Can a beginner take over this issue? if may, I'm very glad to take.

topperc · 2024-07-04T00:07:52Z

The RISC-V code was further modified in #93596 #93728 #93756 #93752. There are no longer complex tablegen patterns, but there is a bunch of target specific DAG combine code.

ParkHanbum · 2024-07-04T00:13:51Z

@topperc Thanks for letting me know. If there are any architectures that have not been patched yet, I will refer to the link you provided.

topperc · 2024-07-04T00:21:24Z

@topperc Thanks for letting us know. If there are any architectures that have not been patched yet, I will refer to the link you provided.

I think the RISC-V specific DAGCombine code could be removed if we had a generic saturation node. We'd still need custom lowering to split something like a vXi32->vXi8 saturating truncate into multiple vnclip instructions since we can only truncate vXi32->vXi16 or vXi16->vXi8 in a single instruction.

sun-jacobi · 2024-07-04T07:54:42Z

@topperc Thanks for letting me know. If there are any architectures that have not been patched yet, I will refer to the link you provided.

@ParkHanbum What I proposed is actually a generic saturation node (just as @topperc said). The proposed node is somewhat difficult to work with x86, but AArch64/Neon and RISCV-V would benefit.

If you are interested in this, feel free to try it :)

RKSimon · 2024-07-04T08:50:39Z

Starting with adding ISD::TRUNCSSAT would make sense - cheers.

ParkHanbum · 2024-07-08T02:54:16Z

I'm on duty for this. as @RKSimon guided, implemented (is it correctly?) truncate_ssat first.

Legalized selection DAG: %bb.0 'saturate2:'
SelectionDAG has 7 nodes:
  t0: ch,glue = EntryToken
      t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
    t13: v8i8 = truncate_ssat t2
  t11: ch,glue = CopyToReg t0, Register:v8i8 $d0, t13
  t12: ch = AArch64ISD::RET_GLUE t11, Register:v8i8 $d0, t11:1

ParkHanbum@f88709c#diff-d7065626b3d269e24241429ce037d51fc91d5ead5896d67fcc038aefc1111fd2

I should start implementing architecture-specific DAGToDAGISel now, right? 😄

inclyc · 2024-07-08T03:07:48Z

Hi @ParkHanbum,

I should start implementing architecture-specific DAGToDAGISel now, right?

I think maybe modifications on llvm/lib/Target/*.td files would be sufficient. You can start looking there.

ParkHanbum · 2024-07-08T03:09:35Z

Hi @inclyc thanks for advice. I'll find what can I do!

RKSimon · 2024-07-08T10:54:20Z

@ParkHanbum You might want to use something like 2887f14 as reference about introducing a new ISD node, although in that case this was a conversion of an existing aarch64 opcode

ParkHanbum · 2024-07-08T12:28:58Z

@RKSimon thanks!! it is very helpful!!

ParkHanbum · 2024-07-09T02:25:13Z

can I ask a question?
is we cannot support i64 type for this issue in aarch64 and x86?

because I found it does not supported in aarch64&x86 but supported in riscv.

in this testcase :

define <2 x i32> @vqmovni64_smaxmin(<2 x i64> %s0) {
entry:
  %c1 = icmp slt <2 x i64> %s0, <i64 2147483647, i64 2147483647>
  %s1 = select <2 x i1> %c1, <2 x i64> %s0, <2 x i64> <i64 2147483647, i64 2147483647>
  %c2 = icmp sgt <2 x i64> %s1, <i64 -2147483648, i64 -2147483648>
  %s2 = select <2 x i1> %c2, <2 x i64> %s1, <2 x i64> <i64 -2147483648, i64 -2147483648>
  %t = trunc <2 x i64> %s2 to <2 x i32>
  ret <2 x i32> %t
}

result of aarch64:

vqmovni64_smaxmin:                      // @vqmovni64_smaxmin
        mov     w8, #2147483647                 // =0x7fffffff
        dup     v1.2d, x8
        mov     x8, #-2147483648                // =0xffffffff80000000
        cmgt    v2.2d, v1.2d, v0.2d
        bif     v0.16b, v1.16b, v2.16b
        dup     v1.2d, x8
        cmgt    v2.2d, v0.2d, v1.2d
        bif     v0.16b, v1.16b, v2.16b
        xtn     v0.2s, v0.2d
        ret

result of riscv:

vqmovni64_smaxmin:                      # @vqmovni64_smaxmin
        vsetivli        zero, 2, e32, mf2, ta, ma
        vnclip.wi       v8, v8, 0
        ret

link : https://godbolt.org/z/7W6baE83c

and as I tested result, that trunc at @vqmovni64_smaxmin is saturated.


Optimized legalized selection DAG: %bb.0 'vqmovni64_smaxmin:entry'
SelectionDAG has 7 nodes:
  t0: ch,glue = EntryToken
      t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %0
    t17: v2i32 = truncate_ssat t2
  t15: ch,glue = CopyToReg t0, Register:v2i32 $d0, t17
  t16: ch = AArch64ISD::RET_GLUE t15, Register:v2i32 $d0, t15:1

RKSimon · 2024-07-09T07:59:32Z

I expect you will have to add legalizer support as well - not just handle known legal instructions.

Search for ISD::TRUNCATE in the llvm\lib\CodeGen\SelectionDAG\Legalize* files - its likely you will just have duplicate some of this code with a suitable smin/smax clamping before truncation.

ParkHanbum · 2024-07-09T13:34:33Z

Thanks @RKSimon, the commits you shared earlier gave me what I have to do.

The reason I asked is that I wanted to make sure that the cases that currently appear to be unsupported are still unsupportable.

`truncate` is `saturated` if no additional conversion is required between the target and return values. if the target is `saturated` when attempting to crop from a `vector`, there is an opportunity to optimize it. previously, each architecture had an attemping optimization, so there was redundant code. this patch implements common logic by adding `ISD::TRUNCATE_[US]SAT` to indicate saturated truncate. Fixes llvm#85903

ParkHanbum · 2024-07-17T12:20:06Z

I've implemented truncate_[us]at for aarch64, riscv. x86 has a complicated implementation and I think I need to do more work on it, can I send a PR with the implemented one first?

RKSimon · 2024-07-17T13:31:59Z

Please raise a PR, keep it as draft if you want (but people will comment whatever :))

`truncate` is `saturated` if no additional conversion is required between the target and return values. if the target is `saturated` when attempting to crop from a `vector`, there is an opportunity to optimize it. previously, each architecture had an attemping optimization, so there was redundant code. this patch implements common logic by adding `ISD::TRUNCATE_[US]SAT` to indicate saturated truncate. Fixes llvm#85903

A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes llvm#85903

sun-jacobi added backend:AArch64 backend:RISC-V backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Mar 20, 2024

sun-jacobi changed the title ~~[SelectionDAG] Add new ISD Node for vector saturating truncation~~ [SelectionDAG] Add a new ISD Node for vector saturating truncation Mar 20, 2024

sun-jacobi assigned ParkHanbum Jul 4, 2024

RKSimon mentioned this issue Jul 18, 2024

[DAG] Support saturated truncate #99418

Merged

RKSimon closed this as completed in #99418 Aug 14, 2024

RKSimon closed this as completed in 0d074ba Aug 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903

[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903

sun-jacobi commented Mar 20, 2024 •

edited

Loading

llvmbot commented Mar 20, 2024

llvmbot commented Mar 20, 2024

llvmbot commented Mar 20, 2024

dtcxzyw commented Mar 20, 2024

RKSimon commented Mar 20, 2024

sun-jacobi commented Mar 20, 2024

RKSimon commented Mar 20, 2024

davemgreen commented Mar 21, 2024

ParkHanbum commented Jul 3, 2024

topperc commented Jul 4, 2024 •

edited

Loading

ParkHanbum commented Jul 4, 2024 •

edited

Loading

topperc commented Jul 4, 2024

sun-jacobi commented Jul 4, 2024 •

edited

Loading

RKSimon commented Jul 4, 2024

ParkHanbum commented Jul 8, 2024

inclyc commented Jul 8, 2024

ParkHanbum commented Jul 8, 2024

RKSimon commented Jul 8, 2024

ParkHanbum commented Jul 8, 2024

ParkHanbum commented Jul 9, 2024 •

edited

Loading

RKSimon commented Jul 9, 2024

ParkHanbum commented Jul 9, 2024

ParkHanbum commented Jul 17, 2024

RKSimon commented Jul 17, 2024

[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903

[SelectionDAG] Add a new ISD Node for vector saturating truncation #85903

Comments

sun-jacobi commented Mar 20, 2024 • edited Loading

llvmbot commented Mar 20, 2024

llvmbot commented Mar 20, 2024

llvmbot commented Mar 20, 2024

dtcxzyw commented Mar 20, 2024

RKSimon commented Mar 20, 2024

sun-jacobi commented Mar 20, 2024

RKSimon commented Mar 20, 2024

davemgreen commented Mar 21, 2024

ParkHanbum commented Jul 3, 2024

topperc commented Jul 4, 2024 • edited Loading

ParkHanbum commented Jul 4, 2024 • edited Loading

topperc commented Jul 4, 2024

sun-jacobi commented Jul 4, 2024 • edited Loading

RKSimon commented Jul 4, 2024

ParkHanbum commented Jul 8, 2024

inclyc commented Jul 8, 2024

ParkHanbum commented Jul 8, 2024

RKSimon commented Jul 8, 2024

ParkHanbum commented Jul 8, 2024

ParkHanbum commented Jul 9, 2024 • edited Loading

RKSimon commented Jul 9, 2024

ParkHanbum commented Jul 9, 2024

ParkHanbum commented Jul 17, 2024

RKSimon commented Jul 17, 2024

sun-jacobi commented Mar 20, 2024 •

edited

Loading

topperc commented Jul 4, 2024 •

edited

Loading

ParkHanbum commented Jul 4, 2024 •

edited

Loading

sun-jacobi commented Jul 4, 2024 •

edited

Loading

ParkHanbum commented Jul 9, 2024 •

edited

Loading