Add tests for the generated assembly of mask related simd instructions. #121953

jhorstmann · 2024-03-03T21:19:19Z

The tests show that the code generation currently uses the least significant bits of `<iX x N>` vector masks when converting to `<i1 x N>`. This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit.

The exception is simd_bitmask, which already uses the most-significant bit.

This additional instruction would be removed by the changes in #104693, which makes all mask operations consistently use the most significant bits.

By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.
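A minimal scalar sketch of the two conventions, modelling a single `i8` mask lane. The helper names are hypothetical, not rustc or LLVM APIs. A plain truncation to `i1` keeps the least significant bit, while x86 mask instructions such as `pblendvb`, `movmskps` or `vmaskmov*` test the most significant bit. Bridging the two therefore takes a left shift. For canonical masks, where every lane is all zeros or all ones, both views agree; they only differ for non-canonical lanes.

```rust
/// Least-significant-bit view of a mask lane: what a plain `trunc` to i1 keeps.
fn lane_lsb(lane: i8) -> bool {
    lane & 1 != 0
}

/// Most-significant-bit view: what MSB-based x86 mask instructions test.
fn lane_msb(lane: i8) -> bool {
    lane < 0
}

/// Getting the LSB view out of an MSB-based instruction needs an extra left
/// shift that moves bit 0 into the sign position (the additional `psll*`).
fn lane_lsb_via_shift(lane: i8) -> bool {
    // Shifting an i8 by 7 never panics; bits shifted out of the top are dropped.
    lane_msb(lane << 7)
}
```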

rustbot · 2024-03-03T21:19:27Z

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

tests/assembly/simd-intrinsic-gather.rs

tgross35 · 2024-03-05T09:38:53Z

I don't think Mark reviews compiler stuff

@rustbot label +T-compiler +A-simd
r? compiler

oli-obk · 2024-03-05T10:00:02Z

r? @workingjubilee halp, lots of SIMD stuff

workingjubilee · 2024-03-06T07:04:57Z

Oh cool. Will try to review this tomorrow.

tests/assembly/simd-intrinsic-select.rs

tests/assembly/simd-intrinsic-gather.rs

tests/assembly/simd-bitmask.rs

tests/assembly/simd-intrinsic-mask-load.rs

tests/assembly/simd-intrinsic-mask-reduce.rs

tests/assembly/simd-intrinsic-mask-load.rs

tests/assembly/simd-intrinsic-scatter.rs

workingjubilee · 2024-03-07T05:15:12Z

Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction?

jhorstmann · 2024-03-09T10:26:19Z

> Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction?

`psllw`/`pslld`/`psllq` on ymm registers is only available with AVX2; with only AVX, those would take twice as many operations on xmm registers. That might actually get fixed if the shift were no longer needed.

I'm thinking of adding tests for SSE2 instead, since that is unfortunately still the baseline.

Thank you for the review! I'll push an update shortly.
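The AVX vs. AVX2 point can be illustrated with `std::arch`. Shifting every 32-bit lane left by 31 (which moves a mask's least significant bit into its sign bit) can be done on a whole ymm register with AVX2; without it, the 256 bits are handled as two 128-bit halves. This is a sketch for x86_64 only. The module and function names are hypothetical, and the compiler's actual instruction selection may differ.

```rust
#[cfg(target_arch = "x86_64")]
mod demo {
    use std::arch::x86_64::*;

    /// AVX2 path: a single 256-bit shift on a ymm register.
    #[target_feature(enable = "avx2")]
    pub unsafe fn shl31_ymm(v: [i32; 8]) -> [i32; 8] {
        let mut out = [0i32; 8];
        unsafe {
            let x = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
            let y = _mm256_slli_epi32::<31>(x);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, y);
        }
        out
    }

    /// Without AVX2: the same work as two 128-bit shifts.
    /// SSE2 is the x86_64 baseline, so no runtime detection is needed here.
    pub fn shl31_two_xmm(v: [i32; 8]) -> [i32; 8] {
        let mut out = [0i32; 8];
        unsafe {
            let lo = _mm_loadu_si128(v.as_ptr() as *const __m128i);
            let hi = _mm_loadu_si128(v.as_ptr().add(4) as *const __m128i);
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_slli_epi32::<31>(lo));
            _mm_storeu_si128(out.as_mut_ptr().add(4) as *mut __m128i, _mm_slli_epi32::<31>(hi));
        }
        out
    }
}
```

Lanes whose least significant bit is set come out as `i32::MIN` (only the sign bit set). This is the kind of extra shift that the x86 checks in these tests expect today.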

workingjubilee

Thank you! Things are much clearer now. Unfortunately that means I noticed things I didn't notice the first time! Apologies. Yes, CHECK-DAG is quite excellent for this sort of thing!

tests/assembly/simd-intrinsic-mask-reduce.rs

tests/assembly/simd-intrinsic-select.rs

tests/assembly/simd-intrinsic-mask-load.rs

tests/assembly/simd-bitmask.rs

workingjubilee · 2024-03-09T18:25:04Z

@jhorstmann I'm pretty happy with the state of the avx2/avx512 tests assembled here, so if you would rather, we could cut the aarch64 material, land the rest, and then reintroduce it in a later PR that also adds the sse2 tests you are thinking of.

workingjubilee · 2024-03-09T18:31:12Z

Also feel free to rebase this. On re-review I basically reread the whole PR rather than assuming the history is informative, which is why I often notice things on the second pass, though it does mean reviewing a huge PR takes forever.

jhorstmann · 2024-03-10T13:23:55Z

tests/assembly/simd-bitmask.rs

+    // aarch64-NEXT: ext
+    // aarch64-NEXT: zip1
+    // aarch64-NEXT: addv
+    // aarch64-NEXT: fmov


This looks ok. Full listing including the constants:

    .LCPI0_0:
        .byte 1
        .byte 2
        .byte 4
        .byte 8
        .byte 16
        .byte 32
        .byte 64
        .byte 128
        .byte 1
        .byte 2
        .byte 4
        .byte 8
        .byte 16
        .byte 32
        .byte 64
        .byte 128
        .section .text.bitmask_m8x16,"ax",@progbits
        .globl bitmask_m8x16
        .p2align 2
        .type bitmask_m8x16,@function
    bitmask_m8x16:
        .cfi_startproc
        adrp x8, .LCPI0_0
        cmlt v0.16b, v0.16b, #0
        ldr q1, [x8, :lo12:.LCPI0_0]
        and v0.16b, v0.16b, v1.16b
        ext v1.16b, v0.16b, v0.16b, #8
        zip1 v0.16b, v0.16b, v1.16b
        addv h0, v0.8h
        fmov w0, s0
        ret
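The aarch64 listing for `bitmask_m8x16` can be checked with a scalar model that follows it one instruction at a time. `cmlt` selects lanes whose most significant bit is set, so no extra shift is needed for `simd_bitmask`. The `and` with the `.LCPI0_0` weights leaves one distinct bit per byte. `ext`/`zip1`/`addv` then pair byte `i` with byte `i + 8`, so summing the eight 16-bit lanes yields the 16-bit mask. The function names are hypothetical; this is a sketch, not the test itself.

```rust
/// Scalar model of the aarch64 `bitmask_m8x16` listing, one step per instruction.
fn bitmask_m8x16_model(v: [i8; 16]) -> u16 {
    // .LCPI0_0: per-byte bit weights 1, 2, 4, ..., 128, repeated twice.
    let weights: [u8; 16] = std::array::from_fn(|i| 1u8 << (i % 8));
    // cmlt v0.16b, v0.16b, #0: all ones where the lane is negative (MSB set).
    let cmp: [u8; 16] = std::array::from_fn(|i| if v[i] < 0 { 0xFF } else { 0 });
    // and v0.16b, v0.16b, v1.16b: keep one distinct bit per byte.
    let a: [u8; 16] = std::array::from_fn(|i| cmp[i] & weights[i]);
    // ext v1.16b, v0.16b, v0.16b, #8: rotate the vector by 8 bytes.
    let rot: [u8; 16] = std::array::from_fn(|i| a[(i + 8) % 16]);
    // zip1 v0.16b, v0.16b, v1.16b: interleave the low halves of `a` and `rot`.
    let zipped: [u8; 16] =
        std::array::from_fn(|i| if i % 2 == 0 { a[i / 2] } else { rot[i / 2] });
    // addv h0, v0.8h: sum the eight little-endian 16-bit lanes.
    // fmov w0, s0: that 16-bit sum is the return value.
    (0..8).fold(0u16, |acc, j| {
        acc.wrapping_add(u16::from_le_bytes([zipped[2 * j], zipped[2 * j + 1]]))
    })
}

/// Reference: bit i is set iff lane i has its most significant bit set.
fn bitmask_reference(v: [i8; 16]) -> u16 {
    let mut bits = 0u16;
    for (i, &lane) in v.iter().enumerate() {
        if lane < 0 {
            bits |= 1u16 << i;
        }
    }
    bits
}
```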

workingjubilee

Cool.

workingjubilee · 2024-03-11T16:14:23Z

r=me with the fixups squashed out.
@bors delegate+

bors · 2024-03-11T16:14:27Z

✌️ @jhorstmann, you can now approve this pull request!

If @workingjubilee told you to "r=me" after making some further change, please make that change, then do @bors r=@workingjubilee

The tests show that the code generation currently uses the least significant bits of `<iX x N>` vector masks when converting to `<i1 x N>`. This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit. On aarch64 the left shift is followed by a comparison against zero, which repeats the sign bit across the whole lane.

The exception, which does not introduce an unneeded shift, is simd_bitmask, because the code generation already shifts before truncating.

By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.

This additional instruction would be removed by the fix in rust-lang#104693, which uses the most significant bit for all mask operations.
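A per-lane model of the aarch64 behaviour and of the `simd_bitmask` exception, for a single `i8` lane (hypothetical helper names, a sketch rather than rustc's actual lowering). `shl #7` moves bit 0 into the sign position, and `cmlt #0` then turns a negative lane into all ones and anything else into zero. `simd_bitmask` instead shifts the most significant bit down before truncating, so it needs no extra left shift.

```rust
/// aarch64 lowering of an LSB-based mask lane: `shl #7`, then `cmlt #0`,
/// which repeats the (new) sign bit across the whole lane.
fn aarch64_shl_cmlt(lane: i8) -> i8 {
    let shifted = lane << 7; // bits shifted out of the top are discarded
    if shifted < 0 { -1 } else { 0 }
}

/// simd_bitmask-style read of a lane: logical shift right by 7, then truncate to a bool.
fn bitmask_lane(lane: i8) -> bool {
    ((lane as u8) >> 7) == 1
}
```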

jhorstmann · 2024-03-12T07:55:37Z

@bors r=@workingjubilee

bors · 2024-03-12T07:55:40Z

📌 Commit e91f937 has been approved by workingjubilee

It is now in the queue for this repository.

workingjubilee · 2024-03-12T15:59:37Z

@bors rollup=always

…kingjubilee

Rollup of 12 pull requests

Successful merges:

- rust-lang#121754 ([bootstrap] Move the `split-debuginfo` setting to the per-target section)
- rust-lang#121953 (Add tests for the generated assembly of mask related simd instructions.)
- rust-lang#122081 (validate `builder::PATH_REMAP`)
- rust-lang#122245 (Detect truncated DepGraph files)
- rust-lang#122354 (bootstrap: Don't eagerly format verbose messages)
- rust-lang#122355 (rustdoc: fix up old test)
- rust-lang#122363 (tests: Add ui/attributes/unix_sigpipe/unix_sigpipe-str-list.rs)
- rust-lang#122366 (Fix stack overflow with recursive associated types)
- rust-lang#122377 (Fix discriminant_kind copy paste from the pointee trait case)
- rust-lang#122378 (Properly rebuild rustbooks)
- rust-lang#122380 (Fix typo in lib.rs of proc_macro)
- rust-lang#122381 (llvm-wrapper: adapt for LLVM API changes)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of rust-lang#121953 - jhorstmann:assembly-tests-for-masked-simd-instructions, r=workingjubilee

Add tests for the generated assembly of mask related simd instructions.

The tests show that the code generation currently uses the least significant bits of `<iX x N>` vector masks when converting to `<i1 x N>`. This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit. The exception is simd_bitmask, which already uses the most-significant bit.

This additional instruction would be removed by the changes in rust-lang#104693, which makes all mask operations consistently use the most significant bits.

By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.

RalfJung · 2024-07-22T14:51:23Z

tests/assembly/simd-intrinsic-select.rs

+// CHECK-LABEL: select_f64x8
+#[no_mangle]
+pub unsafe extern "C" fn select_f64x8(mask: m64x8, a: f64x8, b: f64x8) -> f64x8 {
+    // The parameter is a 256 bit vector which in the C abi is only valid for avx512 targets.


Is this a typo? The comment says "256 bit vector" but the arguments are actually 512 bits large.

jhorstmann · 2024-07-22

Oops, good catch, that looks to be a copy-paste mistake.
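The arithmetic behind the catch: eight 64-bit lanes are 8 × 64 = 512 bits, so the comment in `select_f64x8` should say 512; a 256-bit vector of `f64` would have four lanes. A trivial check, using arrays as hypothetical stand-ins for the test's SIMD types:

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the test's `f64x8` and `m64x8`: eight 64-bit lanes each.
type F64x8 = [f64; 8];
type M64x8 = [i64; 8];

/// Width of a type in bits.
const fn bits_of<T>() -> usize {
    size_of::<T>() * 8
}
```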

rustbot assigned Mark-Simulacrum Mar 3, 2024

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 3, 2024

jhorstmann mentioned this pull request Mar 3, 2024

Consistently use the highest bit of vector masks when converting to i1 vectors #104693

Open

jhorstmann commented Mar 3, 2024

tests/assembly/simd-intrinsic-gather.rs (resolved)

rustbot added A-SIMD Area: SIMD (Single Instruction Multiple Data) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 5, 2024

rustbot assigned oli-obk and unassigned Mark-Simulacrum Mar 5, 2024

rustbot assigned workingjubilee and unassigned oli-obk Mar 5, 2024

workingjubilee reviewed Mar 7, 2024


workingjubilee reviewed Mar 9, 2024

tests/assembly/simd-intrinsic-mask-reduce.rs (outdated, resolved)

tests/assembly/simd-intrinsic-select.rs (outdated, resolved)

tests/assembly/simd-intrinsic-mask-load.rs (resolved)

tests/assembly/simd-bitmask.rs (outdated, resolved)

jhorstmann commented Mar 10, 2024


workingjubilee approved these changes Mar 10, 2024


jhorstmann force-pushed the assembly-tests-for-masked-simd-instructions branch from fe08d2d to ce470fa Compare March 12, 2024 07:52

jhorstmann force-pushed the assembly-tests-for-masked-simd-instructions branch from ce470fa to e91f937 Compare March 12, 2024 07:53

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 12, 2024

workingjubilee mentioned this pull request Mar 12, 2024

Rollup of 12 pull requests #122389

Merged

bors merged commit 947d960 into rust-lang:master Mar 12, 2024
11 checks passed

rustbot added this to the 1.78.0 milestone Mar 12, 2024

RalfJung reviewed Jul 22, 2024

