Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for the generated assembly of mask related simd instructions. #121953

Merged

Conversation

jhorstmann
Copy link
Contributor

The tests show that the code generation currently uses the least significant bits of vector masks when converting to . This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit.

The exception is simd_bitmask, which already uses the most-significant bit.

This additional instruction would be removed by the changes in #104693, which makes all mask operations consistently use the most significant bits.

By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.

@rustbot
Copy link
Collaborator

rustbot commented Mar 3, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 3, 2024
@rust-log-analyzer

This comment has been minimized.

@tgross35
Copy link
Contributor

tgross35 commented Mar 5, 2024

I don't think Mark reviews compiler stuff

@rustbot label +T-compiler +A-simd
r? compiler

@rustbot rustbot added A-SIMD Area: SIMD (Single Instruction Multiple Data) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 5, 2024
@rustbot rustbot assigned oli-obk and unassigned Mark-Simulacrum Mar 5, 2024
@oli-obk
Copy link
Contributor

oli-obk commented Mar 5, 2024

r? @workingjubilee halp, lots of SIMD stuff

@rustbot rustbot assigned workingjubilee and unassigned oli-obk Mar 5, 2024
@workingjubilee
Copy link
Member

Oh cool. Will try to review this tomorrow.

tests/assembly/simd-intrinsic-select.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-select.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-gather.rs Show resolved Hide resolved
tests/assembly/simd-bitmask.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-mask-load.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-mask-load.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-mask-reduce.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-mask-load.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-scatter.rs Show resolved Hide resolved
@workingjubilee
Copy link
Member

Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction?

@jhorstmann
Copy link
Contributor Author

Do all the pre-EVEX examples need AVX2? I thought vmaskmov was an AVX instruction?

psslw/d/q on ymm regs is only avx2, with avx only those would use twice the operations on xmm registers. That might actually get fixed if the shift was no longer needed.

I'm thinking of rather adding tests for SSE2, since that is unfortunately still the baseline.

Thank you for the review! I'll push an update shortly.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Things are much clearer now. Unfortunately that means I noticed things I didn't notice the first time! Apologies. Yes, CHECK-DAG is quite excellent for this sort of thing!

tests/assembly/simd-intrinsic-mask-reduce.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-select.rs Outdated Show resolved Hide resolved
tests/assembly/simd-intrinsic-mask-load.rs Show resolved Hide resolved
tests/assembly/simd-bitmask.rs Outdated Show resolved Hide resolved
@workingjubilee
Copy link
Member

workingjubilee commented Mar 9, 2024

@jhorstmann I'm pretty happy with the state of the avx2/avx512 tests assembled here, so if you would rather, we could cut the aarch64 material, land the rest, and then reintroduce it in a later PR that also adds the sse2 tests you are thinking of.

@workingjubilee
Copy link
Member

workingjubilee commented Mar 9, 2024

Also feel free to rebase this. I personally basically completely reread PRs on rereview which is why I often notice things on the second pass, rather than assuming the history is informative, though it does mean reviewing a huge PR takes forever.

// aarch64-NEXT: ext
// aarch64-NEXT: zip1
// aarch64-NEXT: addv
// aarch64-NEXT: fmov
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok. Full listing including the constants:

           5: .LCPI0_0: 
           6:  .byte 1 
           7:  .byte 2 
           8:  .byte 4 
           9:  .byte 8 
          10:  .byte 16 
          11:  .byte 32 
          12:  .byte 64 
          13:  .byte 128 
          14:  .byte 1 
          15:  .byte 2 
          16:  .byte 4 
          17:  .byte 8 
          18:  .byte 16 
          19:  .byte 32 
          20:  .byte 64 
          21:  .byte 128 
          22:  .section .text.bitmask_m8x16,"ax",@progbits 
          23:  .globl bitmask_m8x16 
          24:  .p2align 2 
          25:  .type bitmask_m8x16,@function 
          26: bitmask_m8x16: 
          27:  .cfi_startproc 
          28:  adrp x8, .LCPI0_0 
          29:  cmlt v0.16b, v0.16b, #0 
          30:  ldr q1, [x8, :lo12:.LCPI0_0] 
          31:  and v0.16b, v0.16b, v1.16b 
          32:  ext v1.16b, v0.16b, v0.16b, #8 
          33:  zip1 v0.16b, v0.16b, v1.16b 
          34:  addv h0, v0.8h 
          35:  fmov w0, s0 
          36:  ret 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool.

@workingjubilee
Copy link
Member

r=me with the fixups squashed out.
@bors delegate+

@bors
Copy link
Contributor

bors commented Mar 11, 2024

✌️ @jhorstmann, you can now approve this pull request!

If @workingjubilee told you to "r=me" after making some further change, please make that change, then do @bors r=@workingjubilee

@jhorstmann jhorstmann force-pushed the assembly-tests-for-masked-simd-instructions branch from fe08d2d to ce470fa Compare March 12, 2024 07:52
The tests show that the code generation currently uses the least
significant bits of <iX x N> vector masks when converting to <i1 xN>.
This leads to an additional left shift operation in the assembly for
x86, since mask operations on x86 operate based on the most significant
bit. On aarch64 the left shift is followed by a comparison against zero,
which repeats the sign bit across the whole lane.

The exception, which does not introduce an unneeded shift, is
simd_bitmask, because the code generation already shifts before
truncating.

By using the "C" calling convention the tests should be stable regarding
changes in register allocation, but it is possible that future llvm
updates will require updating some of the checks.

This additional instruction would be removed by the fix in rust-lang#104693,
which uses the most significant bit for all mask operations.
@jhorstmann jhorstmann force-pushed the assembly-tests-for-masked-simd-instructions branch from ce470fa to e91f937 Compare March 12, 2024 07:53
@jhorstmann
Copy link
Contributor Author

@bors r=@workingjubilee

@bors
Copy link
Contributor

bors commented Mar 12, 2024

📌 Commit e91f937 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 12, 2024
@workingjubilee
Copy link
Member

@bors rollup=always

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 12, 2024
…kingjubilee

Rollup of 12 pull requests

Successful merges:

 - rust-lang#121754 ([bootstrap] Move the `split-debuginfo` setting to the per-target section)
 - rust-lang#121953 (Add tests for the generated assembly of mask related simd instructions.)
 - rust-lang#122081 (validate `builder::PATH_REMAP`)
 - rust-lang#122245 (Detect truncated DepGraph files)
 - rust-lang#122354 (bootstrap: Don't eagerly format verbose messages)
 - rust-lang#122355 (rustdoc: fix up old test)
 - rust-lang#122363 (tests: Add ui/attributes/unix_sigpipe/unix_sigpipe-str-list.rs)
 - rust-lang#122366 (Fix stack overflow with recursive associated types)
 - rust-lang#122377 (Fix discriminant_kind copy paste from the pointee trait case)
 - rust-lang#122378 (Properly rebuild rustbooks)
 - rust-lang#122380 (Fix typo in lib.rs of proc_macro)
 - rust-lang#122381 (llvm-wrapper: adapt for LLVM API changes)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 947d960 into rust-lang:master Mar 12, 2024
11 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 12, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 12, 2024
Rollup merge of rust-lang#121953 - jhorstmann:assembly-tests-for-masked-simd-instructions, r=workingjubilee

Add tests for the generated assembly of mask related simd instructions.

The tests show that the code generation currently uses the least significant bits of <iX x N> vector masks when converting to <i1 xN>. This leads to an additional left shift operation in the assembly for x86, since mask operations on x86 operate based on the most significant bit.

The exception is simd_bitmask, which already uses the most-significant bit.

This additional instruction would be removed by the changes in rust-lang#104693, which makes all mask operations consistently use the most significant bits.

By using the "C" calling convention the tests should be stable regarding changes in register allocation, but it is possible that future llvm updates will require updating some of the checks.
// CHECK-LABEL: select_f64x8
#[no_mangle]
pub unsafe extern "C" fn select_f64x8(mask: m64x8, a: f64x8, b: f64x8) -> f64x8 {
// The parameter is a 256 bit vector which in the C abi is only valid for avx512 targets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo? The comment says "256 bit vector" but the arguments are actually 512 bits large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good catch, that looks to be copy-paste mistake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants