Add `Ord::cmp` for primitives as a `BinOp` in MIR #118310

scottmcm · 2023-11-26T11:30:33Z

Update: most of this OP was written months ago. See #118310 (comment) below for where we got to recently that made it ready for review.

There are dozens of reasonable ways to implement Ord::cmp for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding BinOp::Cmp at the MIR level:

Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
By not inlining those details for every use of cmp, we drastically reduce the amount of MIR generated for derived PartialOrd, while also making it more amenable to MIR-level optimizations.

Having extremely careful if ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction tricks (#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe bitor is better than add? But maybe only on a future version that has or disjoint support? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- which it should -- we'll need at least a rustc intrinsic to be able to call it.

As for simplifying it in Rust, we now regularly inline {integer}::partial_cmp, but it's quite a large amount of IR. The best way to see that is with 8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-Some-then-match-it-in-same-BB noise is cleaned up, this'll expose the Cmp == 0 branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a BinOp::Eq and thus fix some of our generated code perf issues. (Tracking that through today's if a < b { Less } else if a == b { Equal } else { Greater } would be much harder.)

r? @ghost
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~

scottmcm · 2023-11-26T13:25:17Z

@bors try @rust-timer queue

bors · 2023-11-26T13:26:28Z

⌛ Trying commit 9cfc79a with merge d5cf045...

Add `Ord::cmp` for primitives as a `BinOp` in MIR There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level: 1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest. 2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations. Having extremely careful `if` ordering to μoptimize resource usage on broadwell (rust-lang#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (rust-lang#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it. As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with rust-lang@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (rust-lang#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.) --- r? `@ghost` But first I should check that perf is ok with this ~~...and my true nemesis, tidy.~~

bors · 2023-11-26T14:53:28Z

☀️ Try build successful - checks-actions
Build commit: d5cf045 (d5cf04528062d622d35ccb36d43a437871ee8b27)

rust-timer · 2023-11-26T17:24:26Z

Finished benchmarking commit (d5cf045): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.4%, 2.2%]	5
Regressions ❌ (secondary)	0.2%	[0.2%, 0.2%]	1
Improvements ✅ (primary)	-0.7%	[-0.7%, -0.7%]	1
Improvements ✅ (secondary)	-3.4%	[-3.4%, -3.4%]	1
All ❌✅ (primary)	0.7%	[-0.7%, 2.2%]	6

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	5.7%	[2.1%, 11.8%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.9%	[-3.9%, -3.9%]	1
Improvements ✅ (secondary)	-5.0%	[-5.0%, -5.0%]	1
All ❌✅ (primary)	3.3%	[-3.9%, 11.8%]	4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.2%	[2.2%, 2.2%]	1
Regressions ❌ (secondary)	2.4%	[2.4%, 2.4%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.9%	[-2.9%, -2.9%]	1
All ❌✅ (primary)	2.2%	[2.2%, 2.2%]	1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.1%, 0.3%]	10
Regressions ❌ (secondary)	0.7%	[0.2%, 1.3%]	2
Improvements ✅ (primary)	-0.2%	[-0.4%, -0.0%]	19
Improvements ✅ (secondary)	-6.6%	[-6.6%, -6.6%]	1
All ❌✅ (primary)	-0.1%	[-0.4%, 0.3%]	29

Bootstrap: 675.058s -> 677.549s (0.37%)
Artifact size: 313.35 MiB -> 313.33 MiB (-0.01%)

scottmcm · 2023-11-27T08:46:43Z

tests/assembly/x86_64-cmp.rs

+    // DEBUG: cmp
+    // DEBUG: setg
+    // DEBUG: and
+    // DEBUG: cmp
+    // DEBUG: setl
+    // DEBUG: and
+    // DEBUG: sub


With any optimization this would be down to just cmp+setg+setl+sub, but at opt-level=0 it doesn't know that.

scottmcm · 2023-11-27T08:52:17Z

compiler/rustc_codegen_ssa/src/mir/rvalue.rs

+                use std::cmp::Ordering;
+                debug_assert!(!is_float);
+                let pred = |op| base::bin_op_to_icmp_predicate(op, is_signed);
+                if bx.cx().tcx().sess.opts.optimize == OptLevel::No {


The different approach makes quite a remarkable difference in unoptimized mode. I haven't investigated exactly what's happening in tablegen, but the gt+lt+sub approach in unoptimized gives

unsigned_cmp: cmp cx, dx seta al and al, 1 cmp cx, dx setb cl and cl, 1 sub al, cl ret

whereas the other approach gives

unsigned_cmp: push rax mov word ptr [rsp + 2], dx mov word ptr [rsp + 4], cx mov al, 1 xor r8d, r8d mov byte ptr [rsp + 6], r8b cmp cx, dx mov byte ptr [rsp + 7], al jne .LBB1_2 mov al, byte ptr [rsp + 6] mov byte ptr [rsp + 7], al .LBB1_2: mov cx, word ptr [rsp + 4] mov dx, word ptr [rsp + 2] mov al, byte ptr [rsp + 7] mov byte ptr [rsp], al mov al, 255 cmp cx, dx mov byte ptr [rsp + 1], al jb .LBB1_4 mov al, byte ptr [rsp] mov byte ptr [rsp + 1], al .LBB1_4: mov al, byte ptr [rsp + 1] pop rcx ret

scottmcm · 2023-11-27T09:03:59Z

@bors try @rust-timer queue

bors · 2023-11-27T09:05:10Z

⌛ Trying commit 82551ca with merge 3d40b1d...

Add `Ord::cmp` for primitives as a `BinOp` in MIR There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level: 1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest. 2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations. Having extremely careful `if` ordering to μoptimize resource usage on broadwell (rust-lang#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (rust-lang#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it. As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with rust-lang@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (rust-lang#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.) --- r? `@ghost` But first I should check that perf is ok with this ~~...and my true nemesis, tidy.~~

davidtwco

LGTM, but I'm not familiar with Stable MIR and whether or not this should be added to that.

cc @rust-lang/project-stable-mir

r=me once someone from that group signs-off on that part

MinerSebas · 2024-04-02T12:25:38Z

compiler/rustc_codegen_cranelift/src/codegen_i128.rs

+        BinOp::Cmp => unreachable!(),
        BinOp::Lt | BinOp::Le | BinOp::Eq | BinOp::Ge | BinOp::Gt | BinOp::Ne => unreachable!(),


oli-obk

sgtm from the smir side. We'll figure out how to get the type of a lang item

scottmcm · 2024-04-02T16:01:27Z

@bors r=davidtwco

bors · 2024-04-02T16:01:29Z

📌 Commit c59e93c has been approved by davidtwco

It is now in the queue for this repository.

bors · 2024-04-02T19:21:47Z

⌛ Testing commit c59e93c with merge a77322c...

bors · 2024-04-02T21:22:45Z

☀️ Test successful - checks-actions
Approved by: davidtwco
Pushing a77322c to master...

rust-timer · 2024-04-02T22:38:47Z

Finished benchmarking commit (a77322c): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.2%, 0.7%]	3
Regressions ❌ (secondary)	0.3%	[0.2%, 0.4%]	2
Improvements ✅ (primary)	-0.6%	[-0.6%, -0.6%]	1
Improvements ✅ (secondary)	-3.1%	[-3.1%, -3.1%]	1
All ❌✅ (primary)	0.2%	[-0.6%, 0.7%]	4

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.7%	[2.7%, 4.5%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.5%	[-5.9%, -2.5%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.2%	[-5.9%, 4.5%]	7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.9%	[-2.9%, -2.9%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.5%]	6
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	1
Improvements ✅ (primary)	-0.2%	[-0.8%, -0.0%]	16
Improvements ✅ (secondary)	-5.6%	[-5.6%, -5.6%]	1
All ❌✅ (primary)	-0.1%	[-0.8%, 0.5%]	22

Bootstrap: 667.286s -> 667.104s (-0.03%)
Artifact size: 315.68 MiB -> 315.58 MiB (-0.03%)

Add `Ord::cmp` for primitives as a `BinOp` in MIR Update: most of this OP was written months ago. See rust-lang/rust#118310 (comment) below for where we got to recently that made it ready for review. --- There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level: 1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest. 2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations. Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it. As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with rust-lang/rust@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.) --- r? `@ghost` But first I should check that perf is ok with this ~~...and my true nemesis, tidy.~~

scottmcm · 2024-04-03T05:50:16Z

image-0.24.1 opt-full is +0.69% on instructions, but also -6.23% on wall time, so I don't know what to think about these perf results. I guess it's the classic unpredictable results from more inlining.

Add `Ord::cmp` for primitives as a `BinOp` in MIR Update: most of this OP was written months ago. See rust-lang/rust#118310 (comment) below for where we got to recently that made it ready for review. --- There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level: 1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest. 2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations. Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it. As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with rust-lang/rust@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.) --- r? `@ghost` But first I should check that perf is ok with this ~~...and my true nemesis, tidy.~~

pnkfelix · 2024-04-09T13:28:37Z

@scottmcm from looking at the graphs, I'm inclined to categorize the majority of the outliers w.r.t. wall-time (especially the negative ones) as just noise here. (Of course wall-time itself is already ridiculously noisy.)

image-0.24.1 itself does seem like something interesting though, I freely admit that.

pnkfelix · 2024-04-09T20:38:12Z

The impact here is somewhat limited, and the graph indicates that the regression for image-0.24.1 was subsequently recovered.

@rustbot label: +perf-regression-triaged

Related changes: - rust-lang/rust#118310: Add `Ord::cmp` for primitives as a `BinOp` in MIR - rust-lang/rust#120131: Add support to `Pat` pattern type - rust-lang/rust#122935: Rename CastKind::PointerWithExposedProvenance - rust-lang/rust#123097: Adapt to changes to local_def_path_hash_to_def_id Resolves #3130, #3142

Related changes: - rust-lang/rust#118310: Add `Ord::cmp` for primitives as a `BinOp` in MIR - rust-lang/rust#120131: Add support to `Pat` pattern type - rust-lang/rust#122935: Rename CastKind::PointerWithExposedProvenance - rust-lang/rust#123097: Adapt to changes to local_def_path_hash_to_def_id Resolves model-checking#3130, model-checking#3142

Add `Ord::cmp` for primitives as a `BinOp` in MIR Update: most of this OP was written months ago. See rust-lang/rust#118310 (comment) below for where we got to recently that made it ready for review. --- There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level: 1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest. 2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations. Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it. As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with rust-lang/rust@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.) --- r? `@ghost` But first I should check that perf is ok with this ~~...and my true nemesis, tidy.~~

Related changes: - rust-lang/rust#118310: Add `Ord::cmp` for primitives as a `BinOp` in MIR - rust-lang/rust#120131: Add support to `Pat` pattern type - rust-lang/rust#122935: Rename CastKind::PointerWithExposedProvenance - rust-lang/rust#123097: Adapt to changes to local_def_path_hash_to_def_id Resolves model-checking#3130, model-checking#3142

Add `Ord::cmp` for primitives as a `BinOp` in MIR Update: most of this OP was written months ago. See rust-lang/rust#118310 (comment) below for where we got to recently that made it ready for review. --- There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level: 1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest. 2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations. Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it. As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with rust-lang/rust@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.) --- r? `@ghost` But first I should check that perf is ok with this ~~...and my true nemesis, tidy.~~

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 26, 2023