
Severe perf regression in optimized debug builds due to extra UB checks #121245

Closed
RalfJung opened this issue Feb 17, 2024 · 17 comments
Assignees
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Member

RalfJung commented Feb 17, 2024

In Miri built with optimized and debug assertions, we have recently seen a 50% increase in CI times. Bisecting points at #120594, and @saethlin confirmed that disabling one of these UB checks gives a speedup of the same ballpark as that slowdown.

Several things could be done here:

  • We could just remove this particular check.
  • We could have some way to disable the checks in a particular scope.
  • We could decouple "UB-checking assertions" from the general debug_assertions cfg-flag, similar to how overflow checks already have their own flag.
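The overflow-checks precedent the third option refers to can be seen on stable today. A minimal sketch (the behavior shown is stable Rust; a corresponding dedicated UB-checks flag did not exist at the time of this comment):

```rust
// Overflow checks are already controlled independently of
// `debug_assertions`: building with `-C debug-assertions=off
// -C overflow-checks=on` panics on overflow even though the
// `debug_assertions` cfg is false. A dedicated UB-checks flag
// would give the same independence to the new assertions.
fn main() {
    println!("debug_assertions enabled: {}", cfg!(debug_assertions));

    // `checked_add` surfaces the overflow explicitly, regardless of flags:
    let x: u8 = 255;
    println!("255u8.checked_add(1) = {:?}", x.checked_add(1)); // None

    // `wrapping_add` never panics, independent of overflow-checks:
    println!("255u8.wrapping_add(1) = {}", x.wrapping_add(1)); // 0
}
```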

Update (2024-02-25): CI times are back to normal, now only one specific benchmark is affected.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 17, 2024
@RalfJung RalfJung added I-slow Issue: Problems and improvements with respect to performance of generated code. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Feb 17, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 17, 2024
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 17, 2024
@saethlin saethlin changed the title Severe perf regression in optimized debug builds due to extrac UB checks Severe perf regression in optimized debug builds due to extra UB checks Feb 17, 2024
@apiraino
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Feb 19, 2024
@clarfonthey
Contributor

Just want to inject a few thoughts here as I work on #117494, which (as mentioned in that PR) will probably exacerbate this issue.

First, I noticed that there seem to be a few cases where the duplicated checks in checked methods are not getting optimized out, as demonstrated in my analysis in this comment. This could be a factor in the performance issue, either by making the optimiser work harder to remove them (increasing compile times) or by the checks being run twice because the optimiser failed to remove them. To clarify, I mean cases like this not being optimised properly:

fn checked_thing(...) {
    if condition(...) {
        unsafe { unchecked_thing(...) }
    } else {
        // ...
    }
}

fn unchecked_thing(...) {
    debug_assert_nounwind!(condition(...), ...);
    // ...
}

Note that in some cases, the two conditions might be logically equivalent but not exactly the same code. (In the example of unchecked_shX, the two conditions were rhs < Self::BITS and rhs & (Self::BITS - 1) < Self::BITS.)
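To make the schematic pattern above concrete, here is a hypothetical self-contained instance (`checked_div`/`unchecked_div` are illustrative stand-ins, not the actual standard-library internals, and plain `debug_assert!` stands in for `debug_assert_nounwind!`):

```rust
// The safe wrapper checks the precondition, then the unchecked
// function debug-asserts the same condition again. If the optimizer
// cannot prove the two checks redundant, the condition runs twice.
fn checked_div(lhs: u32, rhs: u32) -> Option<u32> {
    if rhs != 0 {
        // SAFETY: we just checked that `rhs` is nonzero.
        Some(unsafe { unchecked_div(lhs, rhs) })
    } else {
        None
    }
}

/// # Safety
/// `rhs` must be nonzero.
unsafe fn unchecked_div(lhs: u32, rhs: u32) -> u32 {
    // The duplicated check the comment above is talking about.
    debug_assert!(rhs != 0);
    lhs / rhs
}

fn main() {
    println!("{:?}", checked_div(10, 3)); // Some(3)
    println!("{:?}", checked_div(10, 0)); // None
}
```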


Second, I want to voice support for giving these unsafe asserts their own dedicated flag that can be disabled, like debug_assertions but technically separate from it. If we additionally offered this to downstream crate authors, it could be a useful tool to enable for debugging when something goes wrong, and to disable when it puts too much stress on testing (like here). Additionally, making it a compiler flag means you can enable it sometimes and disable it other times, e.g. ensure it's on when unit testing methods that contain unsafe code but not when testing the safe methods that call them.

@RalfJung
Member Author

This could be a potential factor in the performance issue, either by making the optimiser work harder to remove them (increasing compile times) or by them being run twice because the optimiser failed to remove them.

This issue is about the second point: performance of rustc-generated code, not compile times.

Additionally, making it a compiler flag means you can enable it sometimes and disable it sometimes, e.g. ensure it's on when unit testing methods that contain unsafe code but not when testing the safe methods that call them.

How do you set compiler flags on a per-test basis...?

@clarfonthey
Contributor

Additionally, making it a compiler flag means you can enable it sometimes and disable it sometimes, e.g. ensure it's on when unit testing methods that contain unsafe code but not when testing the safe methods that call them.

How do you set compiler flags on a per-test basis...?

I mostly meant having them enabled when testing libraries but not in the larger compiler tests, since those generally have separate test suites. But also, the compiler tests themselves tend to set compiler flags when they run, so you could configure them there.

Basically, not within a single crate, but for multi-crate projects like the compiler and for compiler tests specifically.

@Noratrieb
Member

Noratrieb commented Feb 24, 2024

#121114 will fix or at least significantly help with this. The checks are currently wayyy slower than they need to be because they are outlined for compile time reasons.

@RalfJung
Member Author

@Nilstrieb you linked this issue to itself? I assume you mean #121114?

@Noratrieb
Member

Noratrieb commented Feb 24, 2024

yes, i don't think the issue will fix itself

@RalfJung
Member Author

RalfJung commented Feb 25, 2024

With #121114, CI times are indeed back to normal. 🎉

However, some individual benchmarks still show a severe regression. Specifically, CARGO_EXTRA_FLAGS="--profile=dev" ./miri bench slice-get-unchecked:

before (Miri 5f25162abf547fa3d306d2bdd7277f7a034fce5f):

Benchmark 1: cargo miri run --profile=dev --manifest-path "bench-cargo-miri/slice-get-unchecked/Cargo.toml"
  Time (mean ± σ):      3.810 s ±  0.030 s    [User: 3.758 s, System: 0.048 s]
  Range (min … max):    3.785 s …  3.858 s    5 runs

today:

Benchmark 1: cargo miri run --profile=dev --manifest-path "bench-cargo-miri/slice-get-unchecked/Cargo.toml"
  Time (mean ± σ):      5.766 s ±  0.006 s    [User: 5.723 s, System: 0.040 s]
  Range (min … max):    5.757 s …  5.773 s    5 runs

That's a 50% slowdown. Would be interesting to know if there's one particular check causing this or if it's everywhere. Interestingly, two other benchmarks I tried are completely unaffected, so it seems likely that this is one particular check.

@saethlin
Member

CARGO_EXTRA_FLAGS="--profile=dev" ./miri bench slice-get-unchecked:

slice-get-unchecked is a Stack cache stress-test. This benchmark scenario with debug assertions enabled now spends 88% of its runtime validating the Stacked Borrows cache. And using perf to jump into that function, there are no hot precondition checks. In addition, I've been able to induce swings between -1 and +3 seconds of runtime by deleting parts of that function.

The hot part of verify_cache_consistency (95% of samples in that function) is this loop:

for (idx, item) in self.borrows.iter().enumerate() {
    if item.perm() == Permission::Unique {
        assert!(
            self.unique_range.contains(&idx),
            "{:?} {:?}",
            self.unique_range,
            self.borrows
        );
    }
}
which even with debug assertions is compiled to what looks to me like the ideal sequence of instructions:

 16.00 │100:┌─→inc    %r10
 15.93 │    │  add    $0xfffffffffffffff8,%r9
 30.41 │    │↓ je     17d
 16.90 │109:├──testb  $0x60,0x7(%r8,%r10,8)
 15.96 │    └──jne    100

I cannot explain why adding checks around this loop is making the program faster or slower. I tried reducing the number of checks by doing a let borrows = self.borrows.as_slice() and then replacing all mentions of self.borrows with borrows to deduplicate the Vec::deref checks. That made the benchmark a tiny bit faster, except that I missed the use in the formatting of the assert!; when I replaced that use of self.borrows as well, the benchmark got slower. I also tried deleting the helpful formatting in the assert! of the hot loop, and that too made the benchmark slower.

My only guess is that this benchmark is exceedingly sensitive to code alignment. Or something like that. In any case, the fact that ~84% of the program's cycles are in a perfectly-optimized loop that doesn't contain any checks makes me doubt the causal link between the checks and the slowdown.
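The as_slice deduplication experiment mentioned above can be sketched as follows. This is a hypothetical, self-contained reconstruction with stand-in types, not Miri's actual definitions:

```rust
// Hoist `self.borrows.as_slice()` once so the loop body does not
// re-dereference the Vec (and re-check its metadata) on every access.
use std::ops::Range;

#[derive(Clone, Copy, PartialEq)]
enum Permission {
    Unique,
    SharedReadOnly,
}

struct Item {
    perm: Permission,
}

impl Item {
    fn perm(&self) -> Permission {
        self.perm
    }
}

struct Stack {
    borrows: Vec<Item>,
    unique_range: Range<usize>,
}

impl Stack {
    fn verify_cache_consistency(&self) {
        // One deref up front instead of one per mention of `self.borrows`.
        let borrows = self.borrows.as_slice();
        for (idx, item) in borrows.iter().enumerate() {
            if item.perm() == Permission::Unique {
                assert!(self.unique_range.contains(&idx), "bad cache");
            }
        }
    }
}

fn main() {
    let stack = Stack {
        borrows: vec![
            Item { perm: Permission::SharedReadOnly },
            Item { perm: Permission::Unique },
        ],
        unique_range: 1..2,
    };
    stack.verify_cache_consistency(); // passes: the Unique item is at idx 1
    println!("cache consistent");
}
```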

@RalfJung
Member Author

RalfJung commented Feb 26, 2024

~84% of the program's cycles are in a perfectly-optimized loop that doesn't contain any checks

How can that be true when the program gets 33% faster by removing the checks? Is that 84% of the cycles in the slow or the fast version of the program?

@RalfJung
Member Author

I tried deleting the helpful formatting in the assert! of the hot loop, and that also made the benchmark slower.

I can't reproduce this. I applied this diff on old Miri (before your checks were added)

diff --git a/src/borrow_tracker/stacked_borrows/stack.rs b/src/borrow_tracker/stacked_borrows/stack.rs
index 712c26a9..d107bd9a 100644
--- a/src/borrow_tracker/stacked_borrows/stack.rs
+++ b/src/borrow_tracker/stacked_borrows/stack.rs
@@ -161,9 +161,7 @@ impl<'tcx> Stack {
             if item.perm() == Permission::Unique {
                 assert!(
                     self.unique_range.contains(&idx),
-                    "{:?} {:?}",
-                    self.unique_range,
-                    self.borrows
+                    "bad cache",
                 );
             }
         }

Performance is the same with and without this.

@saethlin
Member

I ran

git checkout 5f25162abf547fa3d306d2bdd7277f7a034fce5f
./miri toolchain
CARGO_EXTRA_FLAGS="--profile=dev" ./miri bench slice-get-unchecked

And I get:

Benchmark 1: cargo miri run --profile=dev --manifest-path "bench-cargo-miri/slice-get-unchecked/Cargo.toml"
  Time (mean ± σ):      6.261 s ±  0.019 s    [User: 6.193 s, System: 0.058 s]
  Range (min … max):    6.239 s …  6.281 s    5 runs

With the same profile structure: ~84% of cycles are in the verify_cache_consistency loop that walks the entire stack.

But that's on my x86_64 desktop. On an M1 I see 6.8 seconds now and 5.6 seconds on commit 5f25162abf547fa3d306d2bdd7277f7a034fce5f. Still not as dramatic as the change you're seeing... What kind of machine are you on?

@RalfJung
Member Author

And I get:

And then when you apply just the diff I showed above it gets slower? Or what was your diff for "deleting the helpful formatting in the assert! of the hot loop"?

What kind of machine are you on?

A Lenovo laptop with an i7-12700H.

@saethlin
Member

Or what was your diff for "deleting the helpful formatting in the assert! of the hot loop"?

diff --git a/src/borrow_tracker/stacked_borrows/stack.rs b/src/borrow_tracker/stacked_borrows/stack.rs
index 712c26a9..d107bd9a 100644
--- a/src/borrow_tracker/stacked_borrows/stack.rs
+++ b/src/borrow_tracker/stacked_borrows/stack.rs
@@ -161,9 +161,7 @@ impl<'tcx> Stack {
             if item.perm() == Permission::Unique {
                 assert!(
                     self.unique_range.contains(&idx),
-                    "{:?} {:?}",
-                    self.unique_range,
-                    self.borrows
                 );
             }
         }

@RalfJung
Member Author

Yeah that diff makes no perf difference for me. If anything I am seeing a slight speedup (without: 3.806 s ± 0.014 s, with patch: 3.785 s ± 0.006 s).

@apiraino
Contributor

apiraino commented Mar 7, 2024

Assigning to Nils, who is authoring #121114, to signal that this issue is being taken care of. Thanks!

@rustbot assign @Nilstrieb

@RalfJung
Member Author

RalfJung commented Apr 7, 2024

Fixed by #123411 -- now one can disable the UB checks if they are too expensive. At least on nightly. Tracking issue: #123499.
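For reference, a command-line sketch of opting out (nightly-only; the exact flag spelling follows the tracking issue at the time of writing and may have changed since):

```shell
# Disable the standard library's UB-checking assertions while keeping
# debug assertions on. Requires a nightly toolchain.
RUSTFLAGS="-Zub-checks=no" cargo +nightly build --profile=dev
```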

@RalfJung RalfJung closed this as completed Apr 7, 2024