Severe perf regression in optimized debug builds due to extra UB checks #121245
Comments
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-high |
Just want to inject a few thoughts here as I work on #117494, which (as mentioned in that PR) will probably exacerbate this issue.
First, I noticed that there seem to be a few issues with double-checks getting optimized out for checked methods, as demonstrated in my analysis in this comment. This could be a potential factor in the performance issue, either by making the optimiser work harder to remove them (increasing compile times) or by them being run twice because the optimiser failed to remove them. To clarify, I mean cases like this not being optimised properly:
    fn checked_thing(...) {
        if condition(...) {
            unsafe { unchecked_thing(...) }
        } else {
            // ...
        }
    }
    fn unchecked_thing(...) {
        debug_assert_nounwind!(condition(...), ...);
        // ...
    }
Note that in some cases, the two conditions might be logically equivalent but not exactly the same code. (In the example of
Second, I want to voice support for just making these unsafe asserts into their own dedicated flag that can be disabled, like |
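For readers who want something they can compile, here is a minimal sketch of the pattern described above. The function names, the `in_bounds` helper, and the slice-indexing scenario are all hypothetical, and the public `debug_assert!` stands in for the internal `debug_assert_nounwind!` macro, so this only illustrates the shape of the double check, not the actual standard library code.

```rust
/// Hypothetical precondition shared by the checked and unchecked variants.
fn in_bounds(data: &[u32], idx: usize) -> bool {
    idx < data.len()
}

/// Unchecked variant: the caller promises that `idx` is in bounds.
/// `debug_assert!` is a stand-in for `debug_assert_nounwind!`; with debug
/// assertions enabled, the condition is evaluated again here.
pub unsafe fn get_unchecked_thing(data: &[u32], idx: usize) -> u32 {
    debug_assert!(in_bounds(data, idx), "index out of bounds");
    // SAFETY: the caller guarantees `idx < data.len()`.
    unsafe { *data.get_unchecked(idx) }
}

/// Checked variant: tests the condition once, then calls the unchecked
/// variant, which tests it a second time when debug assertions are on.
pub fn get_checked_thing(data: &[u32], idx: usize) -> Option<u32> {
    if in_bounds(data, idx) {
        // SAFETY: `in_bounds` was just verified.
        Some(unsafe { get_unchecked_thing(data, idx) })
    } else {
        None
    }
}
```

Whether the second evaluation of `in_bounds` disappears depends on the optimiser seeing both calls after inlining; this sketch makes no claim about what any particular rustc version does with it.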
This issue is about the second point: performance of rustc-generated code, not compile times.
How do you set compiler flags on a per-test basis...? |
I mostly meant in the sense of like, having them enabled when testing libraries and not on the larger compiler tests, since generally those have separate test suites. But also, the compiler tests themselves tend to set compiler flags when they're running, so you could configure them there. Basically, not within a single crate, but for multi-crate projects like the compiler and for compiler tests specifically. |
#121114 will fix or at least significantly help with this. The checks are currently wayyy slower than they need to be because they are outlined for compile time reasons. |
@Nilstrieb you linked this issue to itself? I assume you mean #121114? |
yes, i don't think the issue will fix itself |
With #121114, CI times are indeed back to normal. 🎉 However, some individual benchmarks still show a severe regression. Specifically, before (Miri 5f25162abf547fa3d306d2bdd7277f7a034fce5f):
today:
That's a 50% slowdown. Would be interesting to know if there's one particular check causing this or if it's everywhere. Interestingly, two other benchmarks I tried are completely unaffected, so it seems likely that this is one particular check. |
The hot part of rust/src/tools/miri/src/borrow_tracker/stacked_borrows/stack.rs Lines 160 to 169 in 0ecbd06
I cannot explain why adding checks around this loop is making the program faster or slower. I tried reducing the number of checks by doing a My only guess is that this benchmark is exceedingly sensitive to code alignment. Or something like that. In any case, the fact that ~84% of the program's cycles are in a perfectly-optimized loop that doesn't contain any checks makes me doubt the causal link between the checks and the slowdown. |
How can that be true when the program gets 33% faster by removing the checks? Is that 84% of the cycles in the slow or the fast version of the program? |
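The "50% slowdown" and "33% faster" figures in this exchange describe the same ratio from two directions, assuming both refer to the same pair of timings. A small sketch of the arithmetic, with hypothetical helper names:

```rust
/// Given a relative slowdown (0.5 means the program takes 50% more time),
/// return the relative speedup obtained by undoing it.
pub fn speedup_from_slowdown(slowdown: f64) -> f64 {
    1.0 - 1.0 / (1.0 + slowdown)
}

/// Inverse direction: given a relative speedup (fraction of time saved),
/// return the slowdown that going back would cause.
pub fn slowdown_from_speedup(speedup: f64) -> f64 {
    1.0 / (1.0 - speedup) - 1.0
}
```

A program that takes 1.5x as long with the checks takes 1/1.5, roughly 0.667x, as long without them, so `speedup_from_slowdown(0.5)` is one third.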
I can't reproduce this. I applied this diff on old Miri (before your checks were added):
diff --git a/src/borrow_tracker/stacked_borrows/stack.rs b/src/borrow_tracker/stacked_borrows/stack.rs
index 712c26a9..d107bd9a 100644
--- a/src/borrow_tracker/stacked_borrows/stack.rs
+++ b/src/borrow_tracker/stacked_borrows/stack.rs
@@ -161,9 +161,7 @@ impl<'tcx> Stack {
             if item.perm() == Permission::Unique {
                 assert!(
                     self.unique_range.contains(&idx),
-                    "{:?} {:?}",
-                    self.unique_range,
-                    self.borrows
+                    "bad cache",
                 );
             }
         }
Performance is the same with and without this. |
I ran
And I get:
With the same profile structure: ~84% of cycles are in the But that's on my x86_64 desktop. On an M1 I see 6.8 seconds now and 5.6 seconds on commit 5f25162abf547fa3d306d2bdd7277f7a034fce5f. Still not as dramatic as the change you're seeing... What kind of machine are you on? |
And then when you apply just the diff I showed above it gets slower? Or what was your diff for "deleting the helpful formatting in the assert! of the hot loop"?
A Lenovo laptop with an i7-12700H. |
diff --git a/src/borrow_tracker/stacked_borrows/stack.rs b/src/borrow_tracker/stacked_borrows/stack.rs
index 712c26a9..d107bd9a 100644
--- a/src/borrow_tracker/stacked_borrows/stack.rs
+++ b/src/borrow_tracker/stacked_borrows/stack.rs
@@ -161,9 +161,7 @@ impl<'tcx> Stack {
             if item.perm() == Permission::Unique {
                 assert!(
                     self.unique_range.contains(&idx),
-                    "{:?} {:?}",
-                    self.unique_range,
-                    self.borrows
                 );
             }
         }
|
Yeah that diff makes no perf difference for me. If anything I am seeing a slight speedup (without: 3.806 s ± 0.014 s, with patch: 3.785 s ± 0.006 s). |
assigning to Nils who is authoring #121114 to signal that this issue is being taken care of. Thanks! @rustbot assign @Nilstrieb |
In Miri built with optimizations and debug assertions enabled, we have recently seen a 50% increase in CI times. Bisecting points at #120594, and @saethlin confirmed that disabling one of these UB checks gives a speedup in the same ballpark as that slowdown.
Several things could be done here:
debug_assertions
cfg-flag, similar to how overflow checks already have their own flag.
Update (2024-02-25): CI times are back to normal, now only one specific benchmark is affected.
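To make the "own flag" idea concrete, here is a rough sketch of a check that is gated by a single switch rather than being tied unconditionally to `debug_assertions`. The `UB_CHECKS` constant and the `read_at` function are hypothetical names for illustration; the constant simply mirrors `cfg!(debug_assertions)` here, whereas a dedicated flag would let it be toggled independently, much as rustc's `-C overflow-checks` can be set separately from `-C debug-assertions`.

```rust
/// Hypothetical switch for UB checks. In this sketch it just follows
/// `debug_assertions`; a dedicated cfg flag would decouple the two.
const UB_CHECKS: bool = cfg!(debug_assertions);

/// Reads `data[idx]` without a bounds check in the normal path.
///
/// # Safety
/// The caller must guarantee `idx < data.len()`.
pub unsafe fn read_at(data: &[u32], idx: usize) -> u32 {
    if UB_CHECKS {
        // Only evaluated when the switch is on; when it is off, the
        // constant-false branch is trivially removable.
        assert!(idx < data.len(), "read_at: index out of bounds");
    }
    // SAFETY: the caller guarantees `idx < data.len()`.
    unsafe { *data.get_unchecked(idx) }
}
```

With such a switch, a project could keep its own `debug_assert!`s in optimized test builds while opting out of the library-level precondition checks, which is the trade-off being discussed in this thread.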