-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A more efficient slice comparison implementation for T: !BytewiseEq #116846
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
9348c33
to
0cc5c97
Compare
The following code seems to generate identical code as this PR for most types and better code for float types. (also it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing. if a.len() != b.len() {
return false;
}
for idx in 0..a.len() {
if a[idx] != b[idx] {
return false;
}
}
true |
library/core/src/slice/cmp.rs
Outdated
// SAFETY: | ||
// This is sound because: | ||
// - self.len == other.len | ||
// - self.len <= isize::MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't true for ZSTs. Though the result still happens to work because bumping zst pointers does nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Oh ! Very nice catch ! A quick check on godbolt shows me that this is only properly optimized since 1.73.0, and I didn''t recheck this code from my previous PR before resubmitting. I'll update this soon. Thanks. |
Wow, I'm shocked that this was not as well optimized for that long, this should have been easy enough for the optimizer. Oh well, sometimes they're tricky like that! I did some history, and of note is that the |
0cc5c97
to
a70613b
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…=<try> A more efficient slice comparison implementation for T: !BytewiseEq (This is a follow up PR on rust-lang#113654) This PR changes the implementation for `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c12f891): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 669.803s -> 668.21s (-0.24%) |
That in itself seems like an issue... ah yes, #100124 last attempted to fix this but that stalled. Would you be willing to work out the critical difference in LLVM IR and add a codegen test? That's optional though, I can accept the PR without that. |
The previous implementation was not optimized properly by the compiler, which didn't leverage the fact that both length were equal.
a70613b
to
5b041ab
Compare
I had a look but couldn't figure out a way to characterize the difference between the two IR. |
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (190f4c9): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 667.74s -> 666.209s (-0.23%) |
Thanks @the8472 👍 |
(This is a follow up PR on #113654)
This PR changes the implementation for
[T]
slice comparison whenT: !BytewiseEq
. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()
.