-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @LukasKalbertodt (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
But in those cases, couldn't you simply not call The assignment So yeah, unfortunately, it's not clear to me if this would be an actual improvement :/ |
I could, but I need to know how the internals of Consider this code (taken from the example linked below): let mut max = heap.peek_mut().unwrap();
if x < *max {
*max = x;
} vs if x < heap.peek().unwrap() {
heap.peek_mut().unwrap() = x;
} Why would someone choose the second one? It requires more and uglier code! Intuitively the first one should be the way to go, it has less boilerplate, less method calls and you would expect it to also be faster, but that's wrong.
LLVM should be able to optimize it, at least in most trivial usecases. But even if it can't, chances are that the rest of your code is at least an order of magnitude slower than that assignment.
This reddit thread has a simple example that runs considerably slower (1.5-2x slower) because of this missed optimization. |
Ok, that makes sense, thanks for that explanation. Thinking a bit more about it, you are probably right that the However, it would be great to have some benchmarks to confirm our assumptions. I took a quick look, but couldn't find any binary heap benchmarks yet. They should be in |
Might conflict with #76334 . |
I am concerned about the "without internal mutability" caveat here - I don't think we should assume that's not the case, especially when the user has explicitly called peek_mut() hinting that they're changing the value. We could set the flag in both deref and derefmut, but presumably that's not helpful? Maybe an entry-like API, which provides upgradable shared access would help here. |
@LukasKalbertodt I'm working on them, you can expect an update later today or tomorrow. @Mark-Simulacrum This PR aims to improve If we leave this PR as it is then if someone wants to use internal mutability it can make an explicit call to
We could take this opportunity to fix |
Yes, that's true that the documentation technically permits us to do this. I don't think that there's a need to add an explicit "resift" method to PeekMut. I would not want to document deref_mut as special. I am okay with this as-is -- I missed that we made the internal mutability claim that strong in the docs. As an aside, I am surprised that we document PeekMut as "Cost is O(1) in the worst case." -- it seems like the sifting in the destructor is O(log n) in the worst case. |
I think the documentation is referring to the |
The generated assembly after this change confirms that Since |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Oops, looks like I messed up something with git. |
I don't know how I ended up with those changes...
I couldn't figure out how to easily benchmark something like Dijkstra. What should be the input to test? Could it be randomized? In the end I opted for simple benchmarks for the main functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a bunch for those benchmarks! Running the benchmarks locally:
With your sift
changes:
test binaryheap::bench_find_smallest_1000 ... bench: 347,545 ns/iter (+/- 6,682)
test binaryheap::bench_from_vec ... bench: 786,415 ns/iter (+/- 20,890)
test binaryheap::bench_into_sorted_vec ... bench: 493,413 ns/iter (+/- 3,795)
test binaryheap::bench_peek_mut_deref_mut ... bench: 1,203,026 ns/iter (+/- 8,469)
test binaryheap::bench_pop ... bench: 598,884 ns/iter (+/- 22,753)
test binaryheap::bench_push ... bench: 688,020 ns/iter (+/- 8,597)
Without the sift
changes:
test binaryheap::bench_find_smallest_1000 ... bench: 752,333 ns/iter (+/- 14,042)
test binaryheap::bench_from_vec ... bench: 796,535 ns/iter (+/- 73,442)
test binaryheap::bench_into_sorted_vec ... bench: 497,072 ns/iter (+/- 8,736)
test binaryheap::bench_peek_mut_deref_mut ... bench: 717,605 ns/iter (+/- 2,112)
test binaryheap::bench_pop ... bench: 606,103 ns/iter (+/- 4,773)
test binaryheap::bench_push ... bench: 680,548 ns/iter (+/- 3,965)
The three general benchmarks don't show any meaningful difference. The "smallest 1000" one showed a more than 2x performance improvement, while the bench_peek_mut_deref_mut
one ran notably slower. However, bench_peek_mut_deref_mut
is highly artificial and as stated above, I think hardly any real world code will be affected by this slow down. On the other hand, "smallest 1000" is real code, so 👍 on that change.
I left three mini inline comments. Also, could you fix the confusion in the docs (see this comment)? Just add a short sentence "If the item is modified, the the worst case time complexity is O(log n)" ... or something like that.
Finally, before merging this, could you fix the git history? The merge commits and "revert merge" commit should be removed. It would be best if your PR contains 3 commits in this order: "add benchmarks", "avoid useless sift_down", "fix peek_mut docs". In particularly, adding those benchmarks in a commit before the sift change makes it easier to compare both versions.
How experienced are you with git? If you want, I can guide you through it on Zulip or tell you all the commands necessary.
0716aab
to
abba50e
Compare
abba50e
to
ca15e9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks a bunch! :)
@bors r+ |
📌 Commit ca15e9d has been approved by |
…Kalbertodt Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced If `deref_mut` is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call `sift_down`. This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.
…Kalbertodt Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced If `deref_mut` is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call `sift_down`. This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.
☀️ Test successful - checks-actions, checks-azure |
If
deref_mut
is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to callsift_down
.This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.