Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced #75974

Merged
merged 3 commits into from
Sep 21, 2020

Conversation

SkiFire13
Copy link
Contributor

If deref_mut is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call sift_down.

This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @LukasKalbertodt (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 27, 2020
@LukasKalbertodt
Copy link
Member

This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.

But in those cases, couldn't you simply not call peek_mut until you checked that condition? I guess it is rather unlikely that people call peek_mut and then at most immutably deref it later, right?

The assignment self.sift = true in the DerefMut worries me, as it makes it slightly slower which might or might not have an effect in practice. And note that one sift_down is also not terribly slow: if there is no need to sift, that's only one 2 * pos + 1 and one comparison.

So yeah, unfortunately, it's not clear to me if this would be an actual improvement :/
Do you have a specific use case for this?

@SkiFire13
Copy link
Contributor Author

But in those cases, couldn't you simply not call peek_mut until you checked that condition? I guess it is rather unlikely that people call peek_mut and then at most immutably deref it later, right?

I could, but I need to know how the internals of PeekMut work in order to realize I should. Otherwise it would be counterintuitive.

Consider this code (taken from the example linked below):

let mut max = heap.peek_mut().unwrap();
if x < *max {
    *max = x;
}

vs

if x < heap.peek().unwrap() {
    heap.peek_mut().unwrap() = x;
}

Why would someone choose the second one? It requires more and uglier code! Intuitively the first one should be the way to go, it has less boilerplate, less method calls and you would expect it to also be faster, but that's wrong.

The assignment self.sift = true in the DerefMut worries me, as it makes it slightly slower which might or might not have an effect in practice. And note that one sift_down is also not terribly slow: if there is no need to sift, that's only one 2 * pos + 1 and one comparison.

LLVM should be able to optimize it, at least in most trivial usecases. But even if it can't, chances are that the rest of your code is at least an order of magnitude slower than that assignment.

sift_down is not terribly slow, but it may become a bottleneck since it may be as expensive as the rest of your code (like in the following example)

Do you have a specific use case for this?

This reddit thread has a simple example that runs considerably slower (1.5-2x slower) because of this missed optimization.

@jyn514 jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 3, 2020
@LukasKalbertodt
Copy link
Member

Ok, that makes sense, thanks for that explanation. Thinking a bit more about it, you are probably right that the self.sift = true will be optimized out or not make a significant difference.

However, it would be great to have some benchmarks to confirm our assumptions. I took a quick look, but couldn't find any binary heap benchmarks yet. They should be in library/alloc/benches/. Could you add a couple of benchmarks? For this PR, only benchmarks using peek_mut are relevant: please add one that shows the speedup of this PR, but also one that might get slower due to this change. And if you have the time, adding some other benchmarks not using peek_mut (something like Dijkstra maybe?) would be appreciated as well!
If you don't have any time to add benchmarks, just let me know!

@tesuji
Copy link
Contributor

tesuji commented Sep 5, 2020

Might conflict with #76334 .

@Mark-Simulacrum
Copy link
Member

I am concerned about the "without internal mutability" caveat here - I don't think we should assume that's not the case, especially when the user has explicitly called peek_mut() hinting that they're changing the value.

We could set the flag in both deref and derefmut, but presumably that's not helpful?

Maybe an entry-like API, which provides upgradable shared access would help here.

@SkiFire13
Copy link
Contributor Author

@LukasKalbertodt I'm working on them, you can expect an update later today or tomorrow.

@Mark-Simulacrum This PR aims to improve PeekMut's performance by not setting the flag when only deref is called, so just changing it would clearly not be an acceptable solution. Maybe add another method that acts as deref but doesn't set the flag? It would be explicit, so it wouldn't conflict with internal mutability, but would also not be intuitive.
The entry-like API could be another solution, but it also doesn't seem intuitive.

If we leave this PR as it is then if someone wants to use internal mutability it can make an explicit call to deref_mut to mark that the item has been mutated. IMO this is ok because the current docs don't give any guarantee on the use of peek_mut, let alone with internal mutability. In fact BinaryHeap's docs say that

It is a logic error for an item to be modified in such a way that the item's ordering relative to any other item, as determined by the Ord trait, changes while it is in the heap.

PeekMut's item is in the heap, so this applies, and neither peek_mut's nor PeekMut's docs say anything to correct this! So for the docs using peek_mut to modify the greatest element and make it smaller than another element is a logic error. Of course that's not the current case, but anyone that is currently using peek_mut is just relying on implementation details.

We could take this opportunity to fix peek_mut docs and specify its behaviour when using internal mutability according to this PR (that is, it doesn't guarantee to leave the heap in a consistent state, unless deref_mut is called).
There's also the possibility to add a clippy lint when PeekMut is never mutably dereferenced (and maybe consumed?).

@Mark-Simulacrum
Copy link
Member

Yes, that's true that the documentation technically permits us to do this. I don't think that there's a need to add an explicit "resift" method to PeekMut. I would not want to document deref_mut as special. I am okay with this as-is -- I missed that we made the internal mutability claim that strong in the docs.

As an aside, I am surprised that we document PeekMut as "Cost is O(1) in the worst case." -- it seems like the sifting in the destructor is O(log n) in the worst case.

@calebsander
Copy link
Contributor

As an aside, I am surprised that we document PeekMut as "Cost is O(1) in the worst case." -- it seems like the sifting in the destructor is O(log n) in the worst case.

I think the documentation is referring to the peek_mut() method itself, which is constant-time. But I agree it's a bit misleading to discount the cost of the destructor, as it will always run unless PeekMut::pop() is called.

@calebsander
Copy link
Contributor

calebsander commented Sep 5, 2020

You are probably right that the self.sift = true will be optimized out or not make a significant difference.

The generated assembly after this change confirms that sift is optimized out. If the x < *max comparison fails, the function returns immediately. Otherwise, it swaps the value and runs BinaryHeap::sift_down().

Since sift is optimized out anyways, I think this PR makes a lot more sense than the one I proposed (#76334, which removes the sift field entirely and always performs a sift_down() if PeekMut::pop() isn't called).

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
Initialized empty Git repository in /home/runner/work/rust/rust/.git/
##[command]git remote add origin https://github.com/rust-lang/rust
##[command]git config gc.auto 0
##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/75974/merge:refs/remotes/pull/75974/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@SkiFire13
Copy link
Contributor Author

Oops, looks like I messed up something with git.

@SkiFire13
Copy link
Contributor Author

I don't know how I ended up with those changes...
Anyway, I added some benchmarks, in particular:

  • bench_find_smallest_1000: shows the improvements this PR brings compared to the current implementation in a possible usecase
  • bench_peek_mut_deref_mut: shows the overhead this PR brings in what should be the worst case.
  • bench_from_vec, bench_into_sorted_vec, bench_push and bench_pop: just benchmarks for the main BinaryHeap's functions

I couldn't figure out how to easily benchmark something like Dijkstra. What should be the input to test? Could it be randomized? In the end I opted for simple benchmarks for the main functions.

Copy link
Member

@LukasKalbertodt LukasKalbertodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for those benchmarks! Running the benchmarks locally:

With your sift changes:

test binaryheap::bench_find_smallest_1000   ... bench:     347,545 ns/iter (+/- 6,682)
test binaryheap::bench_from_vec             ... bench:     786,415 ns/iter (+/- 20,890)
test binaryheap::bench_into_sorted_vec      ... bench:     493,413 ns/iter (+/- 3,795)
test binaryheap::bench_peek_mut_deref_mut   ... bench:   1,203,026 ns/iter (+/- 8,469)
test binaryheap::bench_pop                  ... bench:     598,884 ns/iter (+/- 22,753)
test binaryheap::bench_push                 ... bench:     688,020 ns/iter (+/- 8,597)

Without the sift changes:

test binaryheap::bench_find_smallest_1000   ... bench:     752,333 ns/iter (+/- 14,042)
test binaryheap::bench_from_vec             ... bench:     796,535 ns/iter (+/- 73,442)
test binaryheap::bench_into_sorted_vec      ... bench:     497,072 ns/iter (+/- 8,736)
test binaryheap::bench_peek_mut_deref_mut   ... bench:     717,605 ns/iter (+/- 2,112)
test binaryheap::bench_pop                  ... bench:     606,103 ns/iter (+/- 4,773)
test binaryheap::bench_push                 ... bench:     680,548 ns/iter (+/- 3,965)

The three general benchmarks don't show any meaningful difference. The "smallest 1000" one showed a more than 2x performance improvement, while the bench_peek_mut_deref_mut one ran notably slower. However, bench_peek_mut_deref_mut is highly artificial and as stated above, I think hardly any real world code will be affected by this slow down. On the other hand, "smallest 1000" is real code, so 👍 on that change.


I left three mini inline comments. Also, could you fix the confusion in the docs (see this comment)? Just add a short sentence "If the item is modified, the the worst case time complexity is O(log n)" ... or something like that.

Finally, before merging this, could you fix the git history? The merge commits and "revert merge" commit should be removed. It would be best if your PR contains 3 commits in this order: "add benchmarks", "avoid useless sift_down", "fix peek_mut docs". In particularly, adding those benchmarks in a commit before the sift change makes it easier to compare both versions.

How experienced are you with git? If you want, I can guide you through it on Zulip or tell you all the commands necessary.

library/alloc/benches/binaryheap.rs Outdated Show resolved Hide resolved
library/alloc/benches/binaryheap.rs Outdated Show resolved Hide resolved
library/alloc/benches/binaryheap.rs Outdated Show resolved Hide resolved
Copy link
Member

@LukasKalbertodt LukasKalbertodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks a bunch! :)

@LukasKalbertodt
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Sep 20, 2020

📌 Commit ca15e9d has been approved by LukasKalbertodt

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 21, 2020
…Kalbertodt

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced

If `deref_mut` is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call `sift_down`.

This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Sep 21, 2020
…Kalbertodt

Avoid useless sift_down when std::collections::binary_heap::PeekMut is never mutably dereferenced

If `deref_mut` is never called then it's not possible for the element to be mutated without internal mutability, meaning there's no need to call `sift_down`.

This could be a little improvement in cases where you want to mutate the biggest element of the heap only if it satisfies a certain predicate that needs only read access to the element.
@bors
Copy link
Contributor

bors commented Sep 21, 2020

⌛ Testing commit ca15e9d with merge a409a23...

@bors
Copy link
Contributor

bors commented Sep 21, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: LukasKalbertodt
Pushing a409a23 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 21, 2020
@bors bors merged commit a409a23 into rust-lang:master Sep 21, 2020
@rustbot rustbot added this to the 1.48.0 milestone Sep 21, 2020
@SkiFire13 SkiFire13 deleted the peekmut-opt-sift branch September 21, 2020 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants