-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize implementation of Zip::fold and company #100124
optimize implementation of Zip::fold and company #100124
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @m-ou-se (or someone else) soon. Please see the contribution instructions for more information. |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent first contribution 👍
I've left some initial feedback. Zip
is weird in the sense that it needs to choose which inner iterator to iterate internally and which one externally, but to me choosing the first one to iterate internally seems like a very reasonable choice.
#[inline] | ||
default fn fold<T, F>(self, init: T, mut f: F) -> T | ||
where | ||
F: FnMut(T, Self::Item) -> T, | ||
{ | ||
let mut a = self.a; | ||
let mut b = self.b; | ||
|
||
let acc = a.try_fold(init, move |acc, x| match b.next() { | ||
Some(y) => Ok(f(acc, (x, y))), | ||
None => Err(acc), | ||
}); | ||
|
||
match acc { | ||
Ok(exhausted_a) => exhausted_a, | ||
Err(exhausted_b) => exhausted_b, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a fold
implementation can't be implemented on top of an inner fold
, it's common in libcore to delegate to self
's try_fold
instead in order to avoid some duplicate logic. See e.g. Take::fold
. TakeWhile
, MapWhile
, and Scan
do the same thing. You could see if it affects benchmark results in any way, but I'd hope that it won't.
SIde note: closures in the implementation of iterator adapters are typically not written outright, but returned from a nested function, see #62429.
let acc = a.try_fold(init, move |acc, x| match b.next() { | ||
Some(y) => { | ||
let result = f(acc, (x, y)); | ||
match result.branch() { | ||
ControlFlow::Continue(continue_) => Ok(continue_), | ||
ControlFlow::Break(break_) => Err(R::from_residual(break_)), | ||
} | ||
} | ||
None => Err(R::from_output(acc)), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern of either continuing with an output value or breaking with an output value or a residual value can be expressed a lot simpler by the ControlFlow
enum, and specifically its (non-public) from_try
and into_try
methods. Take::try_fold
illustrates this really well.
#[inline] | ||
default fn rfold<T, F>(mut self, init: T, mut f: F) -> T | ||
where | ||
A: DoubleEndedIterator + ExactSizeIterator, | ||
B: DoubleEndedIterator + ExactSizeIterator, | ||
F: FnMut(T, Self::Item) -> T, | ||
{ | ||
self.adjust_back(); | ||
let mut a = self.a; | ||
let mut b = self.b; | ||
|
||
let acc = a.try_rfold(init, move |acc, x| match b.next_back() { | ||
Some(y) => Ok(f(acc, (x, y))), | ||
None => Err(acc), | ||
}); | ||
|
||
match acc { | ||
Ok(exhausted_a) => exhausted_a, | ||
Err(exhausted_b) => exhausted_b, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we know that the inner iterators have the same length (after adjusting), so it might be worth writing this in terms of self.a.rfold
instead. We'd have to panic in case b.next_back
returns None
. I have not benchmarked this, but in theory rfold
might lead to more efficient machine code, depending on the underlying iterator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this gave me the idea of applying a similar optimization to fold
. by checking if both iterators have the same length (which i assumed is a common case for zip
), we can use a.fold
instead of a.try_fold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I didn't think about doing that with size_hint
.
thanks for the feedback! i've applied your suggestions to the code, and additionally made some further optimizations. (though they don't show a difference on the current zip benchmarks, which are somewhat limited) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests that target the new unsafe TrustedRandomAccess
implementations? If not we need them because that particular specialization has a long history of unsoundness.
- remove size_hint check in fold - add tests for new unsafe fold implementations
2b57dd9
to
8b87335
Compare
i've applied the previous suggestions. i tried to rebase on the master branch to be able to get relevant comparisons from benchmarks. i hope i didn't mess up any git stuff |
I'm curious |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
Whelp, that didn't work, but that also means it would have failed when approved anyway. @sarah-ek can you rebase this again and see if you can fix the compile error that appears? |
@sarah-ek FYI: when a PR is ready for review, send a message containing |
☔ The latest upstream changes (presumably #107634) made this pull request unmergeable. Please resolve the merge conflicts. |
Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks |
adds a more efficient implementation of
core::iter::Zip::{fold, rfold, try_fold, try_rfold}
on my machine, this gives a 15% speedup on the
iter::bench_skip_cycle_skip_zip_add_sum
benchmark, and a 32% speedup oniter::bench_skip_then_zip
.i haven't noticed any performance regressions