Ironing out StepBy<Range>'s performance issues #51557
Comments
Overall yes; someone should make a PR 🙂 Probably best to use specialization for now to hide the detail. (I suppose a doc-hidden unstable method on Iterator could work too, but that feels wrong.) A few minor things:
It would still be needed for non-
It's hard to guess at exactly how it should be written without looking at how LLVM handles it. Something like that seems plausible, but I could imagine seemingly-unimportant things like the branch orderings affecting the loop passes. Maybe it would be possible to have a codegen test to make sure?
I've copied the std implementations onto godbolt and added the specialization. As expected, it improves the generated code a lot. Adding this specialization slightly changes what we're expecting from the Is there any way we can get rid of the TryFrom baggage?
I haven't looked at it in detail yet, but reminder that this isn't exactly what we want to reach:

    // what we want to reach
    pub fn manual_while() {
        let mut n = 0;
        while n < UPPER {
            test::black_box(n);
            n += STEP;
        }
    }

Because that doesn't have overflow detection, and is thus an infinite loop for something like
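A minimal sketch of the overflow concern raised above, assuming `u8` values purely for illustration (the function name `stepped_u8` and its parameters are hypothetical and do not come from the thread). A plain `n += STEP` can wrap around or panic near the type's maximum, so this version ends the loop as soon as the next value is not representable:

```rust
/// Collects `start, start + step, ...` while the value stays below `end`.
/// Hypothetical helper for illustration; not part of std.
pub fn stepped_u8(start: u8, end: u8, step: u8) -> Vec<u8> {
    assert!(step != 0, "step must be non-zero");
    let mut out = Vec::new();
    let mut n = start;
    while n < end {
        out.push(n);
        // `checked_add` returns `None` when `n + step` does not fit in a `u8`.
        match n.checked_add(step) {
            Some(next) => n = next,
            // The next value is not representable, so the sequence is finished.
            None => break,
        }
    }
    out
}
```

The extra check is the price of matching `step_by` near the type's limits, which is why the unchecked manual loop is not a fair target for comparison.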
Good catch. That's also what caused the difference between manual and specialized for u8.
Another issue I've found is that with
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to

1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach can not be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
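The idea in this PR description (yield the current element, then step ahead by `n` in one go) might look roughly like the following. This is a standalone sketch, not the PR's actual code: the type `SteppedRange`, its fields, and the restriction to `u32` are all assumptions made for illustration.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a specialized `StepBy<Range<u32>>`.
pub struct SteppedRange {
    start: u32,
    end: u32,
    step: u32, // assumption: stores the step itself, kept as `u32` for simplicity
}

impl SteppedRange {
    pub fn new(range: Range<u32>, step: u32) -> Self {
        assert!(step != 0, "step must be non-zero");
        SteppedRange { start: range.start, end: range.end, step }
    }
}

impl Iterator for SteppedRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.start >= self.end {
            return None;
        }
        // Produce the current element...
        let current = self.start;
        // ...and step ahead by `step` in a single addition. If that overflows,
        // move `start` to `end` so the range itself records that it is finished.
        self.start = match current.checked_add(self.step) {
            Some(next) => next,
            None => self.end,
        };
        Some(current)
    }
}
```

A bounded range can record exhaustion by setting `start` to `end`. `RangeFrom` has no `end` to use for this, which is consistent with the remark above that there is no way to save "finished-ness" for it.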
#111850 solved this for unsigned integers. Is that sufficient?
The behaviour of `<Range<_> as Iterator>::nth` has a slight mismatch with `StepBy` (or `Step`, depending on your viewpoint), as @scottmcm has found out, resulting in sub-optimal performance. On every iteration, the range has to first step forwards `n-1` times to get the next element and then advance again by 1.

I'm hoping we can improve `step_by` into a 100% zero-cost abstraction.

It seems like the performance issue is specific to `StepBy<Range>`. I'm thinking therefore that we could specialize `Iterator for StepBy<Range<I>>` such that it would use @scottmcm's suggested semantics. Like this:

That also avoids the branch on a regular `next()`. I haven't looked at the other methods but that boolean in `StepBy` could possibly become superfluous. During construction of the `StepBy` adapter, the `size` in `.step_by(size)` is decremented and this specialization has to counter-add 1 every time but that should be optimized away if inlined.

If someone were to depend on side-effects in `Step::add_usize` (when the trait is stabilized), this pre-stepping would become weird. Same thing with a hypothetical `next_and_skip_ahead()`.

@scottmcm what do you think of this?
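For context, the generic behaviour described in this issue (a first-call branch, then `nth` with the decremented step) can be sketched as a standalone adapter. This is a rough illustration under assumptions, not the std source: `GenericStepBy` and its field names are hypothetical.

```rust
/// Hypothetical adapter mirroring the behaviour described in the issue.
pub struct GenericStepBy<I> {
    iter: I,
    step_minus_one: usize, // the step is decremented at construction
    first_take: bool,      // the boolean that `next()` branches on
}

impl<I: Iterator> GenericStepBy<I> {
    pub fn new(iter: I, step: usize) -> Self {
        assert!(step != 0, "step must be non-zero");
        GenericStepBy { iter, step_minus_one: step - 1, first_take: true }
    }
}

impl<I: Iterator> Iterator for GenericStepBy<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.first_take {
            // The first call yields the first element unchanged.
            self.first_take = false;
            self.iter.next()
        } else {
            // Later calls skip `step - 1` elements and yield the one after them.
            // For a range, `nth` advances by `step - 1` and then by 1 again.
            self.iter.nth(self.step_minus_one)
        }
    }
}
```

Skipping lazily like this never touches the underlying iterator before an element is requested, which matters for iterators with side effects. For ranges that guarantee buys nothing, so the branch and the split addition are pure overhead.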