StepBy<_, Range<_>> optimises poorly #31155
Here's my ideal `step_by`: https://play.rust-lang.org/?gist=f967691b33eef2f2de70&version=stable (for example, the bounds checks in the loop are eliminated). The idea is simply to do exactly what a `for` loop in C would do.
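The playground contents aren't reproduced here. Below is a hedged sketch that assumes "what a `for` loop in C would do" means `for (i = start; i < end; i += step)`. The helper `c_style_step` is hypothetical. It agrees with `step_by` for inputs far from the type's maximum, but, like the C loop, it does nothing special about overflow.

```rust
/// Hypothetical helper: collects start, start + step, ... below `end`,
/// written the way a C `for (i = start; i < end; i += step)` loop would be.
/// Like the C loop, it assumes `i + step` never overflows: near `u32::MAX`
/// the addition panics in debug builds and wraps in release builds.
fn c_style_step(start: u32, end: u32, step: u32) -> Vec<u32> {
    assert!(step > 0, "step must be non-zero");
    let mut out = Vec::new();
    let mut i = start;
    while i < end {
        out.push(i);
        i += step; // no explicit overflow handling, as in C
    }
    out
}

fn main() {
    // For inputs far from u32::MAX it agrees with the standard library.
    let via_std: Vec<u32> = (3..20).step_by(4).collect();
    assert_eq!(c_style_step(3, 20, 4), via_std);
    println!("{:?}", via_std); // [3, 7, 11, 15, 19]
}
```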
I'm guessing this is very likely a case of the …
It looks like it might be possible to get rid of …
I think we decided the user has to ensure that they don't take too many elements from a …
That is, behavior past the overflow point is not part of the interface.
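This sketch illustrates "the user has to ensure that they don't take too many elements." It assumes a `RangeFrom<u8>` (the hypothetical helper `take_stepped` is not from the thread). The caller bounds the element count with `take`, so every yielded value fits in the type. Asking for more would step past `u8::MAX`, and that behavior is deliberately left outside the interface.

```rust
/// Hypothetical helper: the first `count` elements of `(start..).step_by(step)`.
/// The caller must keep the last element within `u8`'s range; what happens
/// past the overflow point is not part of the interface.
fn take_stepped(start: u8, step: usize, count: usize) -> Vec<u8> {
    (start..).step_by(step).take(count).collect()
}

fn main() {
    // Six elements: 0, 50, ..., 250 all fit in a u8.
    let first_six = take_stepped(0, 50, 6);
    assert_eq!(first_six, vec![0, 50, 100, 150, 200, 250]);
    // A seventh element would need 300, which does not fit in a u8,
    // so it is intentionally not requested here.
    println!("{:?}", first_six);
}
```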
@bluss Hmm, while it's nice to have something to compare against for the simple cases, your ideal version is unfortunately somewhat useless: we obviously want to do what a C …
Maybe useless as a replacement for the current `step_by`, yes. It's a tall order, isn't it? Do what the `for` loop does, with zero overhead, and still check for overflow. Computing the end up front sounds doable, though. For the …
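Here is one way to read "computing the end up front," offered as a sketch rather than the thread's or the standard library's design. The hypothetical `CountedStep` fixes the number of elements at construction. `next` then only decrements a counter, and overflow can no longer affect which values are yielded.

```rust
/// Hypothetical iterator: the element count is computed once, up front,
/// so `next` never compares against `end` or checks for overflow.
struct CountedStep {
    current: u32,
    step: u32,
    remaining: u32,
}

impl CountedStep {
    fn new(start: u32, end: u32, step: u32) -> Self {
        assert!(step > 0, "step must be non-zero");
        // Number of values start, start + step, ... that are < end.
        // Written as (end - start - 1) / step + 1 so it cannot overflow.
        let remaining = if start >= end { 0 } else { (end - start - 1) / step + 1 };
        CountedStep { current: start, step, remaining }
    }
}

impl Iterator for CountedStep {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let value = self.current;
        // May wrap after the last element, but that wrapped value is never
        // yielded because `remaining` is already 0 by then.
        self.current = self.current.wrapping_add(self.step);
        Some(value)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let n = self.remaining as usize;
        (n, Some(n))
    }
}

fn main() {
    let small: Vec<u32> = CountedStep::new(0, 10, 3).collect();
    assert_eq!(small, vec![0, 3, 6, 9]);

    // Near the top of the type it matches the standard library's step_by.
    let hi = u32::MAX - 5;
    let ours: Vec<u32> = CountedStep::new(hi, u32::MAX, 4).collect();
    let via_std: Vec<u32> = (hi..u32::MAX).step_by(4).collect();
    assert_eq!(ours, via_std);
    println!("{:?} {:?}", small, ours);
}
```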
I think that making … It seems to me that it makes most sense for … to behave like:

```rust
let mut count = 0;
some_iter.filter_map(|x| {
    if count == 0 { count = n - 1; Some(x) } else { count -= 1; None }
})
```

(So, yes, some scheme that computes the appropriate upper bound seems like it may be the best plan of attack.)
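The `filter_map` formulation from the previous comment can be wrapped into a self-contained function for experimentation. The name `every_nth` is hypothetical. This sketch only checks that the formulation yields the same elements as `step_by` on an ordinary iterator. It makes no claim about how either one optimizes.

```rust
/// Hypothetical wrapper around the `filter_map` formulation:
/// keep one element, then skip the next `n - 1`.
fn every_nth<I: Iterator>(some_iter: I, n: usize) -> impl Iterator<Item = I::Item> {
    assert!(n > 0, "n must be non-zero");
    let mut count = 0;
    some_iter.filter_map(move |x| {
        if count == 0 {
            count = n - 1; // keep this one, then skip n - 1
            Some(x)
        } else {
            count -= 1;
            None
        }
    })
}

fn main() {
    let words = ["a", "b", "c", "d", "e", "f", "g"];
    let ours: Vec<&str> = every_nth(words.iter().copied(), 3).collect();
    let via_std: Vec<&str> = words.iter().copied().step_by(3).collect();
    assert_eq!(ours, via_std);
    println!("{:?}", ours); // ["a", "d", "g"]
}
```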
I was against having … But I don't agree that it has infinite length. It has a debug assertion for overflow.
Yes, the debug assertions make the "real" behaviour of the type a bit of a grey area, but I certainly don't think one could say that an overflowing version of that iterator is finite: no matter what build configuration you have, you'll never get a …
Very fair point. The nice behavior is certainly preferable. Still, when people pull out a `while` loop on the forum and discover that "this has better performance than `step_by`", it's because they completely disregard the overflow (wraparound) case. `RangeFrom` is neither finite nor infinite, but it ends with a bang 😉
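To make the wraparound case concrete, here is a sketch with a hypothetical hand-rolled loop (`naive_wrapping_steps`, not from the thread) that ignores overflow by using wrapping arithmetic on `u8`. An iteration cap keeps it from running forever. Once `250 + 10` wraps to `4`, the loop keeps going, whereas `step_by` stops after `250`.

```rust
/// Hypothetical hand-rolled loop that ignores overflow via wrapping
/// arithmetic. `cap` bounds the iteration count so it cannot run forever.
fn naive_wrapping_steps(start: u8, end: u8, step: u8, cap: usize) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = start;
    while i < end && out.len() < cap {
        out.push(i);
        i = i.wrapping_add(step); // e.g. 250 + 10 wraps around to 4
    }
    out
}

fn main() {
    let naive = naive_wrapping_steps(250, 255, 10, 5);
    // step_by stops after 250 because 260 is outside the range.
    let checked: Vec<u8> = (250u8..255).step_by(10).collect();
    assert_eq!(naive, vec![250, 4, 14, 24, 34]);
    assert_eq!(checked, vec![250]);
    println!("{:?} vs {:?}", naive, checked);
}
```

Away from the top of the type, the two agree. The hand-rolled loop only "wins" by giving up correctness at the boundary.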
Introducing new footguns definitely doesn't sound desirable. It seems to me that the current behavior (i.e. overflow checking) is the most desirable behavior for general use of the … That said, I think it's worth asking whether we can (and whether it's worth it to) create an optimized path for …
Today's reproduction still doesn't optimize …
Specialize `StepBy<Range(Inclusive)>`

Part of #51557, related to #43064 and #31155.

As discussed in the above issues, `step_by` optimizes very badly on ranges, which is related to:

1. the special casing of the first `StepBy::next()` call;
2. the need to do two additions, of `n - 1` and `1`, inside the range's `next()`.

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better. It is even identical to the manual loop in the case of a `Range` with constant `start` and `end` where `start + step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison:

- [godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
- [godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation, can't be specialized like this without changing behaviour due to overflow: there is no way to save "finished-ness". The approach cannot be used in general either, because it would trigger side effects of the underlying iterator too early.

May obsolete #51435; haven't checked.
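Here is a sketch of the idea the PR describes: yield the current element and step ahead by the full `n` in one addition. It uses a hypothetical `RangeStepBy` wrapper over `Range<u32>` and is not the PR's actual code. On overflow, the sketch records "finished-ness" by setting `start` to `end`. A `RangeFrom` has no `end` to fall back on, which is why the trick doesn't carry over to it.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a specialized `StepBy<Range<u32>>`:
/// one comparison and one addition per element.
struct RangeStepBy {
    range: Range<u32>,
    step: u32,
}

impl Iterator for RangeStepBy {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.range.start >= self.range.end {
            return None;
        }
        let current = self.range.start;
        // Step ahead by the whole step at once. If that overflows, jump
        // straight to `end` so the iterator remembers that it is finished.
        self.range.start = current.checked_add(self.step).unwrap_or(self.range.end);
        Some(current)
    }
}

/// Hypothetical constructor mirroring `range.step_by(step)`.
fn range_step_by(range: Range<u32>, step: u32) -> RangeStepBy {
    assert!(step > 0, "step must be non-zero");
    RangeStepBy { range, step }
}

fn main() {
    let small: Vec<u32> = range_step_by(0..10, 3).collect();
    assert_eq!(small, vec![0, 3, 6, 9]);

    let hi = u32::MAX - 5;
    let ours: Vec<u32> = range_step_by(hi..u32::MAX, 4).collect();
    let via_std: Vec<u32> = (hi..u32::MAX).step_by(4).collect();
    assert_eq!(ours, via_std);
    println!("{:?} {:?}", small, ours);
}
```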
Today, the feature is no longer in nightly, and we get fully optimized results for both:
Closing!
There's a lot going on inside `StepBy<_, Range<_>>`'s `Iterator` implementation, and LLVM does a reasonable job of cutting things down but doesn't get all the way (definition inlined for context in case it changes in the future, and to allow easy experimentation).

Optimised asm (it'd be great for the first to be like the second):
https://play.rust-lang.org/?gist=a926869a4cf59d6683c4
#24660 previously had a somewhat similar problem, although this one is compounded by using `checked_add`, which is implemented in terms of LLVM's overflow intrinsics, and the LLVM performance tips explicitly recommend against those.