-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking Issue for feature(iter_advance_by) #77404
Comments
I’ve already written implementations of these methods for most iterators in |
…tmcm Implement advance_by, advance_back_by for iter::Chain Part of rust-lang#77404. This PR does two things: - implement `Chain::advance[_back]_by` in terms of `advance[_back]_by` on `self.a` and `advance[_back]_by` on `self.b` - change `Chain::nth[_back]` to use `advance[_back]_by` on `self.a` and `nth[_back]` on `self.b` This ensures that `Chain::nth` can take advantage of an efficient `nth` implementation on the second iterator, in case it doesn't implement `advance_by`. cc `@scottmcm` in case you want to review this
The fact that |
That means that every exising Iterator implementation out there that already has an efficient nth() will now automatically get a default slow advance_by() added, implemented as a next() loop, right? I don't think it can be expected that all these implementations will be updated overnight the moment this hits stable, or even get updated at all. Generic code will then have to make a choice between 1) being efficient/simple with std's Iterators but potentially very slow with other Iterators ( I like |
Returning an I'm wondering about the downside of having to return the 'missing length' from advance_by. I can imagine there are data structures that do not store their length, but do have an upper bound on their length. (E.g. a fixed size buffer containing a C-style 0-terminated string.) Those can just return What are the advantages of returning |
Since However even if it didn't return the length it isn't clear how the default would be implemented. You can't have the defaults of So while I admit that it is unfortunate that switching callers from |
You're absolutely right, which is why we're not going to change any uses of
The main problem that
I don't really have a satisfying answer to this... There are indeed iterators out there for which |
I also don't have any solutions here. :( Would've been nice if Anyway, let's make sure these concerns are discussed properly before stabilization. Can you add them to the 'steps' above? (The return type, the inefficient default implementation for existing 'random access' iterators, and having to calculate the number of elements even when that's unused in many cases.) |
@kevincox It's unfortunate, but it can be solved by allowing "a minimum complete implementation to be specified mutually for recursive default methods" (see e.g.: rust-lang/rfcs#628). It's a really neat and useful feature, but I'm not sure if lang-team want to spend it's time budget on this currently (It doesn't seem like there were much progress since 2015...). |
Is that really the intended specification? Some adapters and sources could implement Wouldn't it be better to explicitly allow implementations to skip side-effects? Having an |
If the function passed to |
Optimizations are far from perfect, we have a lot of code that aims to make things more efficient. And I gave a simple example, in practice iterator pipelines are often more complex and can run into inlining thresholds which defeat any further optimizations.
There always has to be a first, so that's not a strong argument. There's prior art for that, the java stream API explicitly doesn't guarantee that side-effects for many of its operations happen at all or happen in any particular order. That enables this kind of optimization. Of course that behavior should be noted in its documentation, that's why I quoted and asked about the specification. |
Perhaps it could be useful to have a version of |
Since Having separate opt-in methods would be a lot trickier in rust because Java has a private default implementation which examines and optimizes the whole stream pipeline while in Rust the adapters operate more independently and talk less to each other. Propagating behavior backwards without introducing new fields in each adapter that would hold such configuration requires new methods to be implemented that have different behavior. Opting out of an optimization is a lot easier than opting into it for the whole pipeline. Opt-in requires a more efficient method to be implemented for each adapter. Opting out only needs to add 1 adapter that doesn't have the optimization and uses the naive approach instead. |
It is important to note that the compilers definition of side effects is very different than a programmers may be. For example you may use |
Hmm, that's true... But what about Unless there are any cases where you would want to sometimes skip side effects and sometimes execute them, with the same iterator? Also, in terms of what other methods would need multiple versions: Yes/maybe:
No:
Nothing else accepts a function and returns an iterator. |
There also are So duplicating those would require several adapter structs and copy-paste (or macros) rather than just implementing |
Hmm, that's true... I guess you could instead make wrapper structs, but there would still be a lot of boilerplate, although probably per-method rather than per-implementor... Also, for |
Actually there are hypothetical optimizations that could skip those. E.g. if you take an iterator from a sorted collection such as The same can be done with
Indeed, that's what I said earlier #77404 (comment) |
For an opt-out we could do |
Another adapter that could benefit from skipping side-effects is |
Well, eliminating side-effects of closures may also seem surprising, at least when you're explicitly asking for a mutating iterator. Currently there's a lot of untapped optimization opportunity in iterators. But backwards compatibility makes it difficult to change behavior. Java got this right from the start with its Stream API which explicitly says side-effects may be elided. I think |
Another possibility could be to have some sort of |
That would be a much bigger change since it would require compiler changes and people might want to rely on pure functions for other reasons and then maybe ask for The other problem is that what the compiler considers side-effects may not be the same what the developer thinks are relevant side-effects. E.g. we might want to make different tradeoffs for |
I'm currently working on implementing I'm also testing whether |
…tmcm Implement advance_by, advance_back_by for slice::{Iter, IterMut} Part of rust-lang#77404. Picking up where rust-lang#77633 was closed. I have addressed rust-lang#77633 (comment) by restoring `nth` and `nth_back`. So according to that comment this should already be r=m-ou-se, but it has been sitting for a while.
Wouldn't the default implementation possibly be more efficient if it were to be implemented using |
…shtriplett implement advance_(back_)_by on more iterators Add more efficient, non-default implementations for `feature(iter_advance_by)` (rust-lang#77404) on more iterators and adapters. This PR only contains implementations where skipping over items doesn't elide any observable side-effects such as user-provided closures or `clone()` functions. I'll put those in a separate PR.
…shtriplett implement advance_(back_)_by on more iterators Add more efficient, non-default implementations for `feature(iter_advance_by)` (rust-lang#77404) on more iterators and adapters. This PR only contains implementations where skipping over items doesn't elide any observable side-effects such as user-provided closures or `clone()` functions. I'll put those in a separate PR.
…shtriplett implement advance_(back_)_by on more iterators Add more efficient, non-default implementations for `feature(iter_advance_by)` (rust-lang#77404) on more iterators and adapters. This PR only contains implementations where skipping over items doesn't elide any observable side-effects such as user-provided closures or `clone()` functions. I'll put those in a separate PR.
While working on another advance_by implementation I realized that I introduced an inconsistency in #87091. Now the documentation says:
k is always less than n is incompatible with My confusion stems from the Ok/Err distinction conveying the same information as k < n already would. It encodes information redundantly. That might be useful for optimizations or to simplify control flow, e.g. by letting flags on which we would branch bubble up through a return chain. The use of Maybe a different return type would be better. Or maybe just the documentation needs to be fixed. |
I think that |
That presupposes Result is the right return type, which I'm also uncertain about. |
Right, I should have been explicit. I think it is the right return type. You have the "happy case" and the "unhappy case". At least for the caller this makes sense. However you are right that the type is broader than it needs to be. It is possible for I don't think I think we could come up with other return types but I think the Ok and Err cases are important to make obviously distinct. Just returning usize doesn't really make that easy to do as you either have to store the expected amount to compare or we make it count down to zero but now you need to remember to compare to zero (I guess we could mark it as If fact thinking about it I may consider changing the return value to the be amount not returned AKA the amount remaining. Basically a fallback fn advance_by(&mut self, mut count: usize) -> Result<(), usize> {
while count > 0 {
if self.next().is_none() {
return Err(count)
}
count -= 1;
}
Ok(())
} Then you can easily convert the return value to remaining by Another option would be to make a custom try type where 0 was considered "Ok" and non-zero was considered "Err" and provides |
It would be mostly useful for minor optimization to skip a followup operation, such as the Granted, the value of that distinction is minor, all it costs to notice the exhaustion is another call to
Many of the implementations need to do that anyway. |
#89916 removes the inconsistency from the docs and updates the implementations to always return |
For discussion: #92284 which changes the return type to |
What is blocking this feature? advance_by is super useful. |
This method is one of only 4 in the vtable of |
…ochenkov Expose size_hint() for TokenStream's iterator The iterator for `proc_macro::TokenStream` is a wrapper around a `Vec` iterator: https://github.com/rust-lang/rust/blob/babff2211e3ae9ef52852dc1b01f3eacdd94c12e/library/proc_macro/src/lib.rs#L363-L371 so it can cheaply provide a perfectly precise size hint, with just a pointer subtraction: https://github.com/rust-lang/rust/blob/babff2211e3ae9ef52852dc1b01f3eacdd94c12e/library/alloc/src/vec/into_iter.rs#L170-L177 I need the size hint in syn (https://github.com/dtolnay/syn/blob/1.0.98/src/buffer.rs) to reduce allocations when converting TokenStream into syn's internal TokenBuffer representation. Aside from `size_hint`, the other non-default methods in `std::vec::IntoIter`'s `Iterator` impl are `advance_by`, `count`, and `__iterator_get_unchecked`. I've included `count` in this PR since it is trivial. I did not include `__iterator_get_unchecked` because it is spoopy and I did not feel like dealing with that. Lastly, I did not include `advance_by` because that requires `feature(iter_advance_by)` (rust-lang#77404) and I noticed this comment at the top of libproc_macro: https://github.com/rust-lang/rust/blob/babff2211e3ae9ef52852dc1b01f3eacdd94c12e/library/proc_macro/src/lib.rs#L20-L22
Expose size_hint() for TokenStream's iterator The iterator for `proc_macro::TokenStream` is a wrapper around a `Vec` iterator: https://github.com/rust-lang/rust/blob/0f39245cd48329538f97bdf8796a5b1521dceb52/library/proc_macro/src/lib.rs#L363-L371 so it can cheaply provide a perfectly precise size hint, with just a pointer subtraction: https://github.com/rust-lang/rust/blob/0f39245cd48329538f97bdf8796a5b1521dceb52/library/alloc/src/vec/into_iter.rs#L170-L177 I need the size hint in syn (https://github.com/dtolnay/syn/blob/1.0.98/src/buffer.rs) to reduce allocations when converting TokenStream into syn's internal TokenBuffer representation. Aside from `size_hint`, the other non-default methods in `std::vec::IntoIter`'s `Iterator` impl are `advance_by`, `count`, and `__iterator_get_unchecked`. I've included `count` in this PR since it is trivial. I did not include `__iterator_get_unchecked` because it is spoopy and I did not feel like dealing with that. Lastly, I did not include `advance_by` because that requires `feature(iter_advance_by)` (rust-lang/rust#77404) and I noticed this comment at the top of libproc_macro: https://github.com/rust-lang/rust/blob/0f39245cd48329538f97bdf8796a5b1521dceb52/library/proc_macro/src/lib.rs#L20-L22
#92284 has been updated if anyone wants to take a look, it now returns a |
A previous versin of PathIterator used `advance_by`. I fell for all the issues described in rust-lang/rust#77404 (comment), and eventually switched to using `nth_back` which has already stabilised.
A previous versin of PathIterator used `advance_by`. I fell for all the issues described in rust-lang/rust#77404 (comment), and eventually switched to using `nth_back` which has already stabilised.
Previously, there was a comment stating that we must wait for [`Iterator::advance_by` to be stabilizied](rust-lang/rust#77404), but it makes more sense to use the [existing `Iterator::nth` method](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.nth) anyway, so I updated all callsites to use it. Depending on the iterator implementation, this could result in a performance improvement.
This is a tracking issue for the methods
Iterator::advance_by
andDoubleEndedIterator::advance_back_by
.The feature gate for the issue is
#![feature(iter_advance_by)]
.Previously the recommendation was to implement
nth
andnth_back
on your iterators to efficiently advance them by multiple elements at once (useful for.skip(n)
and.step_by(n)
). After this feature is stabilized the recommendation will/should be to implementadvance_by
andadvance_back_by
instead, because they compose better and are often simpler to implement.Iterators in libcore that wrap another iterator (possibly from elsewhere than libcore) will need to keep their
nth
andnth_back
implementations for the foreseeable future and perhaps indefinitely, because the inner iterator may have an efficientnth
implementation without implementingadvance_by
as well.About tracking issues
Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
Implementation history
Chain
(Implement advance_by, advance_back_by for iter::Chain #77594)slice::{Iter, IterMut}
(Implement advance_by, advance_back_by for slice::{Iter, IterMut} #87387, #[inline] slice::Iter::advance_by #87736)vec::IntoIter, ops::Range, iter::{Cycle, Skip, Take, Copied, Flatten}
(implement advance_(back_)_by on more iterators #87091)The text was updated successfully, but these errors were encountered: