-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Iterator::intersperse #75784
Add Iterator::intersperse #75784
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any benefit of intersperse
over something like:
let range = 0..100;
let a = repeat(3);
range.zip(a).flat_map(|(i, j)| {
once(i).chain(once(j))
});
The only difference would be that there would be an extra element at the end in this case.
Edit: Sorry for the late edit, but I realize the fold portion of this could also be represented as
.fold_first(|acc, next| f(acc, interspersed_element, next))
. You can also create an iterator by taking the first element off and prefixing the rest with the repeated one, which handles the case where you need an interspersed iterator. I think that this iterator can be represented by other more primitive iterators, but if it is a common use case I think it could be useful.
@jonas-schievink Why not add intercalate while we are at this? https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-List.html#v:intercalate |
Avoiding that last element is exactly why you'd use intersperse.
Never heard of that one. It should probably be in its own PR. |
How far do we want to go with these methods? I'd vote for "probably less commonly used ones" to live in |
At least this is generating a better assembly than a manual hand rolled iterator and other methods, weird that I can only find this method the only way to join without panicking. use std::iter::{once, repeat};
pub fn t1() -> String {
let mut iter = ["hello", "world"].iter().copied();
let first = iter.next();
let rest = repeat(" ")
.zip(iter)
.flat_map(|(x, y)| once(x).chain(once(y)));
first.iter().copied().chain(rest).collect()
} https://rust.godbolt.org/z/hK5jdc cc @lzutao did I did anything wrongly? |
Some(self.element.clone()) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a version of try_fold
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, hey, I wrote this! Specializing try_fold
is probably not necessary. Iterator::for_each
is implemented in terms of Iterator::fold
, not Iterator::try_fold
, so specializing fold
alone is sufficient to get really substantial performance improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I see that other specializations do include try_fold
too. I think there may be a reason they do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several forwarding hierarchies in Iterator
. fold
and try_fold
are separate ones.
See the (now somewhat outdated) analysis in the PR that made fold
independent from try_fold
: #72139 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the8472 So it means we should implement all the important ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the custom fold
implementation provides a speedup over the default one which relies on next
then the other ones would likely benefit too.
Additionally the other optimization and extension traits that most iterator adapters implement such as ExactSizeIterator
, TrustedLen
, DoubleEndedIterator
, FusedIterator
, TrustedRandomAccess
all seem applicable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an implementation of try_fold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have removed try_fold
again, but not the tests. Any reason for that?
Ah in the link you sent I think the compiled section for intersperse has the |
@JulianKnodt Thanks, I didn't know about the rustc option can be set there. Updated https://rust.godbolt.org/z/az3r89 Looks like all of the versions perform good except there is bound check for the |
I really wish we had a nice way to let people disambiguate extension method conflicts 🙁 |
I'm going to r? @LukasKalbertodt as a libs team member. The implementation here looks good to me. I personally suspect that it might not be a good idea to land this given the itertools conflict, but I did have an idea for how we could alleviate that in the future at least: itertools could have That doesn't help us now though. |
I think we could itertools could use |
☔ The latest upstream changes (presumably #70793) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm the current maintainer of Itertools. I'm thrilled to see this PR! Intersperse
is widely useful, and very deserving of being a method on Iterator
. My very first PR to Itertools was actually optimizing Intersperse::fold
, because chain
s of Intersperse
ed tokens are widespread in the internals of the standard proc macro libraries.
I personally suspect that it might not be a good idea to land this given the itertools conflict
I really wouldn't want Itertools to get in the way of a good addition. As an Itertools user, the fixes for these sorts of conflicts are pretty simple: you just used the fully-qualified method call syntax, or limit the scope of your import of Itertools
so that your call to libcore's intersperse
is unaffected. (For the uninitiated, the introduction of Iterator::flatten
caused the same issues. I don't think it posed much more than a minor inconvenience.)
It's only as an Itertools maintainer that these sorts of conflicts cause me a real headache. I really don't like that our README has to discourage contributions to minimize these sorts of issues, and I don't like not merging PRs that seem so useful that I can foresee them making their way into Iterator
.
On that note, our upcoming 0.10.0 release is going to include intersperse_with
, a generalized form of intersperse
. If Iterator
is going to gain intersperse
, it doesn't seem out of the question that it might want intersperse_with
too. Should I cut it from the release to avoid a potential future conflict?
in the future at least: itertools could have #[cfg(not(rust_has_intersperse))] on its extension trait method, set in build.rs via something like https://docs.rs/autocfg/1.0.1/autocfg/
This seems promising(!!!), but maybe a little unpleasant for contributors. Any new, non-allocating method added to Itertools
poses a potential future conflict with Itertools
. We'd need to test for every such method name in our build.rs
. I'd much prefer a permanent solution to this problem.
Some(self.element.clone()) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, hey, I wrote this! Specializing try_fold
is probably not necessary. Iterator::for_each
is implemented in terms of Iterator::fold
, not Iterator::try_fold
, so specializing fold
alone is sufficient to get really substantial performance improvements.
Co-authored-by: Ivan Tham <[email protected]>
Co-authored-by: Oliver Middleton <[email protected]>
@@ -0,0 +1,76 @@ | |||
use super::Peekable; | |||
|
|||
/// An iterator adapter that places a separator between all elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// An iterator adapter that places a separator between all elements. | |
/// An iterator adapter that places a separator between all elements. | |
/// | |
/// This `struct` is created by [`Iterator::intersperse`]. See its | |
/// documentation for more. |
Document how was in created.
separator: I::Item, | ||
iter: Peekable<I>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe iter
should comes first since it is the most important thing?
separator: I::Item, | |
iter: Peekable<I>, | |
iter: Peekable<I>, | |
separator: I::Item, |
// Use `peek()` first to avoid calling `next()` on an empty iterator. | ||
if !self.needs_sep || self.iter.peek().is_some() { | ||
if let Some(x) = self.iter.next() { | ||
accum = f(accum, x); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this result in sep, item, sep, item
if next()
was already called once before fold
on iterator with three item
?
I thought fold should always start with an item
instead of a sep
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought fold should always start with an
item
instead of asep
?
Why would that be? As far as I see it, that's intended behavior. The iterator is just a sequence of items and if you already got the first one, methods like fold
will start with the second item (which, in this case, is a separator).
let next_is_elem = !self.needs_sep; | ||
let lo = lo.saturating_sub(next_is_elem as usize).saturating_add(lo); | ||
let hi = match hi { | ||
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi), | ||
None => None, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be better?
let next_is_elem = !self.needs_sep; | |
let lo = lo.saturating_sub(next_is_elem as usize).saturating_add(lo); | |
let hi = match hi { | |
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi), | |
None => None, | |
}; | |
let next_is_elem = !self.needs_sep as usize; | |
let lo = lo.saturating_sub(next_is_elem).saturating_add(lo); | |
let hi = match hi { | |
Some(hi) => hi.saturating_sub(next_is_elem).checked_add(hi), | |
None => None, | |
}; |
#[test] | ||
fn test_intersperse() { | ||
let xs = ["a", "", "b", "c"]; | ||
let v: Vec<&str> = xs.iter().map(|x| x.clone()).intersperse(", ").collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let v: Vec<&str> = xs.iter().map(|x| x.clone()).intersperse(", ").collect(); | |
let v: Vec<_> = xs.iter().copied().intersperse(", ").collect(); |
assert_eq!(text, "a, , b, c".to_string()); | ||
|
||
let ys = [0, 1, 2, 3]; | ||
let mut it = ys[..0].iter().map(|x| *x).intersperse(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut it = ys[..0].iter().map(|x| *x).intersperse(1); | |
let mut it = ys[..0].iter().copied().intersperse(1); |
// copied is the same as .map(|&x| x)
Not sure if this works but probably it does.
assert_eq!(iter.next(), Some(", ")); | ||
assert_eq!(iter.size_hint(), (5, Some(5))); | ||
|
||
assert_eq!([].iter().intersperse(&()).size_hint(), (0, Some(0))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test for one without upper bound? I think this is sufficient but just wondering if we need it since someone could change None
to return Some(T)
.
let mut iter = (1..3).intersperse(0); | ||
iter.clone().for_each(|x| assert_eq!(Some(x), iter.next())); | ||
|
||
let mut iter = (1..4).intersperse(0); | ||
iter.clone().for_each(|x| assert_eq!(Some(x), iter.next())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what is the difference between these and the above test? Do we need these?
#[test] | ||
fn test_fold_specialization_intersperse() { | ||
let mut iter = (1..2).intersperse(0); | ||
iter.clone().for_each(|x| assert_eq!(Some(x), iter.next())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test for fold
and try_fold
after one item is consumed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like most people are in favor of adding this despite the conflict with itertools
. I'm also fine with it. But recently it seems like these kinds of conflicts popped up a lot, so, as others have said, a proper lang solution would be really great.
I left a few inline comments, but nothing serious.
What worries me a bit about this API is that it encourages lots of clone
s and makes it easy to accidentally clone a bunch. Sure, if you pay attention, you probably can avoid expensive clones by only using Copy
separators. But we all know how "if you pay attention" works out long term in large projects.
However, I don't know how to improve this :/
- Using
Copy
instead ofClone
as bound seems overly restrictive. Sometimes people might actually want to clone. - Forcing
I: Iterator<Item = &Separator>
could work but is also kinda awkward? And might not even work as iterators can't return references toself
. - Using
Borrow
bounds and yieldingCow
s (owned for items, borrowed for iterator) makes the API way too complicated.
{ | ||
let mut accum = init; | ||
|
||
// Use `peek()` first to avoid calling `next()` on an empty iterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only useful for iterators that are not fused, right? Or what do we gain by not calling next()
on an empty iterator?
Also, I think this would break for the following case:
// Imagine `it` is a non fused iterator that yields: `None`, `Some('a')`, `None`
it.intersperse('x').fold(String::new(), |mut s, c| { s.push(c); s })
That would result in "xa"
with this implementation, but should yield ""
, right?
// Use `peek()` first to avoid calling `next()` on an empty iterator. | ||
if !self.needs_sep || self.iter.peek().is_some() { | ||
if let Some(x) = self.iter.next() { | ||
accum = f(accum, x); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought fold should always start with an
item
instead of asep
?
Why would that be? As far as I see it, that's intended behavior. The iterator is just a sequence of items and if you already got the first one, methods like fold
will start with the second item (which, in this case, is a separator).
let hi = match hi { | ||
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi), | ||
None => None, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No hard opinion, but this should work as well:
let hi = match hi { | |
Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi), | |
None => None, | |
}; | |
let hi = hi.and_then(|hi| hi.saturating_sub(next_is_elem as usize).checked_add(hi)); |
Some(self.element.clone()) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have removed try_fold
again, but not the tests. Any reason for that?
/// let hello = ["Hello", "World"].iter().copied().intersperse(" ").collect::<String>(); | ||
/// assert_eq!(hello, "Hello World"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another example where separator
is yielded more than once would be nice. And preferably with slices instead of strings.
☔ The latest upstream changes (presumably #77926) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
@jonas-schievink if you can resolve the conflict, we can get this reviewed quickly :) |
I probably won't have time to drive this further and address all the comments, so closing. In case anyone wants to finish this, feel free! |
Thanks :) |
Add `Iterator::intersperse` This is a rebase of rust-lang#75784. I'm hoping to push this past the finish line! cc `@jonas-schievink`
This is occasionally useful when wanting to separate an iterator's items.
Implementation adapted from itertools.
Drawback: Breaks all code currently using itertools'
intersperse
method since the method name collides.