Add Iterator::intersperse #75784

jonas-schievink · 2020-08-21T19:58:39Z

This is occasionally useful when wanting to separate an iterator's items.

Implementation adapted from itertools.

Drawback: Breaks all code currently using itertools' intersperse method since the method name collides.

rust-highfive · 2020-08-21T19:58:42Z

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

JulianKnodt

Is there any benefit of intersperse over something like:

use std::iter::{once, repeat};

let range = 0..100;
let a = repeat(3);
// Pair every item with the separator, then flatten the pairs.
range.zip(a).flat_map(|(i, j)| {
    once(i).chain(once(j))
});

The only difference would be that there would be an extra element at the end in this case.

Edit: Sorry for the late edit, but I realize the fold portion of this could also be expressed as
.fold_first(|acc, next| f(acc, interspersed_element, next)). You can also build the iterator by taking the first element off and prefixing each remaining element with the repeated one, which covers the case where you need an interspersed iterator rather than a fold. So I think this adapter can be expressed with more primitive iterators, but if it is a common use case a dedicated method could still be useful.
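For reference, the two compositions described in this comment can be put side by side. This is only a minimal sketch built from std adapters; the helper names `zip_then_flatten` and `first_then_prefixed` are made up for illustration and are not part of the PR.

```rust
use std::iter::{once, repeat};

/// Pairs every item with the separator and flattens the pairs.
/// This leaves a trailing separator after the last item.
fn zip_then_flatten<I>(iter: I, sep: I::Item) -> Vec<I::Item>
where
    I: Iterator,
    I::Item: Clone,
{
    iter.zip(repeat(sep))
        .flat_map(|(item, s)| once(item).chain(once(s)))
        .collect()
}

/// Takes the first item off, then prefixes every remaining item with a
/// clone of the separator, so no separator trails the last item.
fn first_then_prefixed<I>(mut iter: I, sep: I::Item) -> Vec<I::Item>
where
    I: Iterator,
    I::Item: Clone,
{
    let first = iter.next();
    let rest = iter.flat_map(move |item| once(sep.clone()).chain(once(item)));
    first.into_iter().chain(rest).collect()
}
```

With `0..3` and the separator `9`, the first helper produces `[0, 9, 1, 9, 2, 9]` and the second `[0, 9, 1, 9, 2]`, which is the difference being discussed.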

library/core/src/iter/adapters/intersperse.rs

library/core/src/iter/traits/iterator.rs

pickfire · 2020-08-22T07:00:51Z

@jonas-schievink Why not add intercalate while we are at it? https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-List.html#v:intercalate

jonas-schievink · 2020-08-22T17:34:12Z

> Is there any benefit of intersperse over something like:
>
> let range = 0..100;
> let a = repeat(3);
> range.zip(a).flat_map(|(i, j)| {
>     once(i).chain(once(j))
> });
>
> The only difference would be that there would be an extra element at the end in this case.

Avoiding that last element is exactly why you'd use intersperse.

> @jonas-schievink Why not add intercalate while we are at it? https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-List.html#v:intercalate

Never heard of that one. It should probably be in its own PR.
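For readers who have not met intercalate either: in Haskell it inserts a whole list between a list of lists and concatenates the result. A rough Rust equivalent might look like the sketch below; `intercalate_sketch` is a hypothetical name and nothing like it is proposed in this PR.

```rust
/// Inserts the slice `sep` between the given lists and concatenates everything.
fn intercalate_sketch<T: Clone>(sep: &[T], lists: &[Vec<T>]) -> Vec<T> {
    let mut out = Vec::new();
    for (i, list) in lists.iter().enumerate() {
        // Every list except the first is preceded by the separator sequence.
        if i > 0 {
            out.extend_from_slice(sep);
        }
        out.extend_from_slice(list);
    }
    out
}
```

So where intersperse places a single item between items, this places a sequence between sequences and flattens one level.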

Lonami · 2020-08-22T19:55:34Z

> It should probably be in its own PR.

How far do we want to go with these methods? I'd vote for "probably less commonly used ones" to live in itertools.

library/core/src/iter/adapters/intersperse.rs

pickfire · 2020-08-23T02:27:06Z

At least this generates better assembly than a hand-rolled iterator and the other approaches. Oddly, it is the only way I can find to join without panicking.

use std::iter::{once, repeat};

pub fn t1() -> String {
    let mut iter = ["hello", "world"].iter().copied();
    let first = iter.next();
    let rest = repeat(" ")
        .zip(iter)
        .flat_map(|(x, y)| once(x).chain(once(y)));
    first.iter().copied().chain(rest).collect()
}

https://rust.godbolt.org/z/hK5jdc cc @lzutao did I do anything wrong?

pickfire · 2020-08-23T02:33:27Z

library/core/src/iter/adapters/intersperse.rs

+            Some(self.element.clone())
+        }
+    }
+


Should we have a version of try_fold?

Oh, hey, I wrote this! Specializing try_fold is probably not necessary. Iterator::for_each is implemented in terms of Iterator::fold, not Iterator::try_fold, so specializing fold alone is sufficient to get really substantial performance improvements.

But I see that other specializations do include try_fold too. I think there may be a reason they do it.

There are several forwarding hierarchies in Iterator. fold and try_fold are separate ones.
See the (now somewhat outdated) analysis in the PR that made fold independent from try_fold: #72139 (comment)

@the8472 So it means we should implement all the important ones?

If the custom fold implementation provides a speedup over the default one which relies on next then the other ones would likely benefit too.
Additionally the other optimization and extension traits that most iterator adapters implement such as ExactSizeIterator, TrustedLen, DoubleEndedIterator, FusedIterator, TrustedRandomAccess all seem applicable here.

Added an implementation of try_fold

You have removed try_fold again, but not the tests. Any reason for that?
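The forwarding behaviour discussed in this thread can be observed with a small wrapper. The sketch below assumes nothing beyond std; `CountingFold` and the two helper functions are hypothetical names. It overrides only `fold` and counts how often that override is entered.

```rust
use std::cell::Cell;

/// Wrapper that counts how often its `fold` override is entered.
struct CountingFold<'a, I> {
    inner: I,
    fold_calls: &'a Cell<u32>,
}

impl<'a, I: Iterator> Iterator for CountingFold<'a, I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }

    // Only `fold` is overridden; `try_fold` keeps its default,
    // which is written in terms of `next`.
    fn fold<B, F>(self, init: B, f: F) -> B
    where
        F: FnMut(B, I::Item) -> B,
    {
        self.fold_calls.set(self.fold_calls.get() + 1);
        self.inner.fold(init, f)
    }
}

/// Number of times the override ran during `for_each`.
fn fold_calls_during_for_each() -> u32 {
    let calls = Cell::new(0);
    CountingFold { inner: 1..4, fold_calls: &calls }.for_each(|_| {});
    calls.get()
}

/// Number of times the override ran during `all`.
fn fold_calls_during_all() -> u32 {
    let calls = Cell::new(0);
    let _ = CountingFold { inner: 1..4, fold_calls: &calls }.all(|x| x > 0);
    calls.get()
}
```

`for_each` goes through the override once, while `all` never reaches it, which is why a specialized `fold` alone speeds up some consumers but not the short-circuiting ones.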

JulianKnodt · 2020-08-23T02:44:40Z

Ah in the link you sent I think the compiled section for intersperse has the -O flag set, and setting it for the hand-made one provides somewhat similar assembly: https://rust.godbolt.org/z/baaP73

pickfire · 2020-08-23T08:09:16Z

@JulianKnodt Thanks, I didn't know the rustc options could be set there. Updated: https://rust.godbolt.org/z/az3r89. It looks like all of the versions perform well, except that the Join version has a bounds check for result.capacity() >= len.

scottmcm · 2020-08-24T17:43:57Z

> Drawback: Breaks all code currently using itertools' intersperse method since the method name collides.

I really wish we had a nice way to let people disambiguate extension method conflicts 🙁

library/core/src/iter/adapters/intersperse.rs

Mark-Simulacrum · 2020-08-29T17:00:07Z

I'm going to r? @LukasKalbertodt as a libs team member. The implementation here looks good to me.

I personally suspect that it might not be a good idea to land this given the itertools conflict, but I did have an idea for how we could alleviate that in the future at least: itertools could have #[cfg(not(rust_has_intersperse))] on its extension trait method, set in build.rs via something like https://docs.rs/autocfg/1.0.1/autocfg/ or ideally cfg accessible (though that would need the feature stabilized and #72011 fixed).

That doesn't help us now though.

pickfire · 2020-08-30T00:59:06Z

I think itertools could use #[cfg(path = "Iterator::intersperse")].

bors · 2020-09-03T23:45:31Z

☔ The latest upstream changes (presumably #70793) made this pull request unmergeable. Please resolve the merge conflicts.

jswrenn

So, I'm the current maintainer of Itertools. I'm thrilled to see this PR! Intersperse is widely useful, and very deserving of being a method on Iterator. My very first PR to Itertools was actually optimizing Intersperse::fold, because chains of Interspersed tokens are widespread in the internals of the standard proc macro libraries.

> I personally suspect that it might not be a good idea to land this given the itertools conflict

I really wouldn't want Itertools to get in the way of a good addition. As an Itertools user, the fixes for these sorts of conflicts are pretty simple: you just use the fully-qualified method call syntax, or limit the scope of your import of Itertools so that your call to libcore's intersperse is unaffected. (For the uninitiated, the introduction of Iterator::flatten caused the same issues. I don't think it posed much more than a minor inconvenience.)

It's only as an Itertools maintainer that these sorts of conflicts cause me a real headache. I really don't like that our README has to discourage contributions to minimize these sorts of issues, and I don't like not merging PRs that seem so useful that I can foresee them making their way into Iterator.

On that note, our upcoming 0.10.0 release is going to include intersperse_with, a generalized form of intersperse. If Iterator is going to gain intersperse, it doesn't seem out of the question that it might want intersperse_with too. Should I cut it from the release to avoid a potential future conflict?

> in the future at least: itertools could have #[cfg(not(rust_has_intersperse))] on its extension trait method, set in build.rs via something like https://docs.rs/autocfg/1.0.1/autocfg/

This seems promising(!!!), but maybe a little unpleasant for contributors. Any new, non-allocating method added to Itertools poses a potential future conflict with Iterator. We'd need to test for every such method name in our build.rs. I'd much prefer a permanent solution to this problem.
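The two user-side fixes mentioned in this comment (fully-qualified syntax and a narrowly scoped import) can be illustrated without either real library. In the sketch below, `core_like::CoreLike` and `tools_like::ToolsLike` are hypothetical stand-ins for a std trait and an extension trait that both offer a method with the same name.

```rust
// Hypothetical stand-ins: two traits, both implemented for every iterator,
// that happen to provide a method with the same name.
mod core_like {
    pub trait CoreLike: Iterator + Sized {
        fn describe(self) -> &'static str {
            "core-like"
        }
    }
    impl<I: Iterator> CoreLike for I {}
}

mod tools_like {
    pub trait ToolsLike: Iterator + Sized {
        fn describe(self) -> &'static str {
            "tools-like"
        }
    }
    impl<I: Iterator> ToolsLike for I {}
}

/// Fix 1: fully-qualified syntax names the trait explicitly.
fn fully_qualified() -> (&'static str, &'static str) {
    (
        core_like::CoreLike::describe(0..3),
        tools_like::ToolsLike::describe(0..3),
    )
}

/// Fix 2: import only one of the traits in the scope of the call.
fn scoped_import() -> &'static str {
    use core_like::CoreLike;
    (0..3).describe()
}
```

With both traits imported in the same scope, a plain `(0..3).describe()` would be rejected as ambiguous, which is the breakage the PR description warns about.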

jswrenn · 2020-09-05T01:08:46Z

library/core/src/iter/adapters/intersperse.rs

+            Some(self.element.clone())
+        }
+    }
+


Oh, hey, I wrote this! Specializing try_fold is probably not necessary. Iterator::for_each is implemented in terms of Iterator::fold, not Iterator::try_fold, so specializing fold alone is sufficient to get really substantial performance improvements.

library/core/src/iter/adapters/intersperse.rs

Co-authored-by: Ivan Tham

Co-authored-by: Oliver Middleton

pickfire · 2020-09-12T03:25:55Z

library/core/src/iter/adapters/intersperse.rs

@@ -0,0 +1,76 @@
+use super::Peekable;
+
+/// An iterator adapter that places a separator between all elements.


Suggested change

-/// An iterator adapter that places a separator between all elements.
+/// An iterator adapter that places a separator between all elements.
+///
+/// This `struct` is created by [`Iterator::intersperse`]. See its
+/// documentation for more.

Document how it is created.

pickfire · 2020-09-12T03:28:24Z

library/core/src/iter/adapters/intersperse.rs

+    separator: I::Item,
+    iter: Peekable<I>,


I believe iter should come first, since it is the most important field?

Suggested change

-separator: I::Item,
-iter: Peekable<I>,
+iter: Peekable<I>,
+separator: I::Item,

pickfire · 2020-09-12T03:32:51Z

library/core/src/iter/adapters/intersperse.rs

+        // Use `peek()` first to avoid calling `next()` on an empty iterator.
+        if !self.needs_sep || self.iter.peek().is_some() {
+            if let Some(x) = self.iter.next() {
+                accum = f(accum, x);
+            }
+        }


Wouldn't this result in sep, item, sep, item if next() had already been called once before fold on an iterator with three items?

I thought fold should always start with an item instead of a sep?

> I thought fold should always start with an item instead of a sep?

Why would that be? As far as I see it, that's intended behavior. The iterator is just a sequence of items and if you already got the first one, methods like fold will start with the second item (which, in this case, is a separator).
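The behaviour described in this reply can be reproduced with a stripped-down adapter. This is a sketch only: `IntersperseSketch` is a hypothetical stand-in modelled on the `Peekable` plus `needs_sep` fields visible in the diff, and it relies on the default `fold` rather than the specialized one under review.

```rust
use std::iter::Peekable;

/// Minimal stand-in for the adapter under review (hypothetical name).
struct IntersperseSketch<I: Iterator> {
    iter: Peekable<I>,
    separator: I::Item,
    needs_sep: bool,
}

impl<I> Iterator for IntersperseSketch<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.needs_sep && self.iter.peek().is_some() {
            // An item follows, so emit a separator before it.
            self.needs_sep = false;
            Some(self.separator.clone())
        } else {
            self.needs_sep = true;
            self.iter.next()
        }
    }
}

/// Calls `next()` once, then folds whatever is left into a `Vec`.
fn next_then_fold() -> (Option<&'static str>, Vec<&'static str>) {
    let items = ["a", "b", "c"];
    let mut it = IntersperseSketch {
        iter: items.iter().copied().peekable(),
        separator: ", ",
        needs_sep: false,
    };
    let first = it.next();
    let rest = it.fold(Vec::new(), |mut acc, x| {
        acc.push(x);
        acc
    });
    (first, rest)
}
```

Here `first` is `Some("a")` and the fold sees `", "`, `"b"`, `", "`, `"c"`: once the first item has been taken, the remaining sequence starts with a separator, and a specialized `fold` has to agree with that.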

pickfire · 2020-09-12T03:35:08Z

library/core/src/iter/adapters/intersperse.rs

+        let next_is_elem = !self.needs_sep;
+        let lo = lo.saturating_sub(next_is_elem as usize).saturating_add(lo);
+        let hi = match hi {
+            Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
+            None => None,
+        };


Would this be better?

Suggested change

-let next_is_elem = !self.needs_sep;
-let lo = lo.saturating_sub(next_is_elem as usize).saturating_add(lo);
-let hi = match hi {
-    Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
-    None => None,
-};
+let next_is_elem = !self.needs_sep as usize;
+let lo = lo.saturating_sub(next_is_elem).saturating_add(lo);
+let hi = match hi {
+    Some(hi) => hi.saturating_sub(next_is_elem).checked_add(hi),
+    None => None,
+};

pickfire · 2020-09-12T03:37:10Z

library/core/tests/iter.rs

+#[test]
+fn test_intersperse() {
+    let xs = ["a", "", "b", "c"];
+    let v: Vec<&str> = xs.iter().map(|x| x.clone()).intersperse(", ").collect();


Suggested change

-let v: Vec<&str> = xs.iter().map(|x| x.clone()).intersperse(", ").collect();
+let v: Vec<_> = xs.iter().copied().intersperse(", ").collect();

pickfire · 2020-09-12T03:40:24Z

library/core/tests/iter.rs

+    assert_eq!(text, "a, , b, c".to_string());
+
+    let ys = [0, 1, 2, 3];
+    let mut it = ys[..0].iter().map(|x| *x).intersperse(1);


Suggested change

-let mut it = ys[..0].iter().map(|x| *x).intersperse(1);
+let mut it = ys[..0].iter().copied().intersperse(1);
+// copied is the same as .map(|&x| x)

Not sure if this works but probably it does.
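The equivalence this suggestion relies on is easy to check in isolation. The three helper functions below are hypothetical and only compare the closure forms with `copied()` on a slice of integers.

```rust
/// The form used in the original test.
fn via_map_deref(xs: &[i32]) -> Vec<i32> {
    xs.iter().map(|x| *x).collect()
}

/// The pattern-matching form mentioned in the suggestion.
fn via_map_pattern(xs: &[i32]) -> Vec<i32> {
    xs.iter().map(|&x| x).collect()
}

/// The suggested replacement.
fn via_copied(xs: &[i32]) -> Vec<i32> {
    xs.iter().copied().collect()
}
```

All three return the same values for any slice, including the empty `ys[..0]` used in the test.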

pickfire · 2020-09-12T03:41:44Z

library/core/tests/iter.rs

+    assert_eq!(iter.next(), Some(", "));
+    assert_eq!(iter.size_hint(), (5, Some(5)));
+
+    assert_eq!([].iter().intersperse(&()).size_hint(), (0, Some(0)));


Should we add a test for an iterator without an upper bound? I think this is sufficient, but I wonder whether we need one, since someone could change the None case to return Some(T).

pickfire · 2020-09-12T03:43:55Z

library/core/tests/iter.rs

+    let mut iter = (1..3).intersperse(0);
+    iter.clone().for_each(|x| assert_eq!(Some(x), iter.next()));
+
+    let mut iter = (1..4).intersperse(0);
+    iter.clone().for_each(|x| assert_eq!(Some(x), iter.next()));


I don't understand the difference between these and the test above. Do we need these?

pickfire · 2020-09-12T03:47:59Z

library/core/tests/iter.rs

+#[test]
+fn test_fold_specialization_intersperse() {
+    let mut iter = (1..2).intersperse(0);
+    iter.clone().for_each(|x| assert_eq!(Some(x), iter.next()));


Should we add a test for fold and try_fold after one item is consumed?

LukasKalbertodt

It seems like most people are in favor of adding this despite the conflict with itertools. I'm also fine with it. But recently it seems like these kinds of conflicts popped up a lot, so, as others have said, a proper lang solution would be really great.

I left a few inline comments, but nothing serious.

What worries me a bit about this API is that it encourages lots of clones and makes it easy to accidentally clone a bunch. Sure, if you pay attention, you probably can avoid expensive clones by only using Copy separators. But we all know how "if you pay attention" works out long term in large projects.

However, I don't know how to improve this :/

Using Copy instead of Clone as bound seems overly restrictive. Sometimes people might actually want to clone.
Forcing I: Iterator<Item = &Separator> could work but is also kinda awkward? And might not even work as iterators can't return references to self.
Using Borrow bounds and yielding Cows (owned for items, borrowed for iterator) makes the API way too complicated.
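One more direction, hinted at earlier in the thread by the mention of intersperse_with, is to ask a closure for each separator so that no `Clone` bound is needed at all. The sketch below is eager and purely illustrative; `intersperse_with_sketch` and `Token` are hypothetical names, not the itertools or std API.

```rust
/// A separator type that deliberately does not implement `Clone`.
#[derive(Debug, PartialEq)]
struct Token(String);

/// Eagerly builds the interspersed sequence, asking `make_sep` for a fresh
/// separator each time instead of cloning one.
fn intersperse_with_sketch<I, F>(iter: I, mut make_sep: F) -> Vec<I::Item>
where
    I: IntoIterator,
    F: FnMut() -> I::Item,
{
    let mut out = Vec::new();
    for (i, item) in iter.into_iter().enumerate() {
        if i > 0 {
            out.push(make_sep());
        }
        out.push(item);
    }
    out
}

/// Returns the output length and the number of separators produced.
fn separators_made() -> (usize, usize) {
    let mut made = 0;
    let items = vec![Token("a".into()), Token("b".into()), Token("c".into())];
    let out = intersperse_with_sketch(items, || {
        made += 1;
        Token(",".into())
    });
    (out.len(), made)
}
```

The cost of producing a separator is then explicit at the call site, at the price of a slightly heavier signature.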

LukasKalbertodt · 2020-09-20T08:57:46Z

library/core/src/iter/adapters/intersperse.rs

+    {
+        let mut accum = init;
+
+        // Use `peek()` first to avoid calling `next()` on an empty iterator.


That's only useful for iterators that are not fused, right? Or what do we gain by not calling next() on an empty iterator?

Also, I think this would break for the following case:

// Imagine `it` is a non-fused iterator that yields: `None`, `Some('a')`, `None`
it.intersperse('x').fold(String::new(), |mut s, c| { s.push(c); s })

That would result in "xa" with this implementation, but should yield "", right?
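To make the non-fused case concrete, here is a small iterator with exactly that behaviour. `Flaky` is a hypothetical type, and the sketch shows only the raw iterator, its fused form, and a plain `fold` using the default implementation; it does not reproduce the PR's specialized `fold`.

```rust
/// Hypothetical non-fused iterator: yields `None`, `Some('a')`, then `None`.
struct Flaky {
    calls: u32,
}

impl Iterator for Flaky {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.calls += 1;
        if self.calls == 2 { Some('a') } else { None }
    }
}

/// The first three results of calling `next()` directly.
fn raw_sequence() -> [Option<char>; 3] {
    let mut it = Flaky { calls: 0 };
    [it.next(), it.next(), it.next()]
}

/// The same three calls after wrapping the iterator in `fuse()`.
fn fused_sequence() -> [Option<char>; 3] {
    let mut it = Flaky { calls: 0 }.fuse();
    [it.next(), it.next(), it.next()]
}

/// A plain `fold` over the raw iterator, using the default implementation.
fn plain_fold() -> String {
    Flaky { calls: 0 }.fold(String::new(), |mut s, c| {
        s.push(c);
        s
    })
}
```

The default `fold` stops at the first `None` and returns an empty string, which is the baseline a specialized `fold` on the adapter would be expected to match.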

LukasKalbertodt · 2020-09-20T08:59:34Z

library/core/src/iter/adapters/intersperse.rs

+        // Use `peek()` first to avoid calling `next()` on an empty iterator.
+        if !self.needs_sep || self.iter.peek().is_some() {
+            if let Some(x) = self.iter.next() {
+                accum = f(accum, x);
+            }
+        }


> I thought fold should always start with an item instead of a sep?

Why would that be? As far as I see it, that's intended behavior. The iterator is just a sequence of items and if you already got the first one, methods like fold will start with the second item (which, in this case, is a separator).

LukasKalbertodt · 2020-09-20T09:26:31Z

library/core/src/iter/adapters/intersperse.rs

+        let hi = match hi {
+            Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
+            None => None,
+        };


No hard opinion, but this should work as well:

Suggested change

-let hi = match hi {
-    Some(hi) => hi.saturating_sub(next_is_elem as usize).checked_add(hi),
-    None => None,
-};
+let hi = hi.and_then(|hi| hi.saturating_sub(next_is_elem as usize).checked_add(hi));
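The arithmetic in this hunk is easier to check when pulled out of the adapter. The sketch below wraps the same expressions in a free function; `interspersed_size_hint` is a hypothetical helper that takes the inner iterator's hint and the `needs_sep` flag as plain arguments.

```rust
/// Size hint of the interspersed sequence, given the inner iterator's hint
/// and whether a separator is due next. Free function for illustration only.
fn interspersed_size_hint(
    inner: (usize, Option<usize>),
    needs_sep: bool,
) -> (usize, Option<usize>) {
    let (lo, hi) = inner;
    // 1 if the next yielded value is an element, 0 if it is a separator.
    let next_is_elem = !needs_sep as usize;
    // n remaining elements give 2n - 1 values when an element comes next
    // and 2n values when a separator comes next.
    let lo = lo.saturating_sub(next_is_elem).saturating_add(lo);
    let hi = hi.and_then(|hi| hi.saturating_sub(next_is_elem).checked_add(hi));
    (lo, hi)
}
```

Four remaining elements with an element due next give `(7, Some(7))`, and an upper bound that would overflow `usize` degrades to `None` instead of wrapping.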

LukasKalbertodt · 2020-09-20T09:36:35Z

library/core/src/iter/adapters/intersperse.rs

+            Some(self.element.clone())
+        }
+    }
+


You have removed try_fold again, but not the tests. Any reason for that?

LukasKalbertodt · 2020-09-20T09:40:26Z

library/core/src/iter/traits/iterator.rs

+    /// let hello = ["Hello", "World"].iter().copied().intersperse(" ").collect::<String>();
+    /// assert_eq!(hello, "Hello World");


Another example where separator is yielded more than once would be nice. And preferably with slices instead of strings.

bors · 2020-10-14T05:09:52Z

☔ The latest upstream changes (presumably #77926) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

Dylan-DPC-zz · 2020-10-30T19:14:40Z

@jonas-schievink if you can resolve the conflict, we can get this reviewed quickly :)

jonas-schievink · 2020-10-30T20:20:54Z

I probably won't have time to drive this further and address all the comments, so closing. In case anyone wants to finish this, feel free!

Dylan-DPC-zz · 2020-10-30T20:22:29Z

Thanks :)


rust-highfive assigned Mark-Simulacrum Aug 21, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 21, 2020

JulianKnodt reviewed Aug 21, 2020

tesuji reviewed Aug 22, 2020

pickfire reviewed Aug 22, 2020

pickfire reviewed Aug 22, 2020

pickfire reviewed Aug 22, 2020

pickfire reviewed Aug 22, 2020

pickfire reviewed Aug 22, 2020

pickfire reviewed Aug 23, 2020

pickfire reviewed Aug 23, 2020

ollie27 reviewed Aug 24, 2020

rust-highfive assigned LukasKalbertodt and unassigned Mark-Simulacrum Aug 29, 2020

jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-iterators Area: Iterators labels Aug 31, 2020

jswrenn reviewed Sep 5, 2020

the8472 reviewed Sep 6, 2020

pickfire reviewed Sep 6, 2020

jonas-schievink and others added 8 commits September 11, 2020 22:19

Add Iterator::intersperse

13d5e6d

Update library/core/src/iter/adapters/intersperse.rs

8a05fbc

Co-authored-by: Ivan Tham

Address review comments

f470155

Implement size_hint

97039d8

Update library/core/src/iter/adapters/intersperse.rs

572b795

Co-authored-by: Oliver Middleton

Rewrite and add tests, address remaining comments

00ad5c8

Remove incorrect try_fold and add more tests

d3122a8

Improve size_hint

3a069a0

pickfire reviewed Sep 12, 2020

LukasKalbertodt suggested changes Sep 20, 2020

jswrenn mentioned this pull request Sep 22, 2020

Add Iterator::join to combine Iterator and Join #75738

Closed

crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2020

camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 30, 2020

jonas-schievink closed this Oct 30, 2020

camelid mentioned this pull request Nov 27, 2020

Add Iterator::intersperse #79479

Merged

m-ou-se added a commit to m-ou-se/rust that referenced this pull request Dec 30, 2020

Rollup merge of rust-lang#79479 - camelid:intersperse, r=m-ou-se

9cf2438

Add `Iterator::intersperse` This is a rebase of rust-lang#75784. I'm hoping to push this past the finish line! cc `@jonas-schievink`

