Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::join to combine Iterator and Join #75738

Closed
wants to merge 1 commit into from

Conversation

pickfire
Copy link
Contributor

@pickfire pickfire commented Aug 20, 2020

Iterator::join will be added as extention trait IteratorExt inside
alloc crate as it requries allocation while joining which is the
reason why Join avaliable in alloc but not core. It is an easier
alternative to .collect::<Vec<_>>().join(sep).

Tracking issue: #75638

CC @Amanieu @cuviper
CC previous commentators @bluss @steveklabnik @dtolnay

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 20, 2020

/// Iterator extension traits that requires allocation.
#[unstable(feature = "iterator_join", issue = "75638")]
pub trait IteratorExt: Iterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The impl seems to be missing

Copy link
Contributor Author

@pickfire pickfire Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to impl IteratorExt for all items that impl Iterator? I thought this already does that? Do we need to do like impl IteratorExt for dyn Iterator or impl<T: Iterator> IteratorExt for T?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without any impl, it's just specifying the supertrait relationship as a requirement to implementors. I would recommend a blanket impl<T: Iterator + ?Sized>, which will include the dyn case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added impl<T: ExactSizeIterator + ?Sized> IteratorExt for T {}

@Lonami
Copy link
Contributor

Lonami commented Aug 21, 2020

@pickfire asked me on my thoughts on this implementation and I responded with an alternative approach that (I believe) does not allocate (playground):

enum State {
    Start,
    Iterator,
    Separator,
}

struct Join<T: Clone, I: Iterator<Item=T>> {
    iter: I,
    item: Option<T>,
    sep: T,
    state: State,
}

impl<T: Clone, I: Iterator<Item=T>> Iterator for Join<T, I> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        match self.state {
            State::Start => {
                self.state = State::Separator;
                self.iter.next()
            }
            State::Iterator => {
                let item = self.item.take();
                self.state = State::Separator;
                item
            }
            State::Separator => {
                self.item = Some(self.iter.next()?);
                self.state = State::Iterator;
                Some(self.sep.clone())
            }
        }
    }
}

fn main() {
    let iter = vec![1, 2, 3].into_iter();
    let join = Join { iter, sep: 0, item: None, state: State::Start };
    for x in join {
        dbg!(x);
    }
}

However, @pickfire responded that this is not ideal, because "join should be eager". I fail to understand why, or why this needs an allocation at all. I'd appreciate if someone could explain why it's needed (I don't have a Zulip account so I couldn't read past discussion on it).

Regardless, I pointed out that the current implementation will not work for infinite iterators, and @pickfire said it would need to be implemented for FixedSizedIterator then.

@pickfire
Copy link
Contributor Author

pickfire commented Aug 21, 2020

@Lonami

chat logs
pickfire: Should we have join on iterator so that we could do the following and maybe have a bit of optimization without creating a Vec just to join an Iterator. Instead of ["hello", "world"].iter().collect::<Vec<_>>().join(" ") one could do ["hello", "world"].iter().join(" ")?

A rough idea on how it should look like:

pub trait Iterator {
    fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
    where
        [Self::Item]: Join<Separator>,
    {
        // ...
    }
}

I was surprised at first that I am not able to join into String from Iterator, I thought since Iterator may have a similar API to slice it would have join, I need to search to be able to know that I first need to get a Vec<String> to only be able to join, I wonder if it would be faster to join without first creating a Vec?

Steven Fackler: This has been something I've wished for as well, but I think it'd be better to separate it from the existing slice::Join trait so it doesn't have to buffer up the components first at all

cuviper: FWIW, itertools does have this already: https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.join

cuviper: format and format_with are neat too

pickfire: Does it requires a FCP to move it to core? This does seemed like something quite common, not sure if it is good to keep it on itertools. Also, I wouldn't suggest the one on itertools, I don't think we should only be looking on String, PathBuf could be the output as well.

pickfire: As for me, format and format_with does not seemed to be as useful. I don't quite sure why I would need that, I think dbg!() could do that better.

XAMPPRocky: @pickfire I'm not on the libs team but generally single method additions only require a PR.

cuviper: Note that they're defined in different crates, core::iter::Iterator vs. alloc::slice::Join

cuviper: I think coherence would prevent moving Join to core, since the impls need to be in alloc to refer to Vec and String

cuviper: I found a related issue: https://github.com/rust-lang/rust/issues/22754
(connect was later deprecated / renamed to join)

pickfire: Maybe let me try adding it to itertools first and then send a PR to libs if that works, sounds like it needs an RFC from the comment.

pickfire: I did some tests on this, it generates the same output as assembly except it requires writing fewer types.

#![feature(slice_concat_trait)]

use std::iter::FromIterator;
use std::slice::Join;

pub fn collect<T, B>(iter: impl Iterator<Item = T>) -> B
where
    B: FromIterator<T>,
{
    iter.collect()
}

pub fn join<T, Separator>(iter: impl Iterator<Item = T>, sep: Separator) -> <[T] as Join<Separator>>::Output
where
    [T]: Join<Separator>,
{
    // <[S] as std::slice::Join<&str>>
    // <[V] as std::slice::Join<&T>>
    // <[V] as std::slice::Join<&[T]>>
    Join::join(iter.collect::<Vec<T>>().as_slice(), sep)
}

#[inline(never)]
pub fn test_collect<'a>(mut iter: impl Iterator<Item = &'a str>) -> String {
    let v: Vec<_> = iter.collect();
    v.as_slice().join(" ")
}

#[inline(never)]
pub fn test_join<'a>(mut iter: impl Iterator<Item = &'a str>) -> String {
    join(iter, " ")
}

#[inline(never)]
pub fn test_manual<'a>(mut iter: impl Iterator<Item = &'a str>) -> String {
    let sep = " ";

    let mut buf = match iter.next() {
        Some(buf) => String::from(buf),
        None => return String::new(),
    };
    iter.for_each(|s| {
        buf.push_str(sep);
        buf.push_str(s);
    });
    buf
}

fn main() {
    assert_eq!(test_join(["a", "b"].iter().copied()), "a b".to_string());
    assert_eq!(test_collect(["a", "b"].iter().copied()), "a b".to_string());
    assert_eq!(test_manual(["a", "b"].iter().copied()), "a b".to_string());
}

In which case, rather than writing

let s = iter.collect::<Vec<_>>().join(" ");

One can just write the following with the same generated assembly

let s = iter.join(" ");

Current limitations are that Join API is currently too limited, it implements quite less stuff, I wish it could use AsRef so that we don't need to do conversion on the iterator level.

I wonder if we could do optimization such as not needing to build a Vec first and directly construct the output but I haven't figure out a way that it could be faster yet, probably performance could improve. CC @lzutao

However, this is different from the one provided in itertools, itertools already have join but that is only limited to String concatenation, this version is just Iterator with Join. What do you all think?

lzutao: I am not familiar with these APIs. You probably want to ping people like @Amanieu .

pickfire: @lzutao No, I was discussing the performance. I was wondering if we could directly construct the output from join without having a separate vec first to make it faster.

pickfire: @Amanieu Do I need to do an RFC for this?

lzutao: You could look at Zip trait and its module to get more inspiration.
Especially TrustedRandomAccess trait and its super trait ExactSizeIterator.

Amanieu: I don't think an RFC is needed.

pickfire: I don't understand how is ExactSizeIterator is useful here since we need to get the length of the elements in the vector plus N times the length of the separator? I wonder if we can get the total length mentioned while copying the data so we don't need to iterate two times.

pickfire: By the way, Box<[T]>'s FromIterator looks way less efficient than Vec<T> when I looked into the generated code, I think I need to look into improving that while working on this.

pickfire: @Amanieu Does it need a tracking issue?

cuviper: Box<[T]> just collects a Vec<T> and converts -- the only way that could be less efficient is in the implied shrink

cuviper: new unstable APIs always need a tracking issue, but I usually wait until I have a feeling whether it's going to be accepted at all

pickfire: @cuviper I understand, but the assembly generated is 2x more (96 -> 184). https://godbolt.org/z/c9ffTT I will be opening an issue for this.

cuviper: hmm, some panic handling for "Tried to shrink to a larger capacity"

cuviper: (which should be statically unreachable)

pickfire: Is that even possible?

cuviper: no, but I guess it isn't seeing that len<=capacity

pickfire: @cuviper https://github.com/rust-lang/rust/issues/75636

cuviper: maybe it should be guarded like if self.len() < self.capacity() { self.shrink_to_fit(); } -- so the compiler may see that it's never greater

cuviper: @pickfire still, that should all be cold, not affecting real performance, except maybe icache pressure

pickfire: icache pressure?

cuviper: instruction cache memory

cuviper: pressure -> unused instructions still taking up some space in that cache, leaving less room for useful stuff

cuviper: but that's very much something that should be measured before it is targeted for optimization

pickfire: This issue https://github.com/rust-lang/rust/issues/75638

pickfire: Wow, my first time opening a tracking issue. Please let me work on the implementation, I want to try.

cuviper: oh, usually I do the implementation before a tracking issue, otherwise this is more of a request issue

pickfire: Ah, I thought it is the opposite.

pickfire: By the way, the POC is done so I don't see any issues doing the implementation.

pickfire: It's like a TODO issue for me, I just left it there, IIRC last time I just create a pull request without a tracking issue.

cuviper: The big question I still have is how you intend to reconcile this across different crates -- core::iter::Iterator can't reference alloc::slice::Join. The POC doesn't answer this at all.

pickfire: Oh, I didn't thought about that.

pickfire: Can we even implement it in std::iter::Iterator?

cuviper: std is just a re-export

cuviper: I mentioned this before: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/join.20on.20Iterator/near/203757438

pickfire: Ah, I forgot we cannot use Vec in core, Vec is needed to run collect before joining, do we need to do extension trait for this in alloc and reuse it in std?
pickfire:

error[E0412]: cannot find type `Vec` in this scope
    --> library/core/src/iter/traits/iterator.rs:1676:35
     |
1676 |         Join::join(self.collect::<Vec<Self::Item>>().as_slice(), sep)
     |                                   ^^^ not found in this scope

error[E0038]: the trait `iter::traits::iterator::Iterator` cannot be made into an object
    --> library/core/src/iter/traits/iterator.rs:17:30
     |
17   | fn _assert_is_object_safe(_: &dyn Iterator<Item = ()>) {}
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^ the trait `iter::traits::iterator::Iterator` cannot be made into an object
...
98   | pub trait Iterator {
     |           -------- this trait cannot be made into an object...
...
1672 |     fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
     |        ---- ...because method `join` has generic type parameters
     |
     = help: consider moving `join` to another trait

error: aborting due to 2 previous errors

pickfire: I think we cannot put join into Iterator because the output is generic, maybe an extension trait? But I never seen any extension trait for Iterator.

lcnr: considering that we don't have collections in core, would it make sense to instead add fn intersperse to Iterator, so the example would be

["hello", "world"].iter().intersperse(" ").fold(String::new(), |s, e| { s.append_str(e); s });

lcnr: fn intersperse(self, e: Self::Item) -> ... where Self::Item: Clone and fn intersperse_with<F>(self, f: F) -> ... where F: FnMut() -> Self::Item

lcnr: https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-List.html#v:intersperse

pickfire: So it shouldn't be called join but instead intersperse?

pickfire: The term sounds complicated to me.

pickfire: But doesn't that only allows the same type of item to be the separator?

lcnr: Afaik, unlike intersperse, join is eager in other languages

lcnr: so it makes sense to take the name already used in haskell. We already mostly use that their names for Iterator methods

pickfire: Well, collect + join is eager too.

lcnr: exactly, and intersperse is lazy

pickfire: Yes, then join is the correct term.

lcnr: We seem to talk about slightly different things rn, sorry

lcnr: My concern is that we can't really implement join as part of Iterator as it requires allocations, is that right?

pickfire: Ah, I mean the process is supposed to be eager, not lazy. So it shouldn't be called intersperse.

pickfire: Yes.

pickfire: Unless we changed it like collect.

lcnr: So intersperse is a way to add this to core in a clean manner imo

pickfire: But I don't see what intersperse solve.

lcnr: what does join solve :sweat_smile:

pickfire: It should allow joining with different type.
pickfire:

intersperse ',' "abcde"

That looks like the same type to me.

lcnr: ah, hmm :thinking:

pickfire: The other issue I faced is that the join I can think of have generics return.
pickfire:

fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output

pickfire: But at the end, it probably still needs allocation.

pickfire: So I don't think it is possible to put it into core.

lcnr: wouldn't you want <Self::Item as Join<Separater>>::Output?

I am personally quite interested in adding intersperse to iterator as I needed a few times in the past and I think it can mostly replace join when dealing with iterators.

    pickfire:

    wouldn't you want <Self::Item as Join<Separater>>::Output?

Yes, I want that.

pickfire: But the tests does not want that.

pickfire: I mean the compiler.

lcnr: but yeah, don't know how I feel about adding join to Iterator. I am also not part of T-libsso :shrug:

pickfire: @lcnr Can you do intersperse with different type?

lcnr: I doubt it

pickfire: Doubt? So no? Yes?

    lcnr:

    But the tests does not want that.

why
pickfire:

error[E0038]: the trait `iter::traits::iterator::Iterator` cannot be made into an object
    --> library/core/src/iter/traits/iterator.rs:17:30
     |
17   | fn _assert_is_object_safe(_: &dyn Iterator<Item = ()>) {}
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^ the trait `iter::traits::iterator::Iterator` cannot be made into an object
...
98   | pub trait Iterator {
     |           -------- this trait cannot be made into an object...
...
1672 |     fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
     |        ---- ...because method `join` has generic type parameters
     |
     = help: consider moving `join` to another trait

lcnr: you need Self: Sized on join in this case

lcnr: I think

pickfire: Oh, collect have that too.

pickfire: Nice, it worked.

pickfire: Only one thing left.

error[E0412]: cannot find type `Vec` in this scope
    --> library/core/src/iter/traits/iterator.rs:1677:35
     |
1677 |         Join::join(self.collect::<Vec<Self::Item>>().as_slice(), sep)
     |                                   ^^^ not found in this scope

error: aborting due to previous error

For more information about this error, try `rustc --explain E0412`.
error: could not compile `core`.

    lcnr:

    Doubt? So no? Yes?

intersperse with different types seems difficult to do cleanly

lcnr: at least I can't think of a good API rn

pickfire: Join already have those.

pickfire: That's why I am just making use of the existing stuff to make this.

pickfire: https://github.com/rust-lang/rust/pull/75738

Joshua Nelson: FWIW I think intersperse is a great idea even if it requires the same types

Joshua Nelson: s.split().intersperse (" ").collect() is a lot easier to write than turbo fish for Vec

Joshua Nelson: And more efficient too I think since you only collect once

lcnr: intersperse is already part of Itertools, it might make sense to promote it to std

lzutao: yeah, it is brilliant idea, but let's cc @bluss on that

pickfire: @Joshua Nelson The join I sent also collect once, intersperse looks useful but I see it as being very useful yet, at least not in my case.

pickfire: I only wish to not type .collect::<Vec<_>>().join(sep), just .join(sep) would be more than good.

lzutao: the problem I see with join method on Iterator is that it requires allocation and an extension Trait.
Where people use core only, they don't have a lazy iterator to loop over.

Joshua Nelson: @pickfire no, your suggestion collects twice: once to Vec and once again to join them all by a separator. So it allocates twice.

Joshua Nelson: It improves the ergnomics but not the behavior

Jonas Schievink: Why would join need an extension trait but not collect?

cuviper: @Jonas Schievink because collect just calls FromIterator, generic to any output. This join creates a Vec intermediate, so it can't be in core.

cuviper: IMO, this isn't worth it just to save a little typing. It would be more motivating if it could build dynamically to the final result, but that will probably need changes to Join.

Jonas Schievink: Isn't join(sep) just intersperse(sep).collect()?

Jonas Schievink: Ah, this one uses slice::Join

cuviper: With itertools, yes, but then you also have Itertools::join

Jonas Schievink: My take on this is that we should totally add intersperse and format from itertools to libcore, but nothing that requires allocation

pickfire: Yeah, I think changes to Join is required. @Joshua Nelson I didn't noticed that. I am looking for a way to improve it, then all the users that currently do .collect::<Vec<_>>().join() needs to allocate twice then.

pickfire: @Jonas Schievink No, Join allows different types unlike intersperse.

Jonas Schievink: .collect::<Vec<_>>().join() already allocates twice

pickfire: Well, all the users have to do this right now which is the easiest method.

Jonas Schievink: pickfire said:

    Jonas Schievink No, Join allows different types unlike intersperse.

ah, true

pickfire: @Joshua Nelson Yes, indeed. What you said is correct. I am looking for a way to both improve the ergonomics and performance. But I don't know how to know how much to allocate at first without having the first Vec? How do you know how much to allocate for the final Vec?

I think we may probably need something like Join mechanism based on collect, FromIterator and IntoIterator, it would be good to make use of the existing Join but that would require us to do two allocations, maybe we could make one for slice too that only does one allocation. Let me rethink this.

@Lonami
Copy link
Contributor

Lonami commented Aug 21, 2020

All I could draw from the conversation was the following:

lcnr: Afaik, unlike intersperse, join is eager in other languages

But this doesn't seem like a strong enough motivation to do the same. One wants to avoid writing out collect yet internally works the same. I think it's better if we offer a lazy iterator approach (the point of using iterators is that they can be lazy after all).

I had never heard of interperse, and I think limiting to strings only is definitely bad.

@cuviper
Copy link
Member

cuviper commented Aug 21, 2020

FYI, rather than dumping zulip logs, you can link the public archive:
https://zulip-archive.rust-lang.org/219381tlibs/33464joinonIterator.html

@Lonami

This comment has been minimized.

@cuviper
Copy link
Member

cuviper commented Aug 21, 2020

@Lonami the link I gave should work anonymously.

}
}

// ExactSizeIterator is used to guarantee non-infinite iteration.
Copy link
Member

@bluss bluss Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-infinite iteration sounds nice if it could be checked, but ESI is pretty far away from a perfect check for that, so I'd avoid that requirement - primarily for all the false negatives and how limiting it is. collect for example does not try to ensure the iterator is finite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I never knew collect does not ensure the iterator is finite.

Iterator::join will be added as extention trait IteratorExt inside
alloc crate as it requries allocation while joining which is the
reason why Join avaliable in alloc but not core. It is an easier
alternative to `.collect::<Vec<_>>().join(sep)`.

Tracking issue: rust-lang#75638
Copy link
Member

@LukasKalbertodt LukasKalbertodt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts on this:

  • I don't particularly like adding IteratorExt. It's an extra trait in the API that needs to be imported (as we can't easily add traits to the prelude). And for most users (using std), the split doesn't make sense. Maybe this should just be a function?

  • The Join trait is still unstable, meaning we could still change it (as long as we keep the current impls). And we can certainly implement it for new types. With that, it should be possible to avoid the collect into a Vec, right? I mean, ideally we only want the target "collection" (e.g. String) to allocate, as mentioned in the chat.

  • Iterator[Ext]::join might not be a great name. Iterator uses lots of names from the functional programming/Haskell world (last, map, fold, any, all, scan, take_while, ...). And there, join already has a pretty established meaning that is very different from what this function does.

  • I think intersperse (returning a lazy iterator) is a useful method to have. However, I see that's kind of problematic to replace the join suggested here. If the separator is a string, not a single char, then the separator would need to be cloned a bunch of times which is likely not very fast. One might come up with a solution involving Cow (and String can be collected from an iterator over Cow<str>), but that's also not that nice of an API?

/// ```
#[inline]
#[unstable(feature = "iterator_join", issue = "75638")]
#[must_use = "if you really need to exhaust the iterator, consider `.for_each(drop)` instead"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must_use is not appropriate here and probably a copy&paste bug, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True

@pickfire
Copy link
Contributor Author

The Join trait is still unstable, meaning we could still change it (as long as we keep the current impls). And we can certainly implement it for new types. With that, it should be possible to avoid the collect into a Vec, right? I mean, ideally we only want the target "collection" (e.g. String) to allocate, as mentioned in the chat.

I agree, I am very positive that Join could be changed like I mentioned, the performance here is not ideal requiring allocation twice. What I have in mind was making Join an Iterator based instead or maybe splitting it such that the one on iterator does not need separate allocation. Not using Join allows us to not allocate and keep this in core without additional trait.

I think intersperse (returning a lazy iterator) is a useful method to have. However, I see that's kind of problematic to replace the join suggested here. If the separator is a string, not a single char, then the separator would need to be cloned a bunch of times which is likely not very fast. One might come up with a solution involving Cow (and String can be collected from an iterator over Cow), but that's also not that nice of an API?

I am also planning to add and intercalate in the meantime. intersperse is in the processed of being added too by someone else. I could think that both intersperse and intercalate could replace join like I proposed here, but performance wise maybe not so good. Think, we could have iterator of &str but we could join by char or even u8 (unsafe or just perform one unicode check) which is not possible by those two.

I am wondering if we could leverage FromIterator instead which we could sort of tweak the code from FromIterator a bit to make this but I think that would require a lot of duplicate effort from users to implement both FromIterator and this new trait if we split it.

Don't merge this PR, I am thinking of improving it by the comments by so far I haven't have thoughts on how to leverage FromIterator to do this yet.

@jyn514 jyn514 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 28, 2020
@pickfire
Copy link
Contributor Author

pickfire commented Sep 5, 2020

@jswrenn Can you please help me to take a look at this?

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 21, 2020
@JohnCSimon
Copy link
Member

@pickfire moved this back to s-waiting-on-author

@Dylan-DPC-zz
Copy link

@pickfire any updates on this?

@jswrenn
Copy link
Member

jswrenn commented Sep 22, 2020

@pickfire, I didn't see your earlier ping—I can give this a review on Wednesday!

@pickfire
Copy link
Contributor Author

pickfire commented Sep 22, 2020

@JohnCSimon @Dylan-DPC Thanks for the ping but this is still WIP and experimental, maybe I should switch it to draft. Also I want to get more input on this.

@jswrenn Sure, thanks a lot. I have some ideas in mind which requires more steps in order to not allocate twice, it requires changing the unstable Join IIRC.

@pickfire pickfire marked this pull request as draft September 22, 2020 06:34
Copy link
Member

@jswrenn jswrenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reservations. The proposed method is a worthy ergonomic addition to Rust's standard libraries, but the consequences are substantial. In brief:

  1. The proposed method conflicts with an Itertools method of the same name.
  2. The proposed IteratorExt trait opens a pandora's box of future conflicts.


/// Iterator extension traits that requires allocation.
#[unstable(feature = "iterator_join", issue = "75638")]
pub trait IteratorExt: Iterator {
Copy link
Member

@jswrenn jswrenn Sep 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub trait IteratorExt: Iterator {
pub trait IteratorAlloc: Iterator {

I'd caution against calling this trait IteratorExt. The name isn't wrong—this is an extension trait of Iterator for contexts where alloc is available—but it's not the only reasonable extension trait of Iterator that could exist: what about an extension trait for Iterator that's usable in contexts where std is also available?

I'm really worried about the floodgates of future itertools breakage that this trait opens up. It's bad enough that the Itertools readme has to beg people not to contribute methods that could reside on Iterator, but at least we can welcome methods that involve allocation:

However, if your feature involves heap allocation, such as storing elements in a Vec, then it can't be accepted into libcore, and you should propose it for itertools directly instead.

An IteratorExt trait in alloc blows this advice up. If such a trait is introduced, I'm not sure there's any situation in which we can accept useful PRs without risking future conflict with the standard libraries. Itertools is less and less an ergonomic multitool than it is a ticking timebomb that will break your builds if you have the audacity to rustup. Can we please fix this problem?

#[inline]
#[unstable(feature = "iterator_join", issue = "75638")]
#[must_use = "if you really need to exhaust the iterator, consider `.for_each(drop)` instead"]
fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
Copy link
Member

@jswrenn jswrenn Sep 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh. Here we go again... :-(

Alternatively:

Suggested change
fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
fn smoosh<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output

😉

bunnies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall that Itertools::intersperse also exists, how was this issue solved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. It's not solved. Stabilizing Iterator::intersperse will cause breakages for Itertools users. In that case, the fix will be as simple as upgrading one's itertools dependency to a new version without Itertools::intersperse, since the signatures and behavior of Itertools::intersperse and Iterator::intersperse is virtually identical. That's not the case with join, unfortunately.

Copy link
Contributor Author

@pickfire pickfire Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we do a breaking change in itertools to test out this join? Previously the team did suggest to make a change in itertools but the join is already there and is quite limited, this is similar but there is more choice on the type of things joinable.

Comment on lines +31 to +38
fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
where
[Self::Item]: Join<Separator>,
Self: Sized,
{
Join::join(self.collect::<Vec<Self::Item>>().as_slice(), sep)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're going to get Iterator::intersperse, so why not just:

Suggested change
fn join<Separator>(self, sep: Separator) -> <[Self::Item] as Join<Separator>>::Output
where
[Self::Item]: Join<Separator>,
Self: Sized,
{
Join::join(self.collect::<Vec<Self::Item>>().as_slice(), sep)
}
}
fn join<B>(self, sep: Self::Item) -> B
where
Self: Sized,
Self::Item: Clone,
B: FromIterator<Self::Item>,
{
self.intersperse(sep).collect()
}

This method can be defined on Iterator directly!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the reason why I don't want intersperse is that intersperse does not support joining with different types, it's the reason for this why I opened up this pull request, otherwise I will use intersperse instead.

@pickfire
Copy link
Contributor Author

pickfire commented Oct 9, 2020

I will be closing this as for now until I figure out how, maybe I will give it some thoughts this weekend.

@pickfire pickfire deleted the iterator_join branch June 2, 2022 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.