Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filter_in_place to Vec and VecDeque #1353

Closed
wants to merge 2 commits into from

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Nov 6, 2015

@Gankra Gankra self-assigned this Nov 8, 2015
@Gankra Gankra added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Nov 8, 2015
@Gankra
Copy link
Contributor

Gankra commented Nov 8, 2015

The RFC alludes to this, but not everyone may be familiar with the details, so I'd just like to note that the "mutate and then remove" pattern often implies that you're using an array not as an ordered data store, but rather as simply an efficient container with no particular ordering constraints. If this is the case, the following may serve one's usecase:

// Pretend these are more complicated Enemy structs with an `hp` field
let mut enemies = vec![1, 3, 2, 1, 4];

// Iterate over the array backwards *by index*
for i in (0 .. enemies.len()).rev() {

    // Process the enemy, determine if it should be removed
    let should_remove = {
        // Everyone take some damage! Remove the dead!
        let enemy = &mut enemies[i];
        *enemy -= 1;
        *enemy == 0
    }

    if should_remove {
        // Swap this enemy with the end of the array, and then
        // pop it off. This "scrambles" the array, but that's
        // ok because we don't care about order. Also, the only
        // elements that are scrambled are the ones we've already
        // seen, so this won't cause us to accidentally skip or
        // reprocess an enemy. Further, the fact that the `len` of
        // the array is decreased by this op doesn't matter to us,
        // because we're about to go to a smaller index.
        enemies.swap_remove(i);
    }
}

In principle, retain_mut can probably be more efficient, I think. That's not to say this is particularly inefficient, though. I also wouldn't expect tons of people to know this trick -- I only know it because this is a standard gamedev problem. It's also relying on the fact that we can proxy array iteration with indices, so it doesn't work on e.g. BTree or HashMap, which presumably also want retain.

On that note, would BTree and HashMap provide retain and retain_mut, or would they "learn from Vec's mistake" and only provide retain_mut (possibly just called retain)?

Finally, there's a moonshot alternative here: impl<F> Fn(.., &mut, ..) for F where F: Fn(.., &, ..), which would allow us to upgrade this interface (and ones like it) without any breakage. This would have to be an exceptionally magical implementation, and may cause coherence, inference, and resolution to all melt into a pile of poop. That said, as far as I can tell it would be sound.

@Amanieu
Copy link
Member Author

Amanieu commented Nov 8, 2015

For my use case I actually do rely on the fact that retain preserves the order of elements that are not deleted. The documentation for retain specifies that ordering is maintained, so we should not break that guarantee.

As for adding retain/retain_mut to other containers, I think it should match the API for Vec. However if plain retain is going to be deprecated then only retain_mut should be provided.

@alexcrichton alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jan 29, 2016
@alexcrichton
Copy link
Member

🔔 This RFC is now entering its week-long final comment period 🔔

@alexcrichton
Copy link
Member

My own personal feelings on this RFC is that it seems like a somewhat niche use case that we may not want to pursue adding to the standard library at this time. I haven't personally come across this use case much, but I would certainly believe it exists though!

vec.retain_mut(|&mut x| {
x -= 1;
x != 0
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example won't work as described. The &mut x pattern will copy x out of the vector and then mutate a stack-local x (which might not even compile because I don't think this x binding is mutable). The example should be written thusly:

vec.retain_mut(|x| {
    *x -= 1;
    *x != 0
});

@tbu-
Copy link
Contributor

tbu- commented Feb 2, 2016

It seems like a strictly-better retain.

@Ogeon
Copy link

Ogeon commented Feb 2, 2016

I wrote a similar function for a program where I needed a lot of in-place mapping and filtering, but I wanted to move items to an other vector instead of discarding them. The program required this to be done repeatedly, as fast as possible, so partition or filter_map weren't really great options. What I ended up with was a function with this signature:

fn retain_map<F>(&mut self, mut f: F) where F: FnMut(T) -> Option<T>

...where the closure takes ownership of each item and returns it if it should stay in the vector, similar to filter_map. You could say that it's the most permissive variant of retain, so I thought it would be interesting for the discussion.

@tbu-
Copy link
Contributor

tbu- commented Feb 2, 2016

That looks like it might change performance, because the items are moved around.

@Ogeon
Copy link

Ogeon commented Feb 2, 2016

Yes, I don't think it should be a replacement for retain and retain_mut. I suppose it would have worse performance when working with large items.

@frankmcsherry
Copy link

Just to toss it out there, here is the example I run into a lot: I have a Vec<Peekable<MyIter>>, typically as part of an ordered merge, and I really want to thin out empty iterators. So, I'd love to do

my_vec.retain(|x| x.peek().is_some())

except peek takes a &mut self because. It isn't the end of the world, and sometimes what Gankro describes could work. My understanding is that retain does some similar swapping to stay exception-safe, so it might just be an ergonomic thing.

It would be nice if Ogeon's suggestion actually optimized out all the moves. I've definitely wanted a retain that returns ownership of dropped items (like a general drain perhaps).

Edit: sorry, of course it moves, because retain moves things. Nevermind. >.<

@stepancheg
Copy link

My 2 cents.

retain_mut is niche, and does not cover all use cases:

  • user may want to navigate two sorted vecs simultaneously and for example remove equal elements of them
  • user may want to take removed elements and do something with them

retain_mut does not allow either use case. retain_map proposed above does not allow first use case.

More generic interface could be iterator-like:

impl<T> Vec<V> {
    fn retain_iter(&mut self) -> RetainIter<T> { ... }
}

struct RetainIter<T> {
    vec: *mut Vec<T>,
    read_pos: usize,
    write_pos: usize,
}

impl<T> RetainIter<T> {
    // like in `.iter()`
    fn next(&mut self) -> Option<&mut T> { ... }

    // like in `.into_iter()`
    fn next_take(&mut self) -> Option<T> { ... }

    // like `.next()` but does not advance
    fn lookahead(&mut self) -> Option<&mut T> { ... }

    // true iff following call to `.next()` returns `Some`
    fn has_next(&self) -> bool { self.lookahead().is_some() }
}

impl<T> Drop for RetainIter<T> {
    fn drop(&mut self) {
        if self.read != self.write {
            move remaining elements to restore vector consistency
        }
    }
}

So function that removes equal elements may look like this:

// sorted vectors of something
let mut v1: Vec<T> = ...;
let mut v2: Vec<T> = ... ;

let mut dups = Vec::new(); // put equal elements here

let mut i1 = v1.retain_iter();
let mut i2 = v2.retain_iter();

loop {
    match (i1.lookahead(), i2.lookahead()) {
        (Some(n1), Some(n2)) => {
            if n1 == n2 {
                dups.push(i1.take_next());
                i2.take_next();
            } else if n1 < n2 {
                i1.next();
            } else if n2 < n1 {
                i2.next();
            } else {
                panic!();
            }
        }
        (Some(..), None) => i1.next(),
        (None, Some(..)) => i2.next(),
        (None, None) => break,
    }
}

retain_iter is not as convenient as retain_mut, but it is OK, because it won't be used frequently, and retain_mut can be implemented in client code over retain_iter. And retain_iter is much more flexible and equally performant as retain_mut.

@stepancheg
Copy link

Actually, RetainIter can be even further generified.

impl<V> Vec<T> {
    fn view_with_hole(&mut self) -> VecViewWithHole<T> { ... }
}

struct<T, 'a> VecViewWithHole<T, 'a> {
    vec: *mut Vec<T>, // vec data is uninitialized before `begin` and after `end`
    begin: usize, // begin of hole
    end: usize, // end of hole
}

impl<T, 'a> VecViewWithHole<T, 'a> {
    // data before and after hole
    fn slices(&mut self) -> (&mut [T], &mut [T]) { ... }

    fn begin(&self) -> usize { self.begin }
    fn end(&self) -> usize { self.end }

    // move the hole to absolute position by moving elements
    // around the hole backwards or forwards
    fn position(&mut self, index: usize) { ... }

    // remove element from vec at index `begin - 1` and decrement `begin`
    fn remove_before_hole(&mut self) -> T { ... }

    // remove element from vec at index `end` and increment `end`
    fn remove_after_hole(&mut self) -> T { ... }

    // remove the last element of vec
    fn remove_last(&mut self) -> T { ... }
}

It is a minimal interface which can be used to implement retain, retain_mut, retain_map, retain_iter, and other things like drain with little or no overhead without unsafe code.

I'm not sure it is over-engineering, but it least it is simple and generic.

@tbu-
Copy link
Contributor

tbu- commented Feb 3, 2016

@stepancheg Yet, retain_mut covers all use cases of retain and isn't more complex. I feel like adding retain_iter should be part of a different RFC.

@alexcrichton
Copy link
Member

The libs team discussed this during triage today and due to the recent discussion that has happened we're going to re-up the final comment period for another week.

@aturon also brought up that we may be able to just find a new name for the retain method and deprecate this in favor of that. Arguably this should have been the signature of the method as-is today.

@aturon
Copy link
Member

aturon commented Feb 4, 2016

It might also be worth trying to change the signature directly and doing a crater run, though I suspect there's non-trivial breakage.

@Amanieu
Copy link
Member Author

Amanieu commented Feb 4, 2016

As I mentioned in the RFC, this will break any closure that uses |&x| instead of |x|, which is a very common pattern. retain is even used with such closures in quite a few places in rustc.

@birkenfeld
Copy link

Just had another use case for retain_mut today (in a debugger, iterating over a list of conditions that might cause a break at the current instruction, where some conditions count down and expire after they trigger once).

@gankro's version with iterating over indices and using swap_remove works, but feels somehow less abstract than it could be. So my vote would be to either add retain_mut or change retain, if the breakage is deemed bearable.

@bluss
Copy link
Member

bluss commented Feb 5, 2016

You can implement Vec::retain_mut in custom code fwiw (it requires no adaption, just copy the retain code and it works in stable rust).

@alexcrichton
Copy link
Member

The libs team discussed this RFC during triage yesterday and the conclusion was that we're going to move this out of FCP for now, there's not quite a sense of consensus just yet. There's a few alternatives we were discussion which seemed like the front-runners for what can happen here.

  1. We could implement the RFC as-is, with a retain_mut method and leaving retain stabilized.
  2. We could add a new method with the same signature here but deprecate retain as the &mut version is strictly more general. @aturon was proposing fn keep(&mut self, ...)
  3. @stepancheg's suggestion seems quite enticing by leveraging iterators as much as possible

The downside of (1) is that it's an extra method, with (2) we're breaking everyone's muscle memory they have today, and (3) is unfortunately a little vague still on the details as to how we may wish to implement it. For pursuing (3) it's probably best to prototype on crates.io first so we have a concrete implementation to talk about and look at.

Overall it didn't seem like there was a great sense of urgency in landing this RFC to get this functionality somehow, so we thought that moving this out of FCP for some more discussion was the best course of action for now.

@alexcrichton alexcrichton removed the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Feb 18, 2016
@aturon aturon assigned aturon and unassigned Gankra Mar 2, 2016
@alexcrichton
Copy link
Member

@Amanieu do you have thoughts on my previous comment about tweaking this RFC? The libs team was discussing this as a candidate for FCP yesterday but it unfortunately hasn't moved much since that last comment :(

@Amanieu
Copy link
Member Author

Amanieu commented May 24, 2016

Well I don't have any particular preference for any of them, in the end they all bring in the same functionality. However I think adding a new iterator type just for this adds too much complexity for what is a rather niche use case.

@alexcrichton
Copy link
Member

I personally feel the same way as well, what do you think of the keep proposal?

@Amanieu
Copy link
Member Author

Amanieu commented May 26, 2016

I feel that filter is the right name for it, but that's already used by Iterator. How about filter_in_place?

@Amanieu Amanieu changed the title Add retain_mut to Vec and VecDeque Add filter_in_place and filter_in_place_unordered to Vec and VecDeque May 27, 2016
@Amanieu Amanieu changed the title Add filter_in_place and filter_in_place_unordered to Vec and VecDeque Add filter_in_place to Vec and VecDeque May 27, 2016
@Amanieu
Copy link
Member Author

Amanieu commented May 27, 2016

I changed the RFC to add 2 new functions: filter_in_place and filter_in_place_unordered. I also changed it to deprecate retain.

@bluss
Copy link
Member

bluss commented May 27, 2016

@alexcrichton @Amanieu Deprecating a perfectly fine method just because it is superseded is not something I agree with. Deprecate stable API when it is actively harmful or confusing to users.

@nagisa
Copy link
Member

nagisa commented May 27, 2016

To me both retain and this RFC fall short. Namely, it is not unusual for index to have some semantic value (consider, Vec<BasicBlockData>, where index of BasicBlockData in the Vec is also its ID of some sort). Neither of these two allow inspecting the index and make decisions based on that, so one would end up rolling their own filtering in this niche case again.

What I think is that retain here serves majority use-case where only the inspection of the value is necessary to decide about its retainment, whereas what I described above, as well as the use-case described in the RFC are more niche and do not have wide-enough application and ought to be implemented by people manually. Otherwise, if we’re considering accepting this, I feel like the retain_mut + retain option is the best.

Yet another thing that wasn’t voiced here, is that retain also acts as documentation to the reader that the elements won’t be mutated in the process of retaining them.

@bluss
Copy link
Member

bluss commented May 27, 2016

An advanced version of drain that uses a retain like filter would be interesting.

@shepmaster
Copy link
Member

From a comment by wthrowe:

Given that F: Fn(.., &T, ..) can be called anywhere that F: Fn(.., &mut T, ..) can, I wonder if it would be possible to make them coerce?

That was actually my first thought when I changed retain to pass a &mut T to the closure and saw the error. As a user, it appears straight-forward to rewrite the closure to add a mut and continue to preserve the same behavior. What are the downsides to making this change at the compiler level?

I'd like to see that section of the Alternatives expanded. If we can make that change, then we can "easily" change retain.

@krdln
Copy link
Contributor

krdln commented Jun 19, 2016

@nagisa

Neither of these two allow inspecting the index and make decisions based on that, so one would end up rolling their own filtering in this niche case again.

been recently bitten by the fact that the index was not provided by the retain

So for me, adding a retain_indexed method has a higher priority than a retain_mut. Such a method can also serve as workaround for retain_iter (by using a temporary Vec<bool>) or retain_zip. To avoid adding exponential number of methods, I'd propose to add just:

fn retain_indexed_mut<F>(&mut self, f: F)
where F: FnMut(usize, &mut T) -> bool.

It's currently possible to do it manually – by remembering the index outside the closure, but that requires assumption that retain checks elements sequentially, which is not stated in docs.

Speaking of the "two loops" workaround for lack of retain_mut, it does not have to be actually slower, because the mutating loop can be vectorized, but when merged with retain logic this will be a really hard job for compiler. That's of course not an argument against this method, but rather just an observation.

@Ogeon The retain_map could be usually emulated with mem::replace(x, Default::default()). I think that the optimal implementation, which won't require such replacing and won't leak any memory in case of sudden panic would be significantly more complex than the retain's, so I think it's not a priority to add it to std.

@Ogeon
Copy link

Ogeon commented Jun 19, 2016

The retain_map could be usually emulated with mem::replace(x, Default::default()).

...assuming that the type implements Default.

Anyway, I didn't say that it was a better alternative, than anything else. Just another one.

@alexcrichton
Copy link
Member

@Amanieu this RFC seems to have unfortunately stalled, and the @rust-lang/libs team's previous conclusions seem to not be reflected in the most recent update? Is there a strong reason for not taking any of those routes and going for these two functions instead?

@Amanieu
Copy link
Member Author

Amanieu commented Aug 26, 2016

My personal opinion is to just close this RFC since it seems to be too controversial.

@brson
Copy link
Contributor

brson commented Aug 29, 2016

Since there are many different counter-proposals here, no consensus, and no movement, closing per @Amanieu. Desirable feature, but back to the drawing board.

@brson brson closed this Aug 29, 2016
@ExpHP ExpHP mentioned this pull request Sep 22, 2017
@avl
Copy link

avl commented Aug 20, 2019

For what it's worth, I was really surprised today that there's no retain_mut. Lots of other methods on Vec have 'mut'-version, but not retain. Seems like a hole. Not a biggie, since it's obviously easy to implement it yourself. But still! Wish this hadn't stalled! :-)

To others coming here after searching for 'retain_mut': There seems to be a nightly function "drain_filter" which can be used, as well as the retain_mut crate with a trait implemented for Vec which adds a retain_mut to Vec. So lots of options and no need to get stuck! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.