Question: Concatenate list of indexed parallel iterators? #888
Unindexed can just use |
Yes, this is what I am currently doing and performance is fine (the resulting splitting between archetypes seems a sensible choice for an ECS) but I am losing a lot of API surface defined only for |
Part of the challenge is the generic
... but not with producers, rather using the low-level type-specific details to make the iterator. BTW, I would not implement both |
Yes, it seems a generic adaptor is not possible, since after all some insidious parallel iterator could do something like
    fn with_producer<CB>(self, callback: CB) -> CB::Output
    where
        CB: ProducerCallback<Self::Item>,
    {
        if thread_rng().gen_bool(0.9) {
            callback.callback(MyUsualProducer::new(self))
        } else {
            callback.callback(JustToMessAroundABit::new(self))
        }
    }
I will try to implement this directly on the "set of matching archetypes" I start from, especially since I can clone/copy a shared reference to an archetype and do not need to synchronize anything. Thank you for taking the time to look into this!
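To make the point above concrete without pulling in rayon, here is a minimal std-only sketch. The `Callback` trait and the `with_producer` function are hypothetical stand-ins that use plain `Iterator`s in place of real producers; only the shape (a callback method that is generic over the producer type) mirrors the discussion.

```rust
use std::ops::Range;

// Hypothetical stand-in for a producer callback: the method is generic over
// the producer type, so the callee picks the type, not the caller.
trait Callback {
    type Output;
    fn callback<P: Iterator<Item = u32>>(self, producer: P) -> Self::Output;
}

// Hypothetical stand-in for `with_producer`: depending on a runtime flag it
// hands the callback one of two unrelated iterator types.
fn with_producer<CB: Callback>(range: Range<u32>, reversed: bool, callback: CB) -> CB::Output {
    if reversed {
        callback.callback(range.rev())
    } else {
        callback.callback(range)
    }
}

// A callback that consumes the producer right away works for any `P`.
struct SumCallback;

impl Callback for SumCallback {
    type Output = u32;

    fn callback<P: Iterator<Item = u32>>(self, producer: P) -> u32 {
        producer.sum()
    }
}
```

A callback that consumes its producer immediately is fine with either type. A callback that wants to push the producer into a `Vec<P>` cannot be written against this trait, because `P` is only known inside each individual call.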
Good point! I was admittedly lazy there. |
If you're curious about why the There's a supposition that associated type constructors (aka generic associated types = GATs) might let us handle |
As presented here, I think there should be no recursion. I am willing to pay for collecting producers into newly allocated vectors though. However, I do not think GAT alone would help with this approach, as the lifetime associated with the producer is currently generative, i.e. it is created inside the closure and cannot be named outside to e.g. store this outside of the scope created by the callback as I am trying to do here. I think this would be possible if the signature were something like fn with_producer<'iter, CB, R>(&'iter mut self, callback: CB) -> R where CB: FnOnce(Self::Producer<'iter>) -> R; to connect the producer lifetime with the lifetime of the reference to the inner iterator. Making it a mutable reference would allow moving things out of the iterator at the price of having to perform the option dance or resort to |
You would still have to nest all |
I think I can choose |
Well I'm hoping for a normal closure instead, but yes I suppose you could push to a captured I'm going to experiment with this -- first trying to make it a GAT closure, then perhaps |
I do wonder whether there exist any "insidious" implementations as considered above today which do pass completely unrelated producer types as e.g. optimizations. 🤔 |
Oh, yes, Lines 51 to 54 in 2991c04
But I guess it could use |
I think this would also require using trivial callbacks to stash away the inner producer as here? Is the callback structure really necessary if the lifetime were bound to the reference to the inner iterator instead of being generative? Wouldn't fn as_producer<'iter>(&'iter mut self) -> Self::Producer<'iter>; work as well in that case? |
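For illustration, a minimal std-only sketch of what such an `as_producer` signature could look like with a generic associated type. The `AsProducer` trait and the `collect_producers` helper are hypothetical and not part of rayon, and the "producer" is just a plain `Iterator` here.

```rust
// Hypothetical trait, not part of rayon: the producer borrows from the
// iterator, so its lifetime can be named outside of any callback.
trait AsProducer {
    type Item;
    type Producer<'iter>: Iterator<Item = Self::Item>
    where
        Self: 'iter;

    fn as_producer<'iter>(&'iter mut self) -> Self::Producer<'iter>;
}

impl<T: Copy> AsProducer for Vec<T> {
    type Item = T;
    type Producer<'iter> = std::iter::Copied<std::slice::Iter<'iter, T>>
    where
        Self: 'iter;

    fn as_producer<'iter>(&'iter mut self) -> Self::Producer<'iter> {
        self.iter().copied()
    }
}

// Every element has the same type `I`, so all producers share one nameable
// type and can be stored in a plain vector.
fn collect_producers<'iter, I: AsProducer>(iters: &'iter mut [I]) -> Vec<I::Producer<'iter>> {
    let mut producers = Vec::with_capacity(iters.len());
    for iter in iters.iter_mut() {
        producers.push(iter.as_producer());
    }
    producers
}
```

With the lifetime tied to the `&'iter mut self` borrow, the producers can outlive the call that created them, which is exactly what a callback with a generative lifetime rules out.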
Maybe we don't need a callback in that case, since it could trivially dissolve into a captured producer anyway. Another thing I realized either way is that an external |
It's also weird that a |
I think this is the same issue as for |
It would not be possible to move the producer around otherwise, so that appears consistent to me. |
I am sorry if this is getting out of hand, but since we are here: Is the |
Yeah, sort of, but the compiler forbids explicit calls to
I meant with that state in the iterator type, not the producer, but still. A concrete example is: rayon/src/collections/binary_heap.rs Lines 103 to 110 in 2991c04
That's for |
I am also trying to solve this problem. Conceptually it seems like it should be possible to construct an In my specific use case I am trying to zip the elements of range maps (where keys are |
I have quite a naive (to me) implementation that uses
It compiles, but you can't use it as it hits the recursion limit while instantiating the big nest of |
Here's another implementation which compiles and runs, but my tests show that it's dropping some elements (summing a 1000x1000 vector of 1's comes out to 736605 instead of 1_000_000) - so I think my splitting is off by one in at least one place - but maybe this is a workable solution?
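One way to hunt for this kind of off-by-one is to split at every possible index and check that nothing is lost. Below is a std-only sketch under that assumption; `split_chunks`, `total_len` and `splits_preserve_everything` are hypothetical helpers operating on plain slices, not the implementation mentioned above.

```rust
/// Split a concatenation of slices at the global position `index`.
fn split_chunks<'a, T>(chunks: &[&'a [T]], index: usize) -> (Vec<&'a [T]>, Vec<&'a [T]>) {
    let mut left = Vec::new();
    let mut right = Vec::new();
    // Number of items that still have to go to the left half.
    let mut remaining = index;
    for &chunk in chunks {
        if remaining == 0 {
            // The split point has been passed: everything goes right.
            right.push(chunk);
        } else if remaining >= chunk.len() {
            // The chunk lies entirely to the left of the split point.
            remaining -= chunk.len();
            left.push(chunk);
        } else {
            // The split point falls inside this chunk.
            let (l, r) = chunk.split_at(remaining);
            left.push(l);
            right.push(r);
            remaining = 0;
        }
    }
    (left, right)
}

fn total_len<T>(chunks: &[&[T]]) -> usize {
    chunks.iter().map(|chunk| chunk.len()).sum()
}

/// Check lengths and sums of both halves for every split index.
fn splits_preserve_everything(chunks: &[&[u64]]) -> bool {
    let total = total_len(chunks);
    let sum: u64 = chunks.iter().flat_map(|chunk| chunk.iter()).sum();
    (0..=total).all(|index| {
        let (left, right) = split_chunks(chunks, index);
        let left_sum: u64 = left.iter().flat_map(|chunk| chunk.iter()).sum();
        let right_sum: u64 = right.iter().flat_map(|chunk| chunk.iter()).sum();
        total_len(&left) == index
            && total_len(&right) == total - index
            && left_sum + right_sum == sum
    })
}
```

Chunks of uneven length, including an empty one, tend to expose boundary mistakes faster than a uniform 1000x1000 grid.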
|
I realize now that impl is pretty broken, I didn't take into account the lefts and rights. I have another impl which passes tests, but I'm fairly certain it's pretty whack. I'm struggling a bit to understand the relationship between
|
Doing this would be a lot easier with #513, because we could easily split the iterator that's on the index. |
This is admittedly more of a question than a pull request but I thought that it might be easiest to answer this with the code at hand.
I have a situation in which I basically want to concatenate a vector of indexed parallel iterators into a single indexed parallel iterator with the same semantics as if I would .into_iter().flatten() the vector.
I tried to implement this by looking at the existing Chain adapter, which however handles a fixed number of inner iterators with possibly heterogeneous types instead of an unknown number of iterators with homogeneous types.
I am currently stuck at trying to collect the producers created by the inner iterators into a vector. This does not seem to work, as it seems possible that each invocation will call the given callback with a different producer type.
Leaving aside the overhead, boxing does not seem possible due to Producer not being object safe. I did consider defining a simpler DynProducer trait that would be, but I think the result would feel prohibitively expensive.
I also considered building a producer out of a list of iterators, also recording a list of split positions for each one, and only turning these into producers and splitting those when e.g. Producer::into_iter is actually called. But that does seem to imply at least sharing (and hence synchronizing) the original list between tasks if a split happens to straddle the boundary between two of the original iterators. Also, I am not sure if this would avoid the issue of different producer types eventually.
Does anybody know whether this is possible at all and if so how?
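As a rough illustration of the "list plus split positions" idea, here is a std-only sketch over plain `Vec`s rather than parallel iterators. `ConcatProducer`, `prefix_offsets` and `sum_by_splitting` are hypothetical names, and the recursion is sequential; it only mimics how a splitting bridge would drive `split_at` and `into_iter`. It assumes the underlying data can be shared by immutable reference, in which case no synchronization is needed.

```rust
/// offsets[i] is the global index of the first item of parts[i];
/// the last entry is the total length.
fn prefix_offsets<T>(parts: &[Vec<T>]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(parts.len() + 1);
    let mut total = 0;
    offsets.push(total);
    for part in parts {
        total += part.len();
        offsets.push(total);
    }
    offsets
}

/// Hypothetical producer: a shared view of all parts plus a global range.
struct ConcatProducer<'a, T> {
    parts: &'a [Vec<T>],
    offsets: &'a [usize],
    start: usize,
    end: usize,
}

impl<'a, T> ConcatProducer<'a, T> {
    fn new(parts: &'a [Vec<T>], offsets: &'a [usize]) -> Self {
        assert_eq!(offsets.len(), parts.len() + 1);
        ConcatProducer { parts, offsets, start: 0, end: offsets[parts.len()] }
    }

    fn len(&self) -> usize {
        self.end - self.start
    }

    /// Splitting only narrows the range; the shared references are copied.
    fn split_at(self, index: usize) -> (Self, Self) {
        assert!(index <= self.len());
        let mid = self.start + index;
        let left = ConcatProducer { parts: self.parts, offsets: self.offsets, start: self.start, end: mid };
        let right = ConcatProducer { parts: self.parts, offsets: self.offsets, start: mid, end: self.end };
        (left, right)
    }

    /// The range is resolved into per-part sub-slices only here.
    fn into_iter(self) -> impl Iterator<Item = &'a T> {
        let ConcatProducer { parts, offsets, start, end } = self;
        // Index of the first part that ends after `start`.
        let first = offsets[1..].partition_point(|&part_end| part_end <= start);
        parts[first..]
            .iter()
            .zip(&offsets[first..])
            .take_while(move |pair| *pair.1 < end)
            .flat_map(move |(part, &base)| {
                let lo = start.saturating_sub(base).min(part.len());
                let hi = end.saturating_sub(base).min(part.len());
                part[lo..hi].iter()
            })
    }
}

/// Sequential divide-and-conquer, standing in for a parallel bridge.
fn sum_by_splitting(producer: ConcatProducer<'_, u64>, min_len: usize) -> u64 {
    if producer.len() <= min_len.max(1) {
        producer.into_iter().sum()
    } else {
        let mid = producer.len() / 2;
        let (left, right) = producer.split_at(mid);
        sum_by_splitting(left, min_len) + sum_by_splitting(right, min_len)
    }
}
```

This sidesteps the producer-type question only because the parts are concrete `Vec`s. Whether the same trick carries over to arbitrary inner iterators is exactly the open question here.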