Prefix scans #393
Is that API on any I'd also like to hear how you would parallelize this. It seems inherently sequential, although we had some ideas for |
Hey @cuviper, Looking at the Rayon API, I think it would be good to make it as generic as possible; so ideally all iterators and using The algorithm in a pseudo-vector-algebra notation:
|
Is this true? It seems to me that you have to compute the final result for one group before you know what offset to add to the next. If so, you're back to the sequential algorithm. For example, in this step:
How do you know to add |
Computing the offset/delta is sequential but the actual vector/scalar addition is parallelized. I can re-word it if it's not so clear. |
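The split described in the comment above (sequential offsets, parallel additions) can be sketched as a three-phase chunked prefix sum. This is only an illustration under stated assumptions: it uses `std::thread::scope` instead of Rayon so that it stays self-contained, it spawns one thread per chunk for brevity, and the name `chunked_prefix_sum` and the `chunk_len` parameter are hypothetical rather than anything proposed in this thread.

```rust
use std::thread;

/// Inclusive prefix sum in three phases over fixed-size chunks.
/// `chunked_prefix_sum` and `chunk_len` are hypothetical names for this sketch.
fn chunked_prefix_sum(data: &mut [u64], chunk_len: usize) {
    assert!(chunk_len > 0, "chunk_len must be non-zero");

    // Phase 1 (parallel): scan each chunk independently and return its total.
    let totals: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks_mut(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    let mut acc = 0u64;
                    for x in chunk.iter_mut() {
                        acc += *x;
                        *x = acc;
                    }
                    acc
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Phase 2 (sequential): an exclusive scan of the chunk totals
    // gives the offset that each chunk still has to receive.
    let mut offsets = Vec::with_capacity(totals.len());
    let mut running = 0u64;
    for t in &totals {
        offsets.push(running);
        running += t;
    }

    // Phase 3 (parallel): add each chunk's offset to all of its elements.
    thread::scope(|s| {
        for (chunk, offset) in data.chunks_mut(chunk_len).zip(offsets) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x += offset;
                }
            });
        }
    });
}
```

Only phase 2 is sequential, and it touches one value per chunk rather than one per element, which is why the dependency between groups does not force the whole computation to be sequential.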
That sounds pretty reasonable.
Whether this belongs in rayon proper or its own crate is unclear. |
I think a generic |
We are two years down the road, but as I am also facing a situation in which a (parallel) scan operation is required, I'd love to contribute with a PR for this. @cuviper would that be acceptable? |
Sure, you're welcome to work on this. Don't forget #329 if you plan to make this more general than just sums and products. |
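One possible reading of "more general than just sums and products" is a scan over any associative operator. The sketch below assumes only associativity (not commutativity) and therefore always keeps the carry on the left-hand side. The function name `chunked_scan`, its signature, and the use of `std::thread::scope` in place of Rayon are assumptions made for illustration; this is not an existing Rayon API.

```rust
use std::thread;

/// Inclusive scan under an associative `op`, which need not be commutative.
/// Hypothetical name and signature, for illustration only.
fn chunked_scan<T, F>(data: &mut [T], chunk_len: usize, op: F)
where
    T: Clone + Send,
    F: Fn(&T, &T) -> T + Sync,
{
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    let op = &op;

    // Parallel: scan every chunk in place (one thread per chunk, for brevity).
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for i in 1..chunk.len() {
                    chunk[i] = op(&chunk[i - 1], &chunk[i]);
                }
            });
        }
    });

    // Sequential: fold the last element of each chunk into the carry
    // that the following chunk needs. The first chunk has no carry.
    let mut carries: Vec<Option<T>> = Vec::new();
    let mut running: Option<T> = None;
    for chunk in data.chunks(chunk_len) {
        carries.push(running.clone());
        let last = &chunk[chunk.len() - 1];
        running = Some(match &running {
            Some(r) => op(r, last),
            None => last.clone(),
        });
    }

    // Parallel: combine each chunk with its carry, carry on the left.
    thread::scope(|s| {
        for (chunk, carry) in data.chunks_mut(chunk_len).zip(carries) {
            if let Some(carry) = carry {
                s.spawn(move || {
                    for x in chunk.iter_mut() {
                        *x = op(&carry, &*x);
                    }
                });
            }
        }
    });
}
```

Using `Option<T>` for the carry avoids requiring an identity element; an API that takes an explicit identity value would be an equally reasonable design.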
👍 All right! A question about the implementation suggestion you mentioned above: If I understand the source code for believe that the Are there ways to let the threads communicate? (e.g. using a crossbeam channel?)
The things I am unsure about here are thus:
|
I'm not able to dig into this right now, but...
You'd have the potential for deadlocks like #592 -- thread stealing could cause your receiver to be deeper on the same stack as the sender that would want to fill it.
We don't have runtime hooks into external blocking things like mutexes or channels. Even if we did, you'd have to actually suspend the whole stack frame to "store" it for later, or else switch to a new stack to execute stolen tasks, so that stored task can be resumed independently. |
Thank you for the information! That means that using channels is probably not the right way to go.
After digging a bit deeper, it seems that I was incorrect:
Does |
Another interesting note is that implementing something akin to the example code @cuviper gave above, we lose the indexability of our iterator, since I will probably start writing actual code and submitting a WIP PR next week (time permitting). |
A different way to implement this would be to use decoupled look-back: Merrill et al., Single-pass Parallel Prefix Scan with Decoupled Look-back, 2016. EDIT: I needed a couple of prefix sums in the past few weeks, and ended up doing them in a more naive way, but I'll try to find the time to dig more into this. They are quite useful: I needed them in the context of implementing parallel algorithms for maximum subsequence (e.g. see Perumalla et al., Parallel algorithms for maximum subsequence and maximum subarray, 1995) where the interesting building blocks are parallel prefix and postfix sums. |
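To illustrate why scans are a natural building block for maximum subsequence, here is a minimal sequential sketch based on the standard identity that the best subsequence ending at position j has sum `prefix[j] - min(prefix[0..j])`. It is not necessarily the formulation used in the cited paper, and the function name `max_subsequence_sum` is hypothetical. Both loops are scans (one under `+`, one under `min`), so each could in principle be replaced by a parallel scan.

```rust
/// Maximum sum over all non-empty contiguous subsequences, built from two scans.
/// Returns `None` for empty input. Hypothetical helper for illustration.
fn max_subsequence_sum(data: &[i64]) -> Option<i64> {
    if data.is_empty() {
        return None;
    }

    // Scan 1: prefix sums, with a leading 0 standing for the empty prefix.
    let mut prefix = Vec::with_capacity(data.len() + 1);
    prefix.push(0i64);
    for &x in data {
        let last = prefix[prefix.len() - 1];
        prefix.push(last + x);
    }

    // Scan 2: running minimum of the prefix sums (a scan under `min`).
    let mut min_prefix = Vec::with_capacity(prefix.len());
    let mut m = i64::MAX;
    for &p in &prefix {
        m = m.min(p);
        min_prefix.push(m);
    }

    // Reduction: best subsequence ending at j is prefix[j] minus the
    // smallest prefix sum strictly before j.
    (1..prefix.len())
        .map(|j| prefix[j] - min_prefix[j - 1])
        .max()
}
```

For the input `[-2, 1, -3, 4, -1, 2, 1, -5, 4]` this yields `Some(6)`, corresponding to the subsequence `4, -1, 2, 1`.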
I heard from someone who wanted to use my above PR in their project, so I've released it as a separate crate: https://crates.io/crates/rayon-scan |
I have a use case for prefix scans: gathering cumulative totals/products of various quantitative measures. This is how I envisage prefix scans working from an API perspective:
Would you like a PR for this or should I implement it as a separate crate?