-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add static_multiset::for_each and its OA impl #506
Conversation
c82aad6
to
f14f521
Compare
* @param callback Function to call on every element found | ||
*/ | ||
template <class ProbeKey, class Callback> | ||
__device__ void for_each(ProbeKey const& key, Callback callback) const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__device__ void for_each(ProbeKey const& key, Callback callback) const noexcept | |
__device__ void for_each(ProbeKey const& key, Callback&& callback) const noexcept |
Unsure if this needs to be a mutable or even universal reference instead. Let's say we define a count
functor as such:
struct count_functor {
std::size_t thread_count = 0;
// counts the number of matching elements for this thread
template <class InputIt>
__device__ void operator()(InputIt) { thread_count++; }
};
And then call
//...
auto thread_counter = count_functor{};
set.for_each(key, thread_counter);
auto const key_count = thread_counter.count;
//...
Then we want the functor to be taken as a mutable reference, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass by value is preferred.
The above example is a good example of a bad callback, especially in a parallel context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it best practice to pass a callback by-value? I'd have to skim some stackoverflow/cppreference pages to get familiar with the topic. With pass-by-value we lose the ability of giving the callback an internal state that can hold the result of the operation. How would we solve the above example with a callback passed by-value? Pass a pointer to thread_count
to the callback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass a pointer to thread_count to the callback?
Yes
I see a callable defining the operations to be performed on the output instead of being the output itself.
struct for_each_tag { | ||
} inline constexpr for_each; ///< `cuco::for_each` operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see for_each
as an internal utility as opposed to an actual hash table operator. Need to think more on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my standpoint I would treat it as an extension to the STL API that is more suitable for the GPU. Having a "cooperative iterator" instead, which would be closer to the spirit of modern C++ has its drawbacks. For example, how do we ensure users only increment the iterator with the same CG? for_each
solves this problem by making the probing part internal. We should even be able to redefine any lookup function (find
, count
, retrieve
) that relies on probing with for_each
, giving us a proper abstraction layer for probing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a side note I personally find this funtional approach, i.e., "for each found key do X" very appealing. Historic evidence that it is indeed useful comes from warpcore, where many downstream applications (mostly genomics stuff) implemented their custom lookup operations through for_each
functors.
* @param callback Function to call on every element found | ||
*/ | ||
template <class ProbeKey, class Callback> | ||
__device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why the unit test is failing. Seems like the logic in this function is flawed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I found the problem. #509 should fix the issue.
…indows (required for shmem bounce buffer flushing during retrieve())
/ok to test WTH all of my commits are signed... |
/ok to test |
effing bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
closes #499