Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add static_multiset::for_each and its OA impl #506

Merged
merged 14 commits into from
Jun 26, 2024

Conversation

sleeepyjack
Copy link
Collaborator

closes #499

@sleeepyjack sleeepyjack added type: feature request New feature request P1: Should have Necessary but not critical topic: static_multiset Issue related to the static_multiset labels Jun 15, 2024
@sleeepyjack sleeepyjack added this to the static_multiset milestone Jun 15, 2024
@sleeepyjack sleeepyjack self-assigned this Jun 15, 2024
@sleeepyjack sleeepyjack requested a review from PointKernel as a code owner June 15, 2024 00:12
@NVIDIA NVIDIA deleted a comment from copy-pr-bot bot Jun 15, 2024
* @param callback Function to call on every element found
*/
template <class ProbeKey, class Callback>
__device__ void for_each(ProbeKey const& key, Callback callback) const noexcept
Copy link
Collaborator Author

@sleeepyjack sleeepyjack Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
__device__ void for_each(ProbeKey const& key, Callback callback) const noexcept
__device__ void for_each(ProbeKey const& key, Callback&& callback) const noexcept

Unsure if this needs to be a mutable or even universal reference instead. Let's say we define a count functor as such:

struct count_functor {
  std::size_t thread_count = 0;

  // counts the number of matching elements for this thread
  template <class InputIt>
  __device__ void operator()(InputIt) { thread_count++; }
};

And then call

//...
auto thread_counter = count_functor{};
set.for_each(key, thread_counter);
auto const key_count = thread_counter.count;
//...

Then we want the functor to be taken as a mutable reference, right?

Copy link
Member

@PointKernel PointKernel Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass by value is preferred.

The above example is a good example of a bad callback, especially in a parallel context

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it best practice to pass a callback by-value? I'd have to skim some stackoverflow/cppreference pages to get familiar with the topic. With pass-by-value we lose the ability of giving the callback an internal state that can hold the result of the operation. How would we solve the above example with a callback passed by-value? Pass a pointer to thread_count to the callback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass a pointer to thread_count to the callback?

Yes

I see a callable defining the operations to be performed on the output instead of being the output itself.

Comment on lines +68 to +69
struct for_each_tag {
} inline constexpr for_each; ///< `cuco::for_each` operator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see for_each as an internal utility as opposed to an actual hash table operator. Need to think more on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my standpoint I would treat it as an extension to the STL API that is more suitable for the GPU. Having a "cooperative iterator" instead, which would be closer to the spirit of modern C++ has its drawbacks. For example, how do we ensure users only increment the iterator with the same CG? for_each solves this problem by making the probing part internal. We should even be able to redefine any lookup function (find, count, retrieve) that relies on probing with for_each, giving us a proper abstraction layer for probing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a side note I personally find this funtional approach, i.e., "for each found key do X" very appealing. Historic evidence that it is indeed useful comes from warpcore, where many downstream applications (mostly genomics stuff) implemented their custom lookup operations through for_each functors.

* @param callback Function to call on every element found
*/
template <class ProbeKey, class Callback>
__device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why the unit test is failing. Seems like the logic in this function is flawed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found the problem. #509 should fix the issue.

Copy link

copy-pr-bot bot commented Jun 25, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@sleeepyjack
Copy link
Collaborator Author

/ok to test

WTH all of my commits are signed...

@sleeepyjack
Copy link
Collaborator Author

/ok to test

@sleeepyjack
Copy link
Collaborator Author

effing bot

@sleeepyjack sleeepyjack changed the base branch from feature/static_multiset to dev June 25, 2024 19:39
@sleeepyjack sleeepyjack requested a review from PointKernel June 25, 2024 23:47
@sleeepyjack sleeepyjack added the Needs Review Awaiting reviews before merging label Jun 25, 2024
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sleeepyjack sleeepyjack merged commit a547e8f into NVIDIA:dev Jun 26, 2024
15 checks passed
@sleeepyjack sleeepyjack deleted the for-each-new branch June 26, 2024 01:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review Awaiting reviews before merging P1: Should have Necessary but not critical topic: static_multiset Issue related to the static_multiset type: feature request New feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT]: Add for_each device API
2 participants