-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring ThreadReduce
#3441
base: main
Are you sure you want to change the base?
Refactoring ThreadReduce
#3441
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
🟨 CI finished in 1h 40m: Pass: 89%/78 | Total: 2d 02h | Avg: 38m 45s | Max: 1h 13m | Hits: 202%/12760
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 78)
# | Runner |
---|---|
53 | linux-amd64-cpu16 |
11 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1-testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes LGTM. Can you please diff the SASS generated for a unit test covering the affected code for the affected SM versions before and after your change? Thx!
_CCCL_NODISCARD _CCCL_DEVICE constexpr bool enable_sm90_simd_reduction() | ||
{ | ||
using cub::detail::is_one_of; | ||
// ::cuda::std::plus<> not handled: IADD3 always produces less instructions than VIADD2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Does this comment no longer apply? It seems to contain useful information.
cub/cub/thread/thread_reduce.cuh
Outdated
&& cub::detail:: | ||
is_one_of<ReductionOp, ::cuda::minimum<>, ::cuda::minimum<T>, ::cuda::maximum<>, ::cuda::maximum<T>>(); | ||
}; | ||
inline constexpr bool enable_generic_simd_reduction_traits_v = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: We have _CCCL_GLOBAL_CONSTANT
for global constants, but I am not sure if the workaround (adding __device__
for the device compilation pass) is still necessary.
Thanks for looking at this PR! This is still a draft. There are several other changes that I want to apply. |
🟨 CI finished in 5h 41m: Pass: 94%/90 | Total: 2d 15h | Avg: 42m 39s | Max: 1h 16m | Hits: 171%/10928
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)
# | Runner |
---|---|
65 | linux-amd64-cpu16 |
11 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1-testing |
Description
ThreadReduce
implementation is very complex due to optimization dispatch across different architectures and C++11 dialect.This PR aims at simplifying the code by using C++17 features.
Open point: Introduce a
ThreadReduce
overloading to allow selecting optimizations