Refactoring ThreadReduce #3441

Open
wants to merge 16 commits into
base: main

fbusato
Contributor

@fbusato fbusato commented Jan 18, 2025

Description

The ThreadReduce implementation is very complex because it dispatches optimizations across different architectures while staying within the C++11 dialect.
This PR simplifies the code by using C++17 features.

Open point: introduce a ThreadReduce overload that allows selecting optimizations.
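
To illustrate the kind of simplification C++17 allows, here is a minimal host-only sketch of compile-time dispatch with `if constexpr`. It is not the code in this PR: the function names, the `std::array` interface, and the selection rule (tree reduction only for integral types with at least four items) are assumptions made for the example.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <type_traits>

// Hypothetical helper: left-to-right reduction that preserves evaluation order.
template <typename T, std::size_t N, typename Op>
T thread_reduce_sequential(const std::array<T, N>& in, Op op)
{
  T acc = in[0];
  for (std::size_t i = 1; i < N; ++i)
  {
    acc = op(acc, in[i]);
  }
  return acc;
}

// Hypothetical helper: pairwise (tree) reduction, which shortens the dependency chain.
// Takes the array by value because it reduces in place.
template <typename T, std::size_t N, typename Op>
T thread_reduce_tree(std::array<T, N> in, Op op)
{
  for (std::size_t stride = 1; stride < N; stride *= 2)
  {
    for (std::size_t i = 0; i + stride < N; i += 2 * stride)
    {
      in[i] = op(in[i], in[i + stride]);
    }
  }
  return in[0];
}

// A single entry point replaces C++11-style tag dispatch or SFINAE overload sets.
template <typename T, std::size_t N, typename Op>
T thread_reduce(const std::array<T, N>& in, Op op)
{
  static_assert(N > 0, "at least one item is required");
  // Assumed rule for this sketch: reorder only integral reductions,
  // since reordering floating-point operations can change the result.
  if constexpr (std::is_integral_v<T> && N >= 4)
  {
    return thread_reduce_tree(in, op);
  }
  else
  {
    return thread_reduce_sequential(in, op);
  }
}
```

Only the selected branch is instantiated, so each strategy can use operations that would not compile for the other types. An overload that selects optimizations explicitly, as raised in the open point, could take the strategy as an extra template parameter instead of deriving it from `T` and `N`.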

@fbusato fbusato added the 3.0 Targeted for 3.0 release label Jan 18, 2025
@fbusato fbusato self-assigned this Jan 18, 2025

copy-pr-bot bot commented Jan 18, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@fbusato
Contributor Author

fbusato commented Jan 18, 2025

/ok to test

Contributor

🟨 CI finished in 1h 40m: Pass: 89%/78 | Total: 2d 02h | Avg: 38m 45s | Max: 1h 13m | Hits: 202%/12760
  • 🟨 cub: Pass: 89%/38 | Total: 1d 07h | Avg: 49m 54s | Max: 1h 13m | Hits: 174%/3540

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/36  | Total:  1d 05h | Avg: 49m 28s | Max:  1h 13m | Hits: 174%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 51s | Max: 58m 45s
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 08s | Max:  1h 00m
      🔍 nvcc               Pass:  88%/36  | Total:  1d 05h | Avg: 49m 23s | Max:  1h 13m | Hits: 174%/3540  
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 46m 28s | Avg: 23m 14s | Max: 26m 44s
      🔍 v100               Pass:  88%/36  | Total:  1d 06h | Avg: 51m 23s | Max:  1h 13m | Hits: 174%/3540  
    🟨 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 56m | Avg: 59m 22s | Max:  1h 04m | Hits: 176%/885   
      🟥 12.5               Pass:   0%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 43s
      🟨 12.6               Pass:  93%/31  | Total:  1d 01h | Avg: 49m 32s | Max:  1h 13m | Hits: 173%/2655  
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 58m | Avg: 59m 08s | Max:  1h 00m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 56m | Avg: 59m 22s | Max:  1h 04m | Hits: 176%/885   
      🟥 nvcc12.5           Pass:   0%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 43s
      🟨 nvcc12.6           Pass:  93%/29  | Total: 23h 37m | Avg: 48m 52s | Max:  1h 13m | Hits: 173%/2655  
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 49m | Avg: 57m 26s | Max: 59m 26s
      🟩 Clang15            Pass: 100%/1   | Total: 55m 31s | Avg: 55m 31s | Max: 55m 31s
      🟩 Clang16            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟩 Clang17            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟨 Clang18            Pass:  85%/7   | Total:  5h 51m | Avg: 50m 13s | Max:  1h 01m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 57s | Max: 58m 51s
      🟩 GCC8               Pass: 100%/1   | Total: 54m 49s | Avg: 54m 49s | Max: 54m 49s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 44s | Max: 58m 42s
      🟩 GCC10              Pass: 100%/1   | Total: 57m 22s | Avg: 57m 22s | Max: 57m 22s
      🟩 GCC11              Pass: 100%/1   | Total: 54m 54s | Avg: 54m 54s | Max: 54m 54s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 44m | Avg: 34m 40s | Max: 57m 34s
      🟨 GCC13              Pass:  87%/8   | Total:  4h 59m | Avg: 37m 25s | Max:  1h 00m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 06m | Hits: 176%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 13m | Hits: 172%/1770  
      🟥 NVHPC24.7          Pass:   0%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 43s
    🟨 cxx_family
      🟨 Clang              Pass:  92%/14  | Total: 12h 40m | Avg: 54m 17s | Max:  1h 01m
      🟨 GCC                Pass:  94%/18  | Total: 13h 17m | Avg: 44m 19s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 34m | Avg:  1h 08m | Max:  1h 13m | Hits: 174%/3540  
      🟥 NVHPC              Pass:   0%/2   | Total:  1h 03m | Avg: 31m 59s | Max: 32m 43s
    🟨 jobs
      🟨 Build              Pass:  93%/31  | Total:  1d 04h | Avg: 55m 27s | Max:  1h 13m | Hits: 174%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 44s | Avg: 22m 44s | Max: 22m 44s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 28s | Avg: 14m 28s | Max: 14m 28s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 07m | Avg: 22m 34s | Max: 27m 59s
      🟥 TestGPU            Pass:   0%/2   | Total:  1h 12m | Avg: 36m 21s | Max: 37m 40s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 46m 28s | Avg: 23m 14s | Max: 26m 44s
      🟩 90a                Pass: 100%/1   | Total: 25m 30s | Avg: 25m 30s | Max: 25m 30s
    🟨 std
      🟨 17                 Pass:  92%/14  | Total: 13h 27m | Avg: 57m 40s | Max:  1h 10m | Hits: 176%/2655  
      🟨 20                 Pass:  87%/24  | Total: 18h 09m | Avg: 45m 22s | Max:  1h 13m | Hits: 169%/885   
    
  • 🟨 thrust: Pass: 94%/37 | Total: 18h 13m | Avg: 29m 33s | Max: 1h 01m | Hits: 213%/9220

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  94%/35  | Total: 17h 14m | Avg: 29m 33s | Max:  1h 01m | Hits: 213%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 59m 00s | Avg: 29m 30s | Max: 31m 23s
    🚨 ctk: 12.5 🚨
      🟩 12.0               Pass: 100%/5   | Total:  3h 05m | Avg: 37m 00s | Max: 56m 32s | Hits: 175%/1844  
      🔥 12.5               Pass:   0%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s
      🟩 12.6               Pass: 100%/30  | Total: 14h 56m | Avg: 29m 52s | Max:  1h 01m | Hits: 223%/7376  
    🚨 cudacxx: nvcc12.5 🚨
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 16s | Avg: 26m 38s | Max: 27m 27s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 05m | Avg: 37m 00s | Max: 56m 32s | Hits: 175%/1844  
      🔥 nvcc12.5           Pass:   0%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s
      🟩 nvcc12.6           Pass: 100%/28  | Total: 14h 02m | Avg: 30m 05s | Max:  1h 01m | Hits: 223%/7376  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 16s | Avg: 26m 38s | Max: 27m 27s
      🔍 nvcc               Pass:  94%/35  | Total: 17h 20m | Avg: 29m 43s | Max:  1h 01m | Hits: 213%/9220  
    🚨 cxx: NVHPC24.7 🚨
      🟩 Clang14            Pass: 100%/4   | Total:  2h 01m | Avg: 30m 26s | Max: 31m 08s
      🟩 Clang15            Pass: 100%/1   | Total: 29m 40s | Avg: 29m 40s | Max: 29m 40s
      🟩 Clang16            Pass: 100%/1   | Total: 29m 42s | Avg: 29m 42s | Max: 29m 42s
      🟩 Clang17            Pass: 100%/1   | Total: 30m 19s | Avg: 30m 19s | Max: 30m 19s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 43m | Avg: 23m 19s | Max: 32m 28s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 00s | Max: 31m 10s
      🟩 GCC8               Pass: 100%/1   | Total: 30m 30s | Avg: 30m 30s | Max: 30m 30s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 36s | Max: 35m 50s
      🟩 GCC10              Pass: 100%/1   | Total: 31m 17s | Avg: 31m 17s | Max: 31m 17s
      🟩 GCC11              Pass: 100%/1   | Total: 33m 58s | Avg: 33m 58s | Max: 33m 58s
      🟩 GCC12              Pass: 100%/1   | Total: 33m 35s | Avg: 33m 35s | Max: 33m 35s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 54m | Avg: 21m 47s | Max: 32m 58s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 33s | Max:  1h 00m | Hits: 176%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 36m | Avg: 52m 09s | Max:  1h 01m | Hits: 238%/5532  
      🔥 NVHPC24.7          Pass:   0%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s
    🚨 cxx_family: NVHPC 🚨
      🟩 Clang              Pass: 100%/14  | Total:  6h 14m | Avg: 26m 45s | Max: 32m 28s
      🟩 GCC                Pass: 100%/16  | Total:  7h 12m | Avg: 27m 03s | Max: 35m 50s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 33m | Avg: 54m 42s | Max:  1h 01m | Hits: 213%/9220  
      🔥 NVHPC              Pass:   0%/2   | Total: 12m 28s | Avg:  6m 14s | Max:  6m 16s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  93%/31  | Total: 16h 45m | Avg: 32m 25s | Max:  1h 01m | Hits: 175%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 59s | Avg: 16m 39s | Max: 34m 42s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 38m 23s | Avg: 12m 47s | Max: 13m 17s
    🟨 gpu
      🟨 v100               Pass:  94%/37  | Total: 18h 13m | Avg: 29m 33s | Max:  1h 01m | Hits: 213%/9220  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 39m 29s | Avg: 19m 44s | Max: 26m 12s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 19m 08s | Avg: 19m 08s | Max: 19m 08s
    🟨 std
      🟨 17                 Pass:  92%/14  | Total:  8h 14m | Avg: 35m 17s | Max:  1h 00m | Hits: 176%/5532  
      🟨 20                 Pass:  95%/21  | Total:  9h 20m | Avg: 26m 40s | Max:  1h 01m | Hits: 270%/3688  
    
  • 🟨 cccl_c_parallel: Pass: 50%/2 | Total: 7m 58s | Avg: 3m 59s | Max: 5m 47s

    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
      🔥 Test               Pass:   0%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
    🟨 cpu
      🟨 amd64              Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    🟨 ctk
      🟨 12.6               Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    🟨 cudacxx
      🟨 nvcc12.6           Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    🟨 cxx
      🟨 GCC13              Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    🟨 cxx_family
      🟨 GCC                Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    🟨 gpu
      🟨 v100               Pass:  50%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  5m 47s
    
  • 🟥 python: Pass: 0%/1 | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 ctk
      🟥 12.6               Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 cudacxx
      🟥 nvcc12.6           Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 25m 13s | Avg: 25m 13s | Max: 25m 13s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

@bernhardmgruber bernhardmgruber left a comment

The changes LGTM. Can you please diff the SASS generated for a unit test covering the affected code for the affected SM versions before and after your change? Thx!

cub/cub/detail/type_traits.cuh
_CCCL_NODISCARD _CCCL_DEVICE constexpr bool enable_sm90_simd_reduction()
{
using cub::detail::is_one_of;
// ::cuda::std::plus<> not handled: IADD3 always produces less instructions than VIADD2
Contributor

Q: Does this comment no longer apply? It seems to contain useful information.

&& cub::detail::
is_one_of<ReductionOp, ::cuda::minimum<>, ::cuda::minimum<T>, ::cuda::maximum<>, ::cuda::maximum<T>>();
};
inline constexpr bool enable_generic_simd_reduction_traits_v =
Contributor

Remark: We have _CCCL_GLOBAL_CONSTANT for global constants, but I am not sure if the workaround (adding __device__ for the device compilation pass) is still necessary.
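
For readers unfamiliar with the pattern in the snippets under review, here is a minimal host-only sketch of an `is_one_of`-style trait feeding an `inline constexpr` variable template. The names `minimum`, `maximum`, and `enable_min_max_simd_v` are placeholders, and the restriction to 16-bit integer types is an assumption for the example. This is not the CUB implementation, and it does not address whether a `__device__` annotation is needed.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical stand-in for a trait like cub::detail::is_one_of:
// true when T is exactly one of the candidate types (C++17 fold expression).
template <typename T, typename... Candidates>
constexpr bool is_one_of()
{
  return (std::is_same_v<T, Candidates> || ...);
}

// Empty placeholder functors standing in for minimum/maximum operators.
template <typename T = void>
struct minimum
{};

template <typename T = void>
struct maximum
{};

// C++17 inline variable template: usable from a header in many translation
// units while still denoting a single entity.
// Assumption for this sketch: only 16-bit integers with min/max qualify.
template <typename T, typename ReductionOp>
inline constexpr bool enable_min_max_simd_v =
  is_one_of<T, short, unsigned short>()
  && is_one_of<ReductionOp, minimum<>, minimum<T>, maximum<>, maximum<T>>();

static_assert(enable_min_max_simd_v<short, minimum<>>, "transparent minimum is accepted");
static_assert(!enable_min_max_simd_v<short, std::plus<>>, "plus is not in the list");
```

Because the result is a compile-time constant, it can be used directly as the condition of an `if constexpr` branch in the reduction entry point.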

@fbusato
Contributor Author

fbusato commented Jan 21, 2025

Thanks for looking at this PR! This is still a draft. There are several other changes that I want to apply.

@fbusato fbusato marked this pull request as ready for review January 27, 2025 21:11
@fbusato fbusato requested a review from a team as a code owner January 27, 2025 21:11
Contributor

🟨 CI finished in 5h 41m: Pass: 94%/90 | Total: 2d 15h | Avg: 42m 39s | Max: 1h 16m | Hits: 171%/10928
  • 🟨 cub: Pass: 95%/44 | Total: 1d 15h | Avg: 53m 59s | Max: 1h 16m | Hits: 158%/3552

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/42  | Total:  1d 13h | Avg: 53m 40s | Max:  1h 16m | Hits: 158%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  4h 56m | Avg: 59m 14s | Max:  1h 00m | Hits: 159%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 33m | Avg:  1h 16m | Max:  1h 16m
      🔍 12.6               Pass:  94%/37  | Total:  1d 08h | Avg: 52m 03s | Max:  1h 13m | Hits: 158%/2664  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 05m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 56m | Avg: 59m 14s | Max:  1h 00m | Hits: 159%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 33m | Avg:  1h 16m | Max:  1h 16m
      🔍 nvcc12.6           Pass:  94%/35  | Total:  1d 06h | Avg: 51m 27s | Max:  1h 13m | Hits: 158%/2664  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 05m
      🔍 nvcc               Pass:  95%/42  | Total:  1d 13h | Avg: 53m 35s | Max:  1h 16m | Hits: 158%/3552  
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 50m 18s | Avg: 25m 09s | Max: 27m 30s
      🔍 v100               Pass:  95%/42  | Total:  1d 14h | Avg: 55m 21s | Max:  1h 16m | Hits: 158%/3552  
    🚨 jobs: TestGPU 🚨
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 20s | Max:  1h 16m | Hits: 158%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 28m 16s | Avg: 28m 16s | Max: 28m 16s
      🟩 GraphCapture       Pass: 100%/1   | Total: 22m 22s | Avg: 22m 22s | Max: 22m 22s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 19m | Avg: 26m 29s | Max: 30m 40s
      🔥 TestGPU            Pass:   0%/2   | Total: 49m 42s | Avg: 24m 51s | Max: 30m 00s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 20h 26m | Avg:  1h 01m | Max:  1h 16m | Hits: 159%/2664  
      🔍 20                 Pass:  91%/24  | Total: 19h 09m | Avg: 47m 53s | Max:  1h 16m | Hits: 157%/888   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 46m | Avg: 56m 36s | Max: 58m 24s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 58m | Avg: 59m 28s | Max:  1h 00m
      🟩 Clang16            Pass: 100%/2   | Total:  1h 50m | Avg: 55m 24s | Max: 56m 04s
      🟩 Clang17            Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
      🟨 Clang18            Pass:  85%/7   | Total:  5h 57m | Avg: 51m 01s | Max:  1h 05m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 35s | Max: 59m 56s
      🟩 GCC8               Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
      🟩 GCC9               Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟩 GCC10              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m
      🟩 GCC11              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 15s | Max: 57m 35s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 49m | Avg: 42m 22s | Max:  1h 00m
      🟨 GCC13              Pass:  87%/8   | Total:  4h 58m | Avg: 37m 21s | Max:  1h 01m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 06m | Hits: 159%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 13m | Hits: 158%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 33m | Avg:  1h 16m | Max:  1h 16m
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total: 15h 33m | Avg: 54m 54s | Max:  1h 05m
      🟨 GCC                Pass:  95%/21  | Total: 16h 55m | Avg: 48m 21s | Max:  1h 06m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 33m | Avg:  1h 08m | Max:  1h 13m | Hits: 158%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 33m | Avg:  1h 16m | Max:  1h 16m
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 50m 18s | Avg: 25m 09s | Max: 27m 30s
      🟩 90a                Pass: 100%/1   | Total: 24m 59s | Avg: 24m 59s | Max: 24m 59s
    
  • 🟨 thrust: Pass: 97%/43 | Total: 1d 00h | Avg: 33m 40s | Max: 1h 12m | Hits: 177%/7376

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/41  | Total: 23h 09m | Avg: 33m 53s | Max:  1h 12m | Hits: 177%/7376  
      🟩 arm64              Pass: 100%/2   | Total: 58m 28s | Avg: 29m 14s | Max: 31m 11s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/5   | Total:  3h 18m | Avg: 39m 38s | Max:  1h 01m | Hits: 177%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 52m | Avg: 56m 11s | Max: 56m 25s
      🔍 12.6               Pass:  97%/36  | Total: 18h 57m | Avg: 31m 36s | Max:  1h 12m | Hits: 177%/5532  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 56m 58s | Avg: 28m 29s | Max: 29m 43s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 18m | Avg: 39m 38s | Max:  1h 01m | Hits: 177%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 52m | Avg: 56m 11s | Max: 56m 25s
      🔍 nvcc12.6           Pass:  97%/34  | Total: 18h 00m | Avg: 31m 47s | Max:  1h 12m | Hits: 177%/5532  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 56m 58s | Avg: 28m 29s | Max: 29m 43s
      🔍 nvcc               Pass:  97%/41  | Total: 23h 11m | Avg: 33m 56s | Max:  1h 12m | Hits: 177%/7376  
    🔍 cxx: MSVC14.39 🔍
      🟩 Clang14            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 22s | Max: 37m 56s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 01s | Max: 33m 25s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 26s | Max: 34m 36s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 06s | Max: 34m 29s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 53m | Avg: 24m 45s | Max: 33m 26s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 14s | Max: 35m 32s
      🟩 GCC8               Pass: 100%/1   | Total: 33m 15s | Avg: 33m 15s | Max: 33m 15s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 42s | Max: 34m 28s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 29s | Max: 35m 05s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 25s | Max: 34m 56s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 06s | Max: 37m 09s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 53m | Avg: 21m 41s | Max: 37m 11s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 38s | Max:  1h 01m | Hits: 177%/3688  
      🔍 MSVC14.39          Pass:  66%/3   | Total:  2h 49m | Avg: 56m 39s | Max:  1h 12m | Hits: 177%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 52m | Avg: 56m 11s | Max: 56m 25s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/17  | Total:  8h 19m | Avg: 29m 24s | Max: 37m 56s
      🟩 GCC                Pass: 100%/19  | Total:  9h 10m | Avg: 28m 58s | Max: 37m 11s
      🔍 MSVC               Pass:  80%/5   | Total:  4h 45m | Avg: 57m 03s | Max:  1h 12m | Hits: 177%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 11s | Max: 56m 25s
    🔍 jobs: TestCPU 🔍
      🟩 Build              Pass: 100%/37  | Total: 22h 41m | Avg: 36m 47s | Max:  1h 12m | Hits: 177%/7376  
      🔍 TestCPU            Pass:  66%/3   | Total: 50m 15s | Avg: 16m 45s | Max: 34m 36s
      🟩 TestGPU            Pass: 100%/3   | Total: 36m 56s | Avg: 12m 18s | Max: 14m 12s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 12h 36m | Avg: 37m 48s | Max:  1h 03m | Hits: 177%/5532  
      🔍 20                 Pass:  95%/21  | Total: 10h 55m | Avg: 31m 12s | Max:  1h 12m | Hits: 177%/1844  
    🟨 gpu
      🟨 v100               Pass:  97%/43  | Total:  1d 00h | Avg: 33m 40s | Max:  1h 12m | Hits: 177%/7376  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 02s | Avg: 18m 31s | Max: 26m 06s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 06s | Avg: 18m 06s | Max: 18m 06s
    
  • 🟨 cccl_c_parallel: Pass: 50%/2 | Total: 8m 20s | Avg: 4m 10s | Max: 5m 57s

    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/1   | Total:  2m 23s | Avg:  2m 23s | Max:  2m 23s
      🔥 Test               Pass:   0%/1   | Total:  5m 57s | Avg:  5m 57s | Max:  5m 57s
    🟨 cpu
      🟨 amd64              Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    🟨 ctk
      🟨 12.6               Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    🟨 cudacxx
      🟨 nvcc12.6           Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    🟨 cxx
      🟨 GCC13              Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    🟨 cxx_family
      🟨 GCC                Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    🟨 gpu
      🟨 v100               Pass:  50%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  5m 57s
    
  • 🟥 python: Pass: 0%/1 | Total: 6m 48s | Avg: 6m 48s | Max: 6m 48s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 ctk
      🟥 12.6               Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 cudacxx
      🟥 nvcc12.6           Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  6m 48s | Avg:  6m 48s | Max:  6m 48s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@fbusato fbusato requested a review from a team as a code owner January 29, 2025 19:28
@fbusato fbusato requested a review from alliepiper January 29, 2025 19:28
Labels
3.0 Targeted for 3.0 release
Projects
Status: In Review

2 participants