Uses unsigned offset types in thrust's sort algorithm calling into DispatchMergeSort #3437

Open
wants to merge 1 commit into
base: main
from

elstehle
Collaborator

Description

PR #3328 limited the offset types for which the kernel templates of DeviceMergeSort are instantiated to unsigned offset types. We want to reflect this switch in thrust, so that thrust can benefit from future tunings that we do for unsigned offset types.
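To illustrate the idea, here is a minimal host-only sketch of a caller that converts a signed element count to an unsigned offset type before handing it to a dispatch layer that accepts only unsigned offsets. All names (`fits_in_u32`, `merge_sort_dispatch_sketch`, `sort_sketch`) are hypothetical and are not part of thrust or CUB; the choice between a 32-bit and a 64-bit offset is likewise an assumption made for the example, not a description of the actual change in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: true when a non-negative `count` fits into a
// 32-bit unsigned offset.
constexpr bool fits_in_u32(std::int64_t count)
{
  return static_cast<std::uint64_t>(count)
      <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical stand-in for a dispatch layer that is only meant to be
// instantiated for unsigned offset types.
template <typename OffsetT>
struct merge_sort_dispatch_sketch
{
  static_assert(std::is_unsigned<OffsetT>::value,
                "only unsigned offset types are supported in this sketch");

  // Returns the width of the offset type in bytes, so that the selected
  // instantiation is observable from the caller.
  static constexpr std::size_t offset_bytes(OffsetT /*num_items*/)
  {
    return sizeof(OffsetT);
  }
};

// Caller side: the count arrives as a signed 64-bit value (assumed to be
// non-negative) and is cast to an unsigned offset type before dispatching.
constexpr std::size_t sort_sketch(std::int64_t count)
{
  return fits_in_u32(count)
         ? merge_sort_dispatch_sketch<std::uint32_t>::offset_bytes(
             static_cast<std::uint32_t>(count))
         : merge_sort_dispatch_sketch<std::uint64_t>::offset_bytes(
             static_cast<std::uint64_t>(count));
}
```

In this sketch, `sort_sketch(1000)` selects the 32-bit instantiation, while a count of `2^33` selects the 64-bit one. Instantiating `merge_sort_dispatch_sketch<int>` would fail at compile time, which mirrors the intent of restricting instantiations to unsigned offset types.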

elstehle requested review from a team as code owners on January 17, 2025 13:14
Contributor

🟩 CI finished in 1h 20m: Pass: 100%/78 | Total: 1d 11h | Avg: 27m 02s | Max: 1h 01m | Hits: 393%/12760
  • 🟩 cub: Pass: 100%/38 | Total: 23h 52m | Avg: 37m 42s | Max: 1h 01m | Hits: 523%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total: 22h 20m | Avg: 37m 14s | Max:  1h 01m | Hits: 523%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  1h 31m | Avg: 45m 57s | Max: 47m 17s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 29m | Avg: 41m 57s | Max: 55m 03s | Hits: 523%/885   
      🟩 12.5               Pass: 100%/2   | Total:  1h 27m | Avg: 43m 50s | Max: 45m 23s
      🟩 12.6               Pass: 100%/31  | Total: 18h 55m | Avg: 36m 37s | Max:  1h 01m | Hits: 523%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 43m | Avg: 51m 30s | Max: 51m 38s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 29m | Avg: 41m 57s | Max: 55m 03s | Hits: 523%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 27m | Avg: 43m 50s | Max: 45m 23s
      🟩 nvcc12.6           Pass: 100%/29  | Total: 17h 12m | Avg: 35m 35s | Max:  1h 01m | Hits: 523%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 43m | Avg: 51m 30s | Max: 51m 38s
      🟩 nvcc               Pass: 100%/36  | Total: 22h 09m | Avg: 36m 55s | Max:  1h 01m | Hits: 523%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 36m | Avg: 39m 08s | Max: 42m 02s
      🟩 Clang15            Pass: 100%/1   | Total: 37m 46s | Avg: 37m 46s | Max: 37m 46s
      🟩 Clang16            Pass: 100%/1   | Total: 39m 33s | Avg: 39m 33s | Max: 39m 33s
      🟩 Clang17            Pass: 100%/1   | Total: 37m 44s | Avg: 37m 44s | Max: 37m 44s
      🟩 Clang18            Pass: 100%/7   | Total:  4h 35m | Avg: 39m 18s | Max: 51m 38s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 15m | Avg: 37m 47s | Max: 37m 52s
      🟩 GCC8               Pass: 100%/1   | Total: 36m 55s | Avg: 36m 55s | Max: 36m 55s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 19m | Avg: 39m 46s | Max: 40m 05s
      🟩 GCC10              Pass: 100%/1   | Total: 37m 50s | Avg: 37m 50s | Max: 37m 50s
      🟩 GCC11              Pass: 100%/1   | Total: 39m 19s | Avg: 39m 19s | Max: 39m 19s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 14m | Avg: 24m 55s | Max: 38m 05s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 40m | Avg: 27m 31s | Max: 44m 37s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 03s | Max: 59m 04s | Hits: 523%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 57s | Max:  1h 01m | Hits: 523%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 27m | Avg: 43m 50s | Max: 45m 23s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  9h 06m | Avg: 39m 03s | Max: 51m 38s
      🟩 GCC                Pass: 100%/18  | Total:  9h 24m | Avg: 31m 20s | Max: 44m 37s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 54m | Avg: 58m 30s | Max:  1h 01m | Hits: 523%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 27m | Avg: 43m 50s | Max: 45m 23s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 36m 42s | Avg: 18m 21s | Max: 19m 38s
      🟩 v100               Pass: 100%/36  | Total: 23h 15m | Avg: 38m 46s | Max:  1h 01m | Hits: 523%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 21h 24m | Avg: 41m 26s | Max:  1h 01m | Hits: 523%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 44s | Avg: 20m 44s | Max: 20m 44s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 59s | Avg: 16m 59s | Max: 16m 59s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 04m | Avg: 21m 21s | Max: 22m 36s
      🟩 TestGPU            Pass: 100%/2   | Total: 46m 07s | Avg: 23m 03s | Max: 25m 16s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 36m 42s | Avg: 18m 21s | Max: 19m 38s
      🟩 90a                Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 10h 19m | Avg: 44m 16s | Max: 59m 04s | Hits: 523%/2655  
      🟩 20                 Pass: 100%/24  | Total: 13h 32m | Avg: 33m 52s | Max:  1h 01m | Hits: 523%/885   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 10h 27m | Avg: 16m 57s | Max: 40m 33s | Hits: 343%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 24m 13s | Avg: 12m 06s | Max: 12m 09s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 10h 03m | Avg: 17m 14s | Max: 40m 33s | Hits: 343%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 23m 51s | Avg: 11m 55s | Max: 12m 24s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 28m | Avg: 17m 40s | Max: 33m 51s | Hits: 336%/1844  
      🟩 12.5               Pass: 100%/2   | Total: 57m 11s | Avg: 28m 35s | Max: 28m 49s
      🟩 12.6               Pass: 100%/30  | Total:  8h 01m | Avg: 16m 03s | Max: 40m 33s | Hits: 344%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 25m 37s | Avg: 12m 48s | Max: 13m 02s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 28m | Avg: 17m 40s | Max: 33m 51s | Hits: 336%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 57m 11s | Avg: 28m 35s | Max: 28m 49s
      🟩 nvcc12.6           Pass: 100%/28  | Total:  7h 36m | Avg: 16m 17s | Max: 40m 33s | Hits: 344%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 25m 37s | Avg: 12m 48s | Max: 13m 02s
      🟩 nvcc               Pass: 100%/35  | Total: 10h 01m | Avg: 17m 11s | Max: 40m 33s | Hits: 343%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 54m 00s | Avg: 13m 30s | Max: 13m 50s
      🟩 Clang15            Pass: 100%/1   | Total: 14m 08s | Avg: 14m 08s | Max: 14m 08s
      🟩 Clang16            Pass: 100%/1   | Total: 12m 38s | Avg: 12m 38s | Max: 12m 38s
      🟩 Clang17            Pass: 100%/1   | Total: 12m 36s | Avg: 12m 36s | Max: 12m 36s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 21m | Avg: 11m 35s | Max: 13m 02s
      🟩 GCC7               Pass: 100%/2   | Total: 25m 44s | Avg: 12m 52s | Max: 13m 11s
      🟩 GCC8               Pass: 100%/1   | Total: 13m 21s | Avg: 13m 21s | Max: 13m 21s
      🟩 GCC9               Pass: 100%/2   | Total: 28m 39s | Avg: 14m 19s | Max: 14m 31s
      🟩 GCC10              Pass: 100%/1   | Total: 13m 58s | Avg: 13m 58s | Max: 13m 58s
      🟩 GCC11              Pass: 100%/1   | Total: 13m 00s | Avg: 13m 00s | Max: 13m 00s
      🟩 GCC12              Pass: 100%/1   | Total: 13m 57s | Avg: 13m 57s | Max: 13m 57s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 42m | Avg: 12m 50s | Max: 17m 22s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 46s | Max: 35m 41s | Hits: 336%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 54m | Avg: 38m 16s | Max: 40m 33s | Hits: 347%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 57m 11s | Avg: 28m 35s | Max: 28m 49s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  2h 54m | Avg: 12m 27s | Max: 14m 08s
      🟩 GCC                Pass: 100%/16  | Total:  3h 31m | Avg: 13m 12s | Max: 17m 22s
      🟩 MSVC               Pass: 100%/5   | Total:  3h 04m | Avg: 36m 52s | Max: 40m 33s | Hits: 343%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total: 57m 11s | Avg: 28m 35s | Max: 28m 49s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 10h 27m | Avg: 16m 57s | Max: 40m 33s | Hits: 343%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  9h 01m | Avg: 17m 28s | Max: 40m 33s | Hits: 337%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 56s | Avg: 16m 18s | Max: 34m 08s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 36m 39s | Avg: 12m 13s | Max: 12m 51s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 17m 22s | Avg: 17m 22s | Max: 17m 22s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  4h 33m | Avg: 19m 30s | Max: 40m 08s | Hits: 338%/5532  
      🟩 20                 Pass: 100%/21  | Total:  5h 30m | Avg: 15m 42s | Max: 40m 33s | Hits: 351%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 50s | Avg: 4m 25s | Max: 6m 50s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  6m 50s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 00s | Avg:  2m 00s | Max:  2m 00s
      🟩 Test               Pass: 100%/1   | Total:  6m 50s | Avg:  6m 50s | Max:  6m 50s
    
  • 🟩 python: Pass: 100%/1 | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 41m 06s | Avg: 41m 06s | Max: 41m 06s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Projects
Status: In Review