Uses unsigned offset types in thrust's scan algorithms #3436

Open
wants to merge 1 commit into
base: main

elstehle
Collaborator

Description

PR #2171 added support for a large number of items to DeviceScan by using unsigned offset types. We want to reflect that switch to unsigned offset types in Thrust's scan algorithms, so that Thrust can benefit from future tunings that we do for unsigned offset types.
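For readers unfamiliar with the idea, here is a minimal host-only sketch of dispatching a scan on an unsigned offset type. It is not the code in this PR: `inclusive_sum` and `inclusive_sum_impl` are hypothetical names, and the rule of choosing a 32-bit offset whenever the item count fits is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a scan implementation that is templated on
// its offset type (a sequential host loop, not a CUB/Thrust kernel).
template <typename OffsetT, typename T>
void inclusive_sum_impl(const T* in, T* out, OffsetT num_items)
{
  static_assert(std::is_unsigned<OffsetT>::value, "offset type must be unsigned");
  T running{};
  for (OffsetT i = 0; i < num_items; ++i)
  {
    running += in[i];
    out[i] = running;
  }
}

// Hypothetical dispatch layer: takes a signed count (as an iterator
// difference would be), converts it to an unsigned offset, and picks the
// narrowest unsigned offset type that can index all items.
// Returns the bit width of the offset type that was used.
template <typename T>
int inclusive_sum(const T* in, T* out, std::ptrdiff_t num_items)
{
  assert(num_items >= 0); // a negative count has no unsigned representation
  const auto n = static_cast<std::uint64_t>(num_items);
  if (n <= std::numeric_limits<std::uint32_t>::max())
  {
    inclusive_sum_impl<std::uint32_t>(in, out, static_cast<std::uint32_t>(n));
    return 32;
  }
  inclusive_sum_impl<std::uint64_t>(in, out, n);
  return 64;
}
```

The point of the sketch is only that the caller-facing count stays signed while the implementation is instantiated for unsigned offsets, so any tuning keyed on unsigned offset types applies to both entry points.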

@elstehle elstehle requested a review from a team as a code owner January 17, 2025 12:04
@elstehle elstehle requested a review from gevtushenko January 17, 2025 12:04
Contributor

🟩 CI finished in 1h 51m: Pass: 100%/78 | Total: 2d 03h | Avg: 39m 42s | Max: 1h 08m | Hits: 288%/12760
  • 🟩 cub: Pass: 100%/38 | Total: 1d 06h | Avg: 48m 45s | Max: 1h 08m | Hits: 377%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  1d 04h | Avg: 48m 19s | Max:  1h 08m | Hits: 377%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  1h 52m | Avg: 56m 25s | Max: 57m 32s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  4h 43m | Avg: 56m 43s | Max:  1h 08m | Hits: 377%/885   
      🟩 12.5               Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 08m
      🟩 12.6               Pass: 100%/31  | Total: 23h 58m | Avg: 46m 24s | Max:  1h 06m | Hits: 377%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 57m 39s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  4h 43m | Avg: 56m 43s | Max:  1h 08m | Hits: 377%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 08m
      🟩 nvcc12.6           Pass: 100%/29  | Total: 22h 04m | Avg: 45m 40s | Max:  1h 06m | Hits: 377%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 57m 39s
      🟩 nvcc               Pass: 100%/36  | Total:  1d 04h | Avg: 48m 16s | Max:  1h 08m | Hits: 377%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 35m | Avg: 53m 45s | Max: 56m 29s
      🟩 Clang15            Pass: 100%/1   | Total: 57m 28s | Avg: 57m 28s | Max: 57m 28s
      🟩 Clang16            Pass: 100%/1   | Total: 52m 49s | Avg: 52m 49s | Max: 52m 49s
      🟩 Clang17            Pass: 100%/1   | Total: 54m 21s | Avg: 54m 21s | Max: 54m 21s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 29m | Avg: 47m 02s | Max: 57m 39s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 47m | Avg: 53m 48s | Max: 54m 38s
      🟩 GCC8               Pass: 100%/1   | Total: 57m 05s | Avg: 57m 05s | Max: 57m 05s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 49m | Avg: 54m 37s | Max: 54m 50s
      🟩 GCC10              Pass: 100%/1   | Total: 59m 21s | Avg: 59m 21s | Max: 59m 21s
      🟩 GCC11              Pass: 100%/1   | Total: 53m 49s | Avg: 53m 49s | Max: 53m 49s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 38m | Avg: 32m 42s | Max: 53m 42s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 23m | Avg: 32m 56s | Max: 55m 19s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 08m | Hits: 377%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 05m | Hits: 377%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 08m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total: 11h 48m | Avg: 50m 38s | Max: 57m 39s
      🟩 GCC                Pass: 100%/18  | Total: 12h 28m | Avg: 41m 35s | Max: 59m 21s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 24m | Avg:  1h 06m | Max:  1h 08m | Hits: 377%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 08m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 25m 02s
      🟩 v100               Pass: 100%/36  | Total:  1d 06h | Avg: 50m 13s | Max:  1h 08m | Hits: 377%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  1d 04h | Avg: 55m 05s | Max:  1h 08m | Hits: 377%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 15s | Avg: 18m 15s | Max: 18m 15s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 10s | Avg: 15m 10s | Max: 15m 10s
      🟩 HostLaunch         Pass: 100%/3   | Total: 56m 04s | Avg: 18m 41s | Max: 19m 46s
      🟩 TestGPU            Pass: 100%/2   | Total: 55m 17s | Avg: 27m 38s | Max: 27m 54s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 25m 02s
      🟩 90a                Pass: 100%/1   | Total: 22m 55s | Avg: 22m 55s | Max: 22m 55s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total: 13h 37m | Avg: 58m 24s | Max:  1h 08m | Hits: 377%/2655  
      🟩 20                 Pass: 100%/24  | Total: 17h 14m | Avg: 43m 07s | Max:  1h 05m | Hits: 376%/885   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 19h 55m | Avg: 32m 19s | Max: 1h 07m | Hits: 253%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 35s | Avg: 18m 47s | Max: 26m 17s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 18h 58m | Avg: 32m 32s | Max:  1h 07m | Hits: 253%/9220  
      🟩 arm64              Pass: 100%/2   | Total: 56m 51s | Avg: 28m 25s | Max: 29m 55s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 01m | Avg: 36m 17s | Max: 53m 20s | Hits: 225%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 26s | Max: 56m 46s
      🟩 12.6               Pass: 100%/30  | Total: 15h 05m | Avg: 30m 10s | Max:  1h 07m | Hits: 261%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 52m 43s | Avg: 26m 21s | Max: 27m 38s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 01m | Avg: 36m 17s | Max: 53m 20s | Hits: 225%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 48m | Avg: 54m 26s | Max: 56m 46s
      🟩 nvcc12.6           Pass: 100%/28  | Total: 14h 12m | Avg: 30m 27s | Max:  1h 07m | Hits: 261%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 43s | Avg: 26m 21s | Max: 27m 38s
      🟩 nvcc               Pass: 100%/35  | Total: 19h 03m | Avg: 32m 39s | Max:  1h 07m | Hits: 253%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 18s | Max: 33m 52s
      🟩 Clang15            Pass: 100%/1   | Total: 33m 50s | Avg: 33m 50s | Max: 33m 50s
      🟩 Clang16            Pass: 100%/1   | Total: 33m 07s | Avg: 33m 07s | Max: 33m 07s
      🟩 Clang17            Pass: 100%/1   | Total: 30m 26s | Avg: 30m 26s | Max: 30m 26s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 42m | Avg: 23m 15s | Max: 31m 42s
      🟩 GCC7               Pass: 100%/2   | Total: 59m 38s | Avg: 29m 49s | Max: 30m 00s
      🟩 GCC8               Pass: 100%/1   | Total: 30m 41s | Avg: 30m 41s | Max: 30m 41s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 15s | Max: 33m 58s
      🟩 GCC10              Pass: 100%/1   | Total: 32m 37s | Avg: 32m 37s | Max: 32m 37s
      🟩 GCC11              Pass: 100%/1   | Total: 32m 29s | Avg: 32m 29s | Max: 32m 29s
      🟩 GCC12              Pass: 100%/1   | Total: 31m 58s | Avg: 31m 58s | Max: 31m 58s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 49m | Avg: 21m 09s | Max: 33m 39s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 31s | Max: 59m 43s | Hits: 226%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 43m | Avg: 54m 26s | Max:  1h 07m | Hits: 272%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 48m | Avg: 54m 26s | Max: 56m 46s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  6h 29m | Avg: 27m 49s | Max: 33m 52s
      🟩 GCC                Pass: 100%/16  | Total:  7h 01m | Avg: 26m 19s | Max: 33m 58s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 36m | Avg: 55m 16s | Max:  1h 07m | Hits: 253%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 26s | Max: 56m 46s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 19h 55m | Avg: 32m 19s | Max:  1h 07m | Hits: 253%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total: 18h 26m | Avg: 35m 41s | Max:  1h 07m | Hits: 226%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 53m 44s | Avg: 17m 54s | Max: 38m 09s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 35m 38s | Avg: 11m 52s | Max: 13m 57s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 17m 21s | Avg: 17m 21s | Max: 17m 21s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  8h 54m | Avg: 38m 10s | Max: 59m 43s | Hits: 226%/5532  
      🟩 20                 Pass: 100%/21  | Total: 10h 23m | Avg: 29m 42s | Max:  1h 07m | Hits: 295%/3688  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 49s | Avg: 4m 24s | Max: 6m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  8m 49s | Avg:  4m 24s | Max:  6m 51s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 58s | Avg:  1m 58s | Max:  1m 58s
      🟩 Test               Pass: 100%/1   | Total:  6m 51s | Avg:  6m 51s | Max:  6m 51s
    
  • 🟩 python: Pass: 100%/1 | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 40m 19s | Avg: 40m 19s | Max: 40m 19s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 78)

# Runner
53 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Labels
None yet
Projects
Status: In Review
1 participant