Is your feature request related to a problem? Please describe.
Today, CuPy uses Thrust/CUB algorithms to implement much of its functionality. This works by precompiling Thrust algorithms for a fixed set of types, which is undesirable for two reasons: it increases binary size, and it limits which algorithms can be exposed (segmented sort, for example) because of the combinatorial explosion of type combinations.
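To make the combinatorial problem concrete: if every template type parameter can independently be any supported dtype, the number of ahead-of-time instantiations is (number of dtypes) raised to (number of type parameters). The sketch assumes an illustrative list of 14 NumPy-style dtypes; it is not CuPy's actual build matrix, which may precompile only a subset.

```python
from itertools import product

# Illustrative dtype list (an assumption, not CuPy's real build configuration).
DTYPES = [
    "bool", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
    "complex64", "complex128",
]

def instantiation_count(num_type_params, dtypes=DTYPES):
    """Ahead-of-time instantiations needed when each template type
    parameter may independently be any dtype in `dtypes`."""
    return len(dtypes) ** num_type_params

def instantiations(num_type_params, dtypes=DTYPES):
    """Enumerate the concrete type tuples that would have to be precompiled."""
    return list(product(dtypes, repeat=num_type_params))

# A keys-only sort has one type parameter; a key-value sort (such as a
# segmented SortPairs) has at least two: the key type and the value type.
print(instantiation_count(1))  # 14
print(instantiation_count(2))  # 196
```

Each additional independent type parameter multiplies the count by 14 again, which is why pair-based algorithms are the first to be left out when everything must be precompiled.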
cuda.parallel should be able to replace any existing use of pre-instantiated Thrust/CUB algorithms, which would provide several benefits:
Reduced binary size (by moving to JIT compilation)
Custom type support
Custom operator support
Additional algorithm support (because JIT avoids the type combination problem)
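Why JIT sidesteps the type-combination problem can be shown with a toy model: a specialization is built the first time a given (type, operator) pair is requested and is cached afterwards, so only combinations a program actually uses are ever built, and user-defined operators need no precompiled instantiation. This is a hypothetical sketch; `specialize_reduce` and `COMPILE_LOG` are illustrative names, and the "compilation" is just a Python closure, not how cuda.parallel generates device code.

```python
import operator
from functools import lru_cache, reduce

# Records every "compilation" so we can see which specializations were built.
COMPILE_LOG = []

@lru_cache(maxsize=None)
def specialize_reduce(dtype_name, op):
    """Hypothetical JIT step: build a reduction for one (dtype, operator)
    pair on first request, then return the cached result on later requests.
    In this toy, dtype_name only serves as part of the cache key."""
    COMPILE_LOG.append((dtype_name, getattr(op, "__name__", repr(op))))

    def kernel(values, init):
        # Stand-in for a device kernel: fold `op` over the values.
        return reduce(op, values, init)

    return kernel

# Only combinations actually used get built; repeat requests hit the cache.
add_i32 = specialize_reduce("int32", operator.add)
assert specialize_reduce("int32", operator.add) is add_i32
print(add_i32([1, 2, 3, 4], 0))  # 10

# A user-defined operator works without any precompiled instantiation.
def absmax(a, b):
    return max(abs(a), abs(b))

print(specialize_reduce("float64", absmax)([-3.5, 2.0, 1.0], 0.0))  # 3.5
print(COMPILE_LOG)  # [('int32', 'add'), ('float64', 'absmax')]
```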
Describe the solution you'd like
To start, we'd like an inventory of which Thrust/CUB algorithms CuPy uses today and where.
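A first, coarse pass at that inventory could be a script that scans a CuPy source checkout for qualified `thrust::` and `cub::` symbols and groups them by the files that use them. This is a sketch under assumptions: the checkout path and the set of file extensions are guesses, and a regex will also match commented-out code while missing uses hidden behind `using namespace` or aliases, so the output needs manual review.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches qualified names such as `thrust::sort` or `cub::DeviceReduce::Sum`.
PATTERN = re.compile(
    r"\b(thrust|cub)::([A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*)"
)

# Extensions assumed to hold CUDA/C++/Cython sources (an assumption).
EXTENSIONS = {".cu", ".cuh", ".h", ".hpp", ".cpp", ".pyx", ".pxd", ".pxi"}

def inventory(root):
    """Map each thrust::/cub:: symbol to the sorted list of files using it."""
    root = Path(root)
    uses = defaultdict(set)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for match in PATTERN.finditer(text):
                symbol = f"{match.group(1)}::{match.group(2)}"
                uses[symbol].add(path.relative_to(root).as_posix())
    return {symbol: sorted(files) for symbol, files in sorted(uses.items())}
```

For example, `for symbol, files in inventory("path/to/cupy").items(): print(symbol, files)` lists each symbol next to the files that reference it.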
From there, we should investigate how we can use cuda.parallel.reduce_into to replace existing uses of cub::DeviceReduce/thrust::reduce.
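As we understand it, `reduce_into` follows the same contract as `cub::DeviceReduce::Reduce`: it combines the input with a user-supplied binary operator, starting from an initial value, and writes the result to a one-element output. A small CPU reference for that contract can help when checking a port against existing results. `reference_reduce_into` is a hypothetical helper; its argument order (input, output, operator, initial value) is chosen to echo `reduce_into` but is not presented as its exact calling convention. Because CUB may combine elements in any order, the operator should be associative, and floating-point results can differ from this left-to-right fold by rounding.

```python
from functools import reduce

def reference_reduce_into(d_in, d_out, op, h_init):
    """Hypothetical CPU reference: store the fold of `op` over `d_in`,
    seeded with h_init[0], in d_out[0]. Works on lists or NumPy arrays
    (for example, CuPy data copied back with cupy.asnumpy)."""
    d_out[0] = reduce(op, d_in, h_init[0])

def add(a, b):
    return a + b

def maximum(a, b):
    # A custom operator: any associative binary function can be used.
    return a if a >= b else b

out = [None]
reference_reduce_into([1, 2, 3, 4], out, add, [0])
print(out[0])  # 10

reference_reduce_into([3, -7, 9, 2], out, maximum, [-(2**31)])
print(out[0])  # 9
```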
Describe alternatives you've considered
No response
Additional context
No response
jrhemstad changed the title from "[FEA]: Investigate refactoring CuPy to use cuda.parallel" to "[EPIC] Investigate refactoring CuPy to use cuda.parallel" on Nov 25, 2024.
Is this a duplicate?
Area
cuda.parallel (Python)