
[EPIC] Investigate refactoring CuPy to use cuda.parallel #2958

Open
1 task done
jrhemstad opened this issue Nov 25, 2024 · 1 comment
Assignees
Labels
feature request New feature or request.

Comments

@jrhemstad (Collaborator)

Is this a duplicate?

Area

cuda.parallel (Python)

Is your feature request related to a problem? Please describe.

Today, CuPy uses Thrust/CUB algorithms to implement much of its functionality. This works by precompiling Thrust algorithms for a variety of fixed types, which is undesirable for a few reasons: it increases binary size, and it limits the exposure of some algorithms (like segmented sort) because of the combinatorial explosion of type instantiations.

cuda.parallel can replace any existing use of pre-instantiated Thrust/CUB algorithms and provide several benefits:

  • Reduced binary size (by moving to JIT compilation)
  • Support for custom types
  • Support for custom operators
  • Support for additional algorithms (because JIT compilation avoids the combinatorial type problem)

Describe the solution you'd like

To start, we'd like an inventory of which Thrust/CUB algorithms CuPy uses today and where.

From there, we should investigate how cuda.parallel.reduce_into can replace existing uses of cub::DeviceReduce/thrust::reduce.

Describe alternatives you've considered

No response

Additional context

No response

@jrhemstad jrhemstad added the feature request New feature or request. label Nov 25, 2024
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 25, 2024
@jrhemstad jrhemstad changed the title [FEA]: Investigate refactoring CuPy to use cuda.parallel [EPIC] Investigate refactoring CuPy to use cuda.parallel Nov 25, 2024
@shwina

shwina commented Nov 28, 2024

To give an idea of the binary size improvement this could bring, here are the sizes of the ten largest extension modules in the CuPy build:

~/workspace/cupy$ find . -type f -name "*.so" -exec ls -lhS {} + | awk '{print $5, $9}' | head -20 | grep -v "./build"

42M ./cupy/cuda/thrust.cpython-310-x86_64-linux-gnu.so
34M ./cupy/cuda/cub.cpython-310-x86_64-linux-gnu.so
7.3M ./cupy/random/_generator_api.cpython-310-x86_64-linux-gnu.so
1.5M ./cupy_backends/cuda/libs/cusparse.cpython-310-x86_64-linux-gnu.so
1.4M ./cupy_backends/cuda/api/runtime.cpython-310-x86_64-linux-gnu.so
1.2M ./cupy/cuda/jitify.cpython-310-x86_64-linux-gnu.so
1.2M ./cupy_backends/cuda/libs/cusolver.cpython-310-x86_64-linux-gnu.so
851K ./cupy/_core/core.cpython-310-x86_64-linux-gnu.so
693K ./cupy/_core/fusion.cpython-310-x86_64-linux-gnu.so
671K ./cupy/cuda/memory.cpython-310-x86_64-linux-gnu.so

cupy.cuda.thrust and cupy.cuda.cub are by far the largest extension modules.

Projects
Status: Todo
Development

No branches or pull requests

2 participants