WIP: [DO NOT MERGE] introduce libcuml wheels #6199

Draft
wants to merge 71 commits into
base: branch-25.02

jameslamb
Member

@jameslamb jameslamb commented Dec 30, 2024

Replaces #6006, contributes to rapidsai/build-planning#33.

Proposes packaging libcuml as a wheel, which is then re-used by cuml-cu{11,12} wheels.

Blocked by rapidsai/cuvs#594

Notes for Reviewers

Benefits of these changes

Wheel contents

libcuml:

  • libcuml++.so (shared library) and its headers
  • libcumlprims_mg.so (shared library) and its headers
  • other vendored dependencies (CCCL, fmt)

cuml:

  • cuml Python / Cython code and compiled Cython extensions

Dependency Flows

In short: the libcuml wheel contains the libcuml++.so and libcumlprims_mg.so shared libraries and the headers needed to link against them.

  • Anything that needs to link against cuML at build time pulls in libcuml wheels as a build dependency.
  • Anything that needs cuML's symbols at runtime pulls it in as a runtime dependency, and calls libcuml.load_library().
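
To make the runtime bullet concrete, here is a minimal sketch of how a consuming package might call `libcuml.load_library()` before importing its compiled extensions. The helper name `preload_shared_library` and the decision to tolerate a missing wheel are assumptions for illustration, not code taken from this PR.

```python
import importlib


def preload_shared_library(module_name="libcuml"):
    """Import a library wheel (if installed) and ask it to load its shared libraries.

    Hypothetical helper: only `load_library()` is named in the PR description.
    """
    try:
        lib_module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Assumption: when the wheel is absent (e.g. a conda install),
        # leave it to the system loader to find the library.
        return None
    return lib_module.load_library()
```

A package such as cuml would presumably call something like this near the top of its `__init__.py`, before any Cython extension that needs cuML's symbols is imported.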

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)

Size changes (CUDA 12, Python 3.12, x86_64)

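| wheel   | num files (before) | num files (this PR) | size (before) | size (this PR) |
|---------|--------------------|---------------------|---------------|----------------|
| libcuml | ---                | 1766                | ---           | 289M           |
| cuml    | 442                | 441                 | 527M          | 9M             |
| TOTAL   | 442                | 2207                | 527M          | 298M           |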

NOTES: size = compressed, "before" = 2025-01-22 nightlies
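
As a quick check on the TOTAL row, the arithmetic can be reproduced in a few lines. This is only a sketch: the numbers are copied from the table above, "M" is treated as megabytes, and the variable names are made up for illustration.

```python
# File counts and compressed sizes (MB), copied from the table above.
before = {"cuml": {"files": 442, "size_mb": 527}}
after = {
    "libcuml": {"files": 1766, "size_mb": 289},
    "cuml": {"files": 441, "size_mb": 9},
}


def totals(wheels):
    """Sum file counts and sizes across all wheels in one column group."""
    files = sum(w["files"] for w in wheels.values())
    size_mb = sum(w["size_mb"] for w in wheels.values())
    return files, size_mb


files_before, size_before = totals(before)
files_after, size_after = totals(after)

saved_mb = size_before - size_after
reduction = saved_mb / size_before  # fraction of total download size saved

print(f"files: {files_before} -> {files_after}")
print(f"size:  {size_before}M -> {size_after}M ({reduction:.1%} smaller)")
```

The file count goes up because the libcuml wheel ships headers, while the total compressed size drops by a bit over 40%.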

How I calculated those:
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-22 \
    --env CUML_NIGHTLY_SHA=01e19bba9821954b062a04fbf31d3522afa4b0b1 \
    --env CUML_PR="pull-request/6199" \
    --env CUML_PR_SHA="9d5100ec4589e20230a31817518427efa1e49c6d" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
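
For a dependency-free cross-check of the numbers above: a wheel is a zip archive, so the standard-library `zipfile` module can count its files and sum their compressed sizes. This is a sketch, not what `pydistcheck` does internally; the function name `wheel_stats` is hypothetical, and the summed member sizes will be slightly smaller than what `du` reports because zip headers are not included.

```python
import zipfile


def wheel_stats(wheel_path):
    """Return (number of files, total compressed bytes) for one wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Skip directory entries so that only real files are counted.
        members = [info for info in wheel.infolist() if not info.is_dir()]
        return len(members), sum(info.compress_size for info in members)
```

Running `wheel_stats` over every `.whl` in `./wheels-before` and `./wheels-after` should give counts and sizes close to the table.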

How I tested this

These other PRs:

@jameslamb added the "2 - In Progress" (currently a work in progress) and "5 - DO NOT MERGE" (hold off on merging; see PR for details) labels Dec 30, 2024

@github-actions github-actions bot removed the CUDA/C++ label Dec 31, 2024
@github-actions github-actions bot added conda conda issue CUDA/C++ labels Jan 2, 2025
@rapidsai rapidsai deleted a comment from bdice Jan 17, 2025
@jameslamb
Member Author

/ok to test

# --- cumlprims_mg --- #
# ship cumlprims_mg in the 'libcuml' wheel (for re-use by 'cuml' wheels)
set(CUML_USE_CUMLPRIMS_MG_STATIC OFF)
set(CUML_EXCLUDE_CUMLPRIMS_MG_FROM_ALL OFF)
Member Author

Proposing bundling libcumlprims_mg.so + its headers into libcuml wheels:

  • so libcuml++.so and the Cython extensions in cuml share 1 instance of this library
  • to avoid needing to compile cumlprims_mg for both libcuml and cuml wheel builds

For conda packages, this isn't necessary because we distribute separate libcumlprims_mg conda packages that can be dynamically linked against.

We don't have the equivalent wheels and I don't think they'd be worth doing (given that they're only needed for cuML).

Contributor

This plan seems fine to me. It reduces binary size and compile time, and provides the same behavior we already have for conda packages (except this is one wheel, rather than two conda packages).

Returns ``None`` if the library cannot be loaded.
"""
# cumlprims_mg installs to lib/
Contributor

Is this a bug in cumlprims_mg's CMake?

Contributor

Yes. Yes it is.

Let's wait for https://github.com/rapidsai/cumlprims_mg/pull/223 to merge, and then simplify this logic to use lib64/.
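
Until that fix lands, the loader has to cope with the library living under either `lib/` or `lib64/`. The following is a minimal sketch of that kind of lookup, not the actual contents of `load.py`; the function names, the search order, and the `RTLD_LOCAL` mode are assumptions for illustration.

```python
import ctypes
import os


def find_shared_library(package_dir, soname, subdirs=("lib64", "lib")):
    """Return the first existing path to `soname` under `package_dir`, else None."""
    for subdir in subdirs:
        candidate = os.path.join(package_dir, subdir, soname)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_if_present(package_dir, soname):
    """Load the library with ctypes, or return None if it is not in the wheel."""
    path = find_shared_library(package_dir, soname)
    if path is None:
        return None
    # Assumption: RTLD_LOCAL keeps the library's symbols out of the global namespace.
    return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)
```

Once everything installs to `lib64/`, the `subdirs` tuple could shrink to a single entry and the loop would go away.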

Labels
2 - In Progress (currently a work in progress), 5 - DO NOT MERGE (hold off on merging; see PR for details), ci, CMake, conda (conda issue), CUDA/C++, Cython / Python (Cython or Python issue)