
Make magma optional #298

Open · wants to merge 28 commits into main
Conversation

@mgorny (Contributor) commented Dec 4, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

From the commit message:

Upstream keeps all magma-related routines in a separate libtorch_cuda_linalg library that is loaded dynamically whenever linalg functions are used. Since this library is relatively small, splitting it out makes it practical to provide interchangeable "magma" and "nomagma" variants.

Also:

Try to speed up magma/nomagma builds a bit.  Rather than rebuilding
the package 3 times (possibly switching magma → nomagma → magma again),
build it twice at the very beginning and store the built files for later
reuse in subpackage builds.

While at it, replace the `pip wheel` calls with `setup.py build` to avoid unnecessarily zipping up and then unpacking the whole thing. In the end, we only grab a handful of files for the `libtorch*` packages, and they are in a predictable location in the build directory. `pip install` is still used for the final `pytorch` builds.

Fixes #275

@isuruf helped me a lot with this, particularly with refactoring the builds so that both variants are built in one run.
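For illustration, a minimal sketch (assuming a CUDA-capable machine with the package installed) of the code path this split affects: only torch.linalg calls on CUDA tensors go through the dynamically loaded libtorch_cuda_linalg, so they are the only ones that differ between the "magma" and "nomagma" variants.

import torch

# Only CUDA linalg calls like this one route through libtorch_cuda_linalg,
# which is loaded on demand; the rest of libtorch is identical between the
# "magma" and "nomagma" variants.
a = torch.randn(16, 16, device="cuda")
x = torch.linalg.inv(a)
print(x.shape)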

mgorny and others added 3 commits December 4, 2024 19:13
Upstream keeps all magma-related routines in a separate libtorch_cuda_linalg library that is loaded dynamically whenever linalg functions are used. Since this library is relatively small, splitting it out makes it practical to provide interchangeable "magma" and "nomagma" variants.

Fixes conda-forge#275

Co-authored-by: Isuru Fernando <[email protected]>
@conda-forge-admin commented Dec 4, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ It looks like the 'libtorch-cuda-linalg' output doesn't have any tests.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12202823840. Examine the logs at this URL for more detail.

@h-vetinari (Member) commented:
Since this is still WIP, I started only a single job on linux for now (x64+MKL+CUDA)

@mgorny (Contributor Author) commented Dec 5, 2024

> Since this is still WIP, I started only a single job on linux for now (x64+MKL+CUDA)

Thanks. It seems to have failed only because it's adding new outputs. Do you want me to file the admin request for allowing libtorch-cuda-linalg, or do you prefer reviewing the changes first?

@h-vetinari (Member) commented:
> Do you want me to file the admin request for allowing libtorch-cuda-linalg

That would be great!

@mgorny (Contributor Author) commented Dec 6, 2024

Filed as conda-forge/admin-requests#1209.

@mgorny mgorny marked this pull request as ready for review December 6, 2024 16:56
The test currently refused to even start, since not all dependencies
were satisfied.
Put all the rules in a single file.  In the end, build_common.sh
has pytorch-conditional code at the very end anyway, and keeping
the code split like this only makes it harder to notice mistakes.
that are independent of selected Python version and are therefore shared
by all Python versions.

2. `libtorch-cuda-linalg` that provides the shared `libtorch_cuda_linalg.so`
Contributor commented:

is this the preferred one?

as in, is it preferred that users make use of linalg?

I ask because we try very hard to ensure that `mamba install pytorch` installs the best hardware-optimized one.

@mgorny (Contributor Author) commented Dec 8, 2024:

Not sure I understand the question.

libtorch_cuda_linalg.so is required for some operations. It is always pulled in by libtorch itself, i.e. the end result is that some version of the library is always installed.

As for magma vs. nomagma, I think we ought to prefer magma. #275 (comment) suggests that cusolver will be faster for some workflows, but the magma build supports both backends, and according to https://pytorch.org/docs/stable/backends.html#torch.backends.cuda.preferred_linalg_library, it has a heuristic to choose the faster backend for a given operation.
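For reference, a small sketch of the knob mentioned above, based on the linked docs (assumes a magma-enabled build and a CUDA GPU): the default setting lets PyTorch heuristically pick magma or cusolver per operation, and a single backend can also be forced.

import torch

# "default" lets PyTorch's heuristic pick magma or cusolver per operation;
# "magma" or "cusolver" force a single backend for CUDA linalg calls.
torch.backends.cuda.preferred_linalg_library("default")

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")
x = torch.linalg.solve(a, b)

# Called without arguments, it returns the currently preferred backend.
print(torch.backends.cuda.preferred_linalg_library())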

Contributor commented:

So who benefits from making this optional?

Sorry if this is obvious.

@mgorny (Contributor Author) commented:

This is #298, i.e. people who want to avoid installing the large magma dependency (~250 MB as a package). Given that libtorch-cuda-linalg is only 200 kB (nomagma) / 245 kB (magma), I figured it was worth it.

Comment:

The no-magma variant reduces the on-disk install size of a pytorch-gpu install from 7 GB to 5 GB (as #275 says). So this is definitely worth doing.

Given that cusolver is now on average fairly close in performance to magma, and that the PyTorch wheels don't include magma, I'd argue that the default should be nomagma (I'd certainly almost always want that for my use cases). However, which default we pick is less important than having both options available.

Contributor commented:

I understand. It's just not "obvious", that's all.

Member commented:

> few users will touch

Is this based on your experience or facts?

The main driving principle with the pytorch package here has been to provide the same features as in upstream conda/wheels so that users who do use our packages are not driven away by mysterious performance issues. With the deprecation of the anaconda.org/pytorch channel, we certainly do not want the users of that channel to look at conda-forge and form the opinion that the pytorch packages in conda-forge are sub-par. Therefore I do strongly prefer using the same options as the upstream wheels, which in this case means building with magma.

Member commented:

> Is this based on your experience or facts?

It's based on @rgommers' statement "10 or so functions, vs. [...] all of libtorch". I don't know pytorch nearly as well as Ralf, so I caveated my point with "assuming their percentage [of those who need it] among users is reasonably small".

IMO it's always a numbers game (and often we don't know the numbers, so we stay conservative, which is fine!). If we knew, say, that <0.1% of users are affected, then burdening the other 99.9% of users with a useless 2 GB download sounds unappealing. The question gets progressively harder the higher we estimate the percentage. Probably anything north of 5% of users and we'd default to performance rather than footprint? I don't have a magic bullet. 🤷

> provide the same features as in upstream conda/wheels so that users who do use our packages are not driven away by mysterious performance issues.

I agree with the overall direction, of course. We still have a few things to catch up on (e.g. cpuarch builds, as IIUC pytorch effectively requires something between x86_64 v3 and v4).

That said, it's certainly technically possible to warn users of a potential performance cliff where we don't yet support certain upstream optimizations (or require an opt-in for whatever reason). It's mainly a question of how much of that we want to patch and/or upstream.

To be clear, I'm also OK with "fat binaries are preferable to performance degradation or extra maintenance effort". The magma case is just not so black-and-white IMO. What do you think about the idea of making an exception to link libmagma statically?

Comment:

To be fair, they're ~10 pretty important linear algebra functions. The default matching upstream is also a good argument. I actually wish that were done more often, so I can't complain about the argument being applied here :) For example, the cuDNN version is very performance-sensitive, so I wish that were matched better rather than just taking whatever the latest version is.

Static linking or some other way to reduce libmagma in size would be good to investigate then though, because binary size is also a performance (and usability) issue.

Member commented:

They are pretty important linear algebra functions like lu_solve, so I would say the vast majority of users would use magma, not the other way around. For example, a simple script like the one below

import torch
a = torch.randn(5, 5).cuda()
b = torch.randn(5, 5).cuda()
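# this call dispatches into the dynamically loaded libtorch_cuda_linalg library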
torch.linalg.solve(a, b)

would invoke magma. The number of functions provided is a poor indicator of their usage (though the point about code size is certainly valid).

> e.g. cpuarch builds, as IIUC pytorch effectively requires something between x86_64 v3 and v4

pytorch requires those from the compiler, but not at runtime. fbgemm does runtime cpu dispatching just like openblas, numpy, etc.

> What do you think about the idea of making an exception to link libmagma statically?

Why do you think that linking libmagma statically would help? Just because upstream used a static build? One option that's different from upstream is conda-forge/libmagma-feedstock#22.

@conda-forge-admin commented:
Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@mgorny (Contributor Author) commented Dec 8, 2024

I've added an explanation in the README, as well as a fix to install the libtorch_python symlink, and the missing test dependencies. Tests don't work yet, but I'm well on the way to running a subset of them (like upstream CI does).

@conda-forge-admin commented Dec 8, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ It looks like the 'libtorch-cuda-linalg' output doesn't have any tests.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12318937456. Examine the logs at this URL for more detail.

While technically upstream uses 2024.2.0, this is causing some of
the calls to fail with an error:

    RuntimeError: MKL FFT error: Intel oneMKL DFTI ERROR: Inconsistent configuration parameters

Force <2024, which seems to work better.

Fixes conda-forge#301
Enable actually running a fixed random subset (1/5) of core tests
to check for packaging-related regressions.  We are not running
the complete test suite because it takes too long to complete.
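As an aside, a hypothetical sketch (not the recipe's actual mechanism) of how a deterministic 1/5 shard of a test suite can be selected with a pytest collection hook:

# conftest.py (hypothetical): keep a fixed 1/5 shard of the collected tests,
# chosen deterministically by hashing each test's node id.
import zlib

def pytest_collection_modifyitems(config, items):
    items[:] = [it for it in items if zlib.crc32(it.nodeid.encode()) % 5 == 0]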
@mgorny (Contributor Author) commented Dec 9, 2024

I've added an mkl pin to fix #301 and added running a subset of the tests. If they turn out to take too much time, we can go for a smaller shard.

Per RuntimeError: Ninja is required to load C++ extensions
@mgorny (Contributor Author) commented Dec 10, 2024

> I've added an mkl pin to fix #301 and added running a subset of the tests. If they turn out to take too much time, we can go for a smaller shard.

OK, FWICS it adds roughly 1.5 h to a single CI build on osx, so that's definitely too much (and the megabuild will multiply that by 4, I think). I'm going to replace the subset once CI confirms that I've fixed the test failure (so I don't lose it while changing the test subset).

@mgorny (Contributor Author) commented Dec 11, 2024

Replaced the tests with a hand-picked subset. Pulled in a few fixes for numpy-2 testing.

Next on my list is to try making it work with rattler-build.

@isuruf (Member) commented Dec 12, 2024

Do you have the libtorch_cuda_linalg.so artifact without magma somewhere that I can use to check things?

@mgorny (Contributor Author) commented Dec 12, 2024

> Do you have the libtorch_cuda_linalg.so artifact without magma somewhere that I can use to check things?

Not offhand, but I should have a warm ccache, so I'll get one quickly. However, I've only built for the 7.5 CUDA target; is that okay for you?

@isuruf (Member) commented Dec 12, 2024

> is that okay for you?

yep

@mgorny (Contributor Author) commented Dec 12, 2024

> Do you have the libtorch_cuda_linalg.so artifact without magma somewhere that I can use to check things?

Here are both the "magma" and "nomagma" variants (they're small): linalgs.zip

While there still doesn't seem to be a clear agreement on which builds should be preferred, let's prefer "magma" to keep the current behavior unchanged for end users.
Replace the build number hacks with `track_features` to deprioritize
generic BLAS over mkl, and CPU over CUDA.  This is mostly intended
to simplify stuff before trying to port to rattler-build.
Remove a leftover `skip` that prevented the CUDA + generic BLAS build from providing all packages, notably `pytorch`. While at it, remove the redundant [win] skip.
@mgorny (Contributor Author) commented Dec 12, 2024

OK, some important stuff in today's update:

  • I noticed that I had failed to add mkl/generic to the build string for the CUDA + generic builds, so some of them were still skipped; with this version, all 4 variants are built correctly now
  • I've also switched from using disjoint build numbers to prioritize CUDA and mkl to using track_features

That said, I think the latter means that I should set the build number higher than the highest build number currently in use, is that correct?

@isuruf (Member) commented Dec 13, 2024

It seems like with just libtorch_cuda_linalg.so replaced with the nomagma variant, you still have torch.cuda.has_magma == True.

libtorch-split)
    # Call setup.py directly to avoid spending time on unnecessarily
    # packing and unpacking the wheel.
    $PREFIX/bin/python setup.py build
Member commented:

Calling setup.py directly is deprecated, right?

@mgorny (Contributor Author) commented:

Depends on who you ask. setuptools itself has only deprecated setup.py install so far. Beyond that, the hacks PyTorch does on top of setuptools are all pretty much incompatible with PEP 517 (especially the sys.argv manipulations).
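For comparison, a hedged sketch of the PEP 517 route being contrasted here, using the PyPA `build` frontend on a generic project; as noted above, PyTorch's setup.py hacks don't play well with this path, and for the `libtorch*` outputs it would also mean zipping up a wheel only to unpack it again, which is exactly what the recipe avoids.

import build  # the PyPA "build" frontend; shown purely for illustration

# Build a wheel through the project's PEP 517 hooks and drop it into dist/;
# the call returns the path of the built wheel.
builder = build.ProjectBuilder(".")
wheel_path = builder.build("wheel", "dist/")
print(wheel_path)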

@mgorny (Contributor Author) commented Dec 13, 2024

I've reverted the build number and track_features changes.

@mgorny (Contributor Author) commented Dec 13, 2024

> It seems like with just libtorch_cuda_linalg.so replaced with the nomagma variant, you still have torch.cuda.has_magma == True.

Hmm, I suppose there's no trivial way around that. We could multiply all pytorch variants for that, but that sounds like overkill.
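For completeness, the flag in question; judging from the observation above, it appears to be baked into the main torch build rather than into libtorch_cuda_linalg, so it cannot be used to tell which linalg variant is actually installed.

import torch

# Reports whether the torch build advertises magma support. Per the discussion
# above, this stays True even when the nomagma libtorch_cuda_linalg variant is
# installed, so it does not reflect which linalg variant is in use.
print(torch.cuda.has_magma)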

Successfully merging this pull request may close these issues: Make Magma optional for cuda builds?

6 participants