
[DRAFT] Added magma conda build scripts for ROCm #27

Draft: wants to merge 45 commits into `main`



@jataylo commented Mar 22, 2023

Added a draft PR for the magma conda build to facilitate discussion. Presently I overwrite the magma directory with the changes; I can change this before we consider merging.

I have tested building this package locally. The tarballs are too big to store here, but the build logs are stored in magma-rocm5.3 and magma-rocm5.4.

Testing

  • Build the conda packages with `make magma-rocm5.3` or `make magma-rocm5.4`.
  • Use a manylinux image, extract the package, and move magma to /opt/rocm (as done in this PR's common/install_magma.sh, mimicking the way CUDA handles this). The image is uploaded as rocm/pytorch-private:manylinux-rocm5.4-131-magma-wheel-test.
  • Proceed with wheel building as usual, install the wheels in a base Ubuntu environment, and run the relevant unit tests.
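The extract-and-move step can be sketched in Python for readers who want to reproduce it outside the image. This is a hypothetical helper, not the contents of common/install_magma.sh: it assumes the package is a `.tar.bz2` archive with top-level `include/` and `lib/` directories, and that MAGMA should land in a `magma/` subdirectory of the ROCm prefix.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_magma_package(archive: str, rocm_prefix: str = "/opt/rocm") -> Path:
    """Extract a MAGMA conda package and move include/ and lib/ under <rocm_prefix>/magma.

    Assumptions: bzip2 tarball layout with top-level include/ and lib/, and a
    magma/ destination subdirectory. Neither is confirmed by install_magma.sh.
    """
    dest = Path(rocm_prefix) / "magma"
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:bz2") as tar:
            # Use the safe "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(tmp, filter="data")
            else:
                tar.extractall(tmp)
        for sub in ("include", "lib"):
            src = Path(tmp) / sub
            if src.is_dir():
                target = dest / sub
                # Replace any earlier install of this subdirectory.
                if target.exists():
                    shutil.rmtree(target)
                shutil.move(str(src), str(target))
    return dest
```

Moving only `include/` and `lib/` leaves conda's `info/` metadata behind, which a plain `/opt/rocm` install does not need.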

Unit Test

`PYTORCH_TEST_WITH_ROCM=1 python3.8 test/run_test.py -v -i "test_linalg" --continue-through-error`
Ran 741 tests in 404.818s

OK (skipped=69)

What's next
@pruthvistony @jithunnair-amd
This gives us a base from which to make the changes needed to get this ready. Open questions and next steps:

  • How will we version our packages (currently magma-rocm5.4 and magma-rocm5.3), and should each have its own commit?
  • We need to build and upload the conda packages, then update install_magma.sh to test a CI run.
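For the versioning discussion, the current scheme (one package per ROCm major.minor release, matching the `make magma-rocm5.3` / `make magma-rocm5.4` targets) can be written down as a small helper. `magma_package_name` is hypothetical and assumes patch releases such as 5.4.2 would reuse the 5.4 package; it is a sketch for discussion, not existing build code.

```python
import re


def magma_package_name(rocm_version: str) -> str:
    """Return the conda package name for a ROCm release, keeping only major.minor.

    Assumption (for discussion): patch releases share the major.minor package.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)*", rocm_version.strip())
    if match is None:
        raise ValueError(f"unrecognised ROCm version: {rocm_version!r}")
    major, minor = match.groups()
    return f"magma-rocm{major}.{minor}"


for version in ("5.3", "5.4.2"):
    print(version, "->", magma_package_name(version))
```

If patch releases ever need separate builds, only the regex and the format string would change.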

KyleCZH and others added 30 commits March 22, 2022 19:07
* add dependencies for rocm5.1

* install miopen dependencies using cmake

* change lib path in build_rocm.sh for rocm5.1

* change amdgpu_version for rocm5.1

* remove 4.5.2 and add 5.1
[ROCm] add dependencies for rocm5.2
* change the MAYBE_LIB64 path just for 5.2
* Changes to support ROCm 5.3

* Updated as per comments
- In ROCm 5.3, libtorch builds were failing during the magma build due to a
  missing python binary, so an install statement was added
* Updating the condition for noRCCL build

* Updated changes as per comments
…rocm_fork

Disable MLIR backend when building MIOpen
jithunnair-amd and others added 14 commits October 5, 2022 02:07
…taging_branch

Use staging branch of MIOpen for ROCm5.3
* Update to .so patching for ROCm

A wildcard is used in grep to grab the actual numbered .so file referenced
in patchelf. This removes the need to specify the .so number in
DEPS_LIST & DEPS_SONAME.

This commit also brings the .so name trimming functionality from
build_common.sh into build_libtorch.sh.
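The wildcard lookup can be sketched in Python. This is a hypothetical illustration of the idea, not the code in build_common.sh or build_libtorch.sh: it assumes `needed` holds the dependency names printed by `patchelf --print-needed` for one shared object, and `new_name` (also hypothetical) is whatever name the copied library has inside the wheel. The helpers only build the `patchelf --replace-needed` command line; they do not run it.

```python
import fnmatch
from typing import List, Optional


def find_needed_entry(needed: List[str], base_name: str) -> Optional[str]:
    """Return the versioned NEEDED entry matching base_name + "*", e.g. librocblas.so -> librocblas.so.3."""
    pattern = base_name + "*"  # plays the role of the wildcard passed to grep
    for entry in needed:
        if fnmatch.fnmatchcase(entry, pattern):
            return entry
    return None


def replace_needed_cmd(
    sofile: str, needed: List[str], base_name: str, new_name: str
) -> Optional[List[str]]:
    """Build (but do not run) the patchelf call that repoints one dependency."""
    orig = find_needed_entry(needed, base_name)
    if orig is None:
        return None  # sofile does not depend on base_name at all
    return ["patchelf", "--replace-needed", orig, new_name, sofile]
```

Because the pattern is just the unversioned name plus `*`, the same DEPS_LIST entry keeps working when a ROCm release bumps a library's .so number.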

* Refactor to remove switch statement in build_rocm.sh

This commit refactors build_rocm.sh and brings in a few major updates:
 - No longer required to specify the full .so name (with number) for ROCm libraries
   - The .so versions are copied, and the patching code fixes the links to point to these versions
 - No longer required to specify paths for ROCm libraries, allowing the removal of the large switch statement
   - Paths are acquired programmatically with find
 - No longer required to specify both the path and filename for the OS-specific libraries
   - The file name is extracted programmatically from the path
 - Tensile/Kernels files are extracted automatically for the architectures specified in PYTORCH_ROCM_ARCH,
   along with any non-arch-specific files, e.g. TensileLibrary.dat
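Two of these ideas, locating a library with a recursive search instead of a hard-coded path and filtering Tensile files by PYTORCH_ROCM_ARCH, can be sketched in Python. The helpers are hypothetical and only approximate what build_rocm.sh does with `find`. The rule that a `gfx` token in a file name marks it as arch-specific, the semicolon-separated PYTORCH_ROCM_ARCH format, and the example file names are assumptions.

```python
import os
import re
from typing import Iterable, List, Optional

# Assumed naming convention: arch-specific Tensile files carry a token such as
# "gfx906" or "gfx90a" somewhere in their file name.
_ARCH_TOKEN = re.compile(r"gfx[0-9a-z]+")


def find_library(root: str, name: str) -> Optional[str]:
    """Return the first path under root whose file name is name (like `find root -name name`)."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if name in filenames:
            return os.path.join(dirpath, name)
    return None


def select_tensile_files(filenames: Iterable[str], rocm_arch: str) -> List[str]:
    """Keep files for the requested archs plus files with no arch token at all.

    rocm_arch is a semicolon-separated list, e.g. "gfx906;gfx90a".
    """
    wanted = {arch for arch in rocm_arch.split(";") if arch}
    selected = []
    for path in filenames:
        # Only the file name matters, not the directory it came from.
        tokens = set(_ARCH_TOKEN.findall(os.path.basename(path)))
        if not tokens or tokens & wanted:
            selected.append(path)
    return selected
```

Calling `os.path.basename` on the result of `find_library` gives the bare file name, which is why the path and file name no longer need to be listed separately.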
* Remove miopen custom build step

* Bundle MIOpen db files in wheel

* Correct path
* Use libtinfo.so.6 for Ubuntu 20.04

* Fix to origname grep

* Condition on ROCM_VERSION for libtinfo6
We require the same fix that was made in upstream PyTorch:
pytorch/pytorch#91371
ROCm/pytorch@b72ec7c

Without this change, the install_conda.sh stage fails:
```
#21 6.254 CondaFileIOError: '/opt/conda/pkgs/envs/*/env.txt'. [Errno 2] No such file or directory: '/opt/conda/pkgs/envs/*/env.txt'
#21 6.254 
#21 ERROR: executor failed running [/bin/sh -c bash ./install_conda.sh && rm install_conda.sh]: exit code: 1
------
 > [conda 2/3] RUN bash ./install_conda.sh && rm install_conda.sh:
------
executor failed running [/bin/sh -c bash ./install_conda.sh && rm install_conda.sh]: exit code: 1
```

Locally tested with `/builder/libtorch/build_docker.sh`.

6 participants