Add magma conda package install support via MAGMA_PACKAGE_SOURCE #30

Draft
wants to merge 50 commits into
base: main
Commits
df95cf7
[ROCm] add dependencies for rocm5.1 (#1)
KyleCZH Mar 23, 2022
c44dfc0
[ROCm] install custom libdrm to silence missing amdgpu.ids warning
jeffdaily Apr 13, 2022
102ca3a
copy aux file amdgpu.ids
jeffdaily Apr 13, 2022
628d2ef
begin search for amdgpu.ids from install root, not lib
jeffdaily Apr 18, 2022
671b088
Merge pull request #2 from KyleCZH/libdrm_rocm_fix
jeffdaily Apr 19, 2022
4eb0de6
Merge branch 'main' into rocm_ifu_20220516
jeffdaily May 16, 2022
96c3fbd
Merge pull request #3 from ROCmSoftwarePlatform/rocm_ifu_20220516
jeffdaily May 16, 2022
558113c
refactor manywheel/build_rocm.sh
jeffdaily May 12, 2022
952e042
update manywheel/build_rocm.sh for ROCm 5.2
jeffdaily May 12, 2022
3474f5d
update scripts and Dockerfile for ROCm 5.2
jeffdaily May 12, 2022
87e5cef
update rocblas aux file list
jeffdaily May 13, 2022
7ee79d2
Merge pull request #4 from ROCmSoftwarePlatform/rocm_52_fork
jeffdaily May 16, 2022
156f24a
git224-core no longer available, use git236-core
jeffdaily May 25, 2022
d211ae1
add ROCm 5.1.3 AMDGPU support
jeffdaily May 25, 2022
af4f26f
Add 5.1.3 support for libtorch
pruthvistony May 26, 2022
0134ed6
update libamd_comgr.so path for ROCm5.2 (#5)
aspamidi Jun 8, 2022
36a5684
Add AMDGPU for ROCm5.2
jithunnair-amd Jun 29, 2022
0238135
Merge branch 'main' into main_rocm
jeffdaily Aug 11, 2022
b086528
reconcile common/install_rocm_drm.sh with upstream
jeffdaily Aug 11, 2022
68bacb6
Updates to support rocm5.3 wheel builds (#6)
pruthvistony Sep 1, 2022
fe0f987
Installing python before magma build
pruthvistony Sep 1, 2022
02688de
Move python install to libtorch/Dockerfile (#8)
jithunnair-amd Sep 12, 2022
5b83342
Merge remote-tracking branch 'origin/main' into IFU-main-2022-09-15
jithunnair-amd Sep 15, 2022
6b5b73e
Updating the condition for noRCCL build (#9)
pruthvistony Sep 22, 2022
8fc92e6
Merge pull request #10 from ROCmSoftwarePlatform/IFU-main-2022-09-15
jithunnair-amd Sep 22, 2022
5a0ae80
Disable MLIR when building MIOpen
jithunnair-amd Oct 4, 2022
bc6fe73
Merge pull request #11 from ROCmSoftwarePlatform/disable_miopen_mlir_…
jithunnair-amd Oct 5, 2022
42f213f
Use MIOpen branch for ROCm5.3; Change all conditions to -eq
jithunnair-amd Oct 5, 2022
0468db5
Merge pull request #12 from ROCmSoftwarePlatform/rocm5.3
jithunnair-amd Oct 5, 2022
c4d81fe
Use staging branch of MIOpen for ROCm5.3
jithunnair-amd Oct 5, 2022
55db5ba
Merge pull request #13 from ROCmSoftwarePlatform/use_miopen_rocm5.3_s…
jithunnair-amd Oct 5, 2022
594e241
Allow ROCm minor releases to use the same MIOpen branch as the major …
jithunnair-amd Oct 24, 2022
9b4cdd0
correct logic to ensure rocm5.4 doesn't fall in wrong condition
jithunnair-amd Oct 24, 2022
39f492a
Update for rocm5.4 branch
jithunnair-amd Oct 27, 2022
eac0e77
Update amdgpu repo url for ROCm5.3
jithunnair-amd Oct 27, 2022
a6f4604
Refactor wheel and libtorch build scripts (#7)
jataylo Oct 27, 2022
6dd80e4
rocfft/hipfft link to libhiprtc.so in ROCm5.4 (#15)
jithunnair-amd Oct 31, 2022
1cffc92
Disable MIOpen build from source for PyTorch wheels (#16)
jithunnair-amd Dec 1, 2022
7a44689
libtinfo.so version update and logic fix (#19)
jataylo Jan 19, 2023
fdad948
Fix conda install on distributions with strict POSIX sh (#18)
jataylo Jan 24, 2023
21cc230
Conditionalise librocfft-device so's out of rocm5.5 (#22)
jithunnair-amd Feb 14, 2023
c0632f3
Exit on missing file in build_rocm.sh (#23)
jataylo Feb 17, 2023
ae59a10
Merge remote-tracking branch 'upstream/main' into IFU-main-2023-03-03
jataylo Mar 3, 2023
24f9208
Remove building magma from source
jataylo Mar 3, 2023
23be9d1
Revert
jataylo Mar 6, 2023
17b7478
Remove rocm5.1 rocm5.2 from libtorch Dockerfile
jataylo Mar 7, 2023
82ccc1c
Add miopen install rpm script (#26)
jataylo Mar 10, 2023
009ed17
Merge branch 'upstream_main' into IFU-main-2023-03-24
pruthvistony Mar 28, 2023
be17939
Merge remote-tracking branch 'rocm_fork/IFU-main-2023-03-24' into roc…
jithunnair-amd Apr 11, 2023
9f1167f
Add magma conda package install support via MAGMA_PACKAGE_SOURCE
jataylo May 3, 2023
2 changes: 1 addition & 1 deletion common/install_rocm.sh
@@ -11,7 +11,7 @@ ver() {
}

# Map ROCm version to AMDGPU version
declare -A AMDGPU_VERSIONS=( ["5.0"]="21.50" ["5.1.1"]="22.10.1" ["5.2"]="22.20" )
declare -A AMDGPU_VERSIONS=( ["5.0"]="21.50" ["5.1.1"]="22.10.1" ["5.1.3"]="22.10.3" ["5.2"]="22.20" )

install_ubuntu() {
apt-get update
118 changes: 80 additions & 38 deletions common/install_rocm_magma.sh
@@ -1,44 +1,86 @@
#!/bin/bash

# TODO upstream the differences in this file into the (eventual) one in pytorch
# - (1) check for static lib mkl
# - (2) MKLROOT as env var

set -ex

# TODO (2)
MKLROOT=${MKLROOT:-/opt/intel}

# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# fix for magma_queue memory leak issue
git checkout c62d700d880c7283b33fb1d615d62fc9c7f7ca21
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
# TODO (1)
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
function install_magma_source() {
# TODO upstream the differences in this file into the (eventual) one in pytorch
# - (1) check for static lib mkl
# - (2) MKLROOT as env var

# TODO (2)
MKLROOT=${MKLROOT:-/opt/intel}

# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# fix for magma_queue memory leak issue
git checkout c62d700d880c7283b33fb1d615d62fc9c7f7ca21
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
# TODO (1)
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
popd
mkdir -p /opt/rocm/magma
mv magma/include /opt/rocm/magma
mv magma/lib /opt/rocm/magma
rm -rf magma
}

function install_magma_package() {

MAGMA_VERSION="2.6.2"
rocm_path="/opt/rocm"
tmp_dir=$(mktemp -d)
pushd "${tmp_dir}"
wget --no-check-certificate -q "$MAGMA_PACKAGE_SOURCE"
tar -xf *.tar*
mkdir -p "${rocm_path}/magma"

# Fail immediately if the archive does not have the expected layout;
# otherwise a partial install would go unnoticed.
if [ -e "$tmp_dir/magma/include" ]; then
mv "$tmp_dir/magma/include" "${rocm_path}/magma/include"
echo "Successfully installed MAGMA include files to ${rocm_path}/magma/include"
else
echo "Error: MAGMA include files not found in $tmp_dir/magma/include" >&2
exit 1
fi

if [ -e "$tmp_dir/magma/lib/" ]; then
mv "$tmp_dir/magma/lib/" "${rocm_path}/magma/lib"
echo "Successfully installed MAGMA library files to ${rocm_path}/magma/lib"
else
echo "Error: MAGMA library files not found in $tmp_dir/magma/lib" >&2
exit 1
fi

if [ ! -d "${rocm_path}/magma" ]; then
echo "Error: MAGMA installation failed" >&2
exit 1
fi

popd
rm -rf "$tmp_dir"
}

if [ -z "$MAGMA_PACKAGE_SOURCE" ]; then
echo "MAGMA_PACKAGE_SOURCE is not set, building magma from source"
install_magma_source
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
echo "MAGMA_PACKAGE_SOURCE is set, installing magma from package"
install_magma_package
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
popd
mkdir -p /opt/rocm/magma
mv magma/include /opt/rocm/magma
mv magma/lib /opt/rocm/magma
rm -rf magma
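
The package path above expects the downloaded archive to unpack into `magma/include` and `magma/lib`. A minimal sketch of that layout check as a standalone helper follows; `check_magma_layout` is a hypothetical name and is not part of this PR, and the directory names are taken from the `install_magma_package` function in the diff.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): verify that an extracted archive
# contains the magma/include and magma/lib directories that
# install_magma_package moves into place.
check_magma_layout() {
    local root="$1"
    local sub
    for sub in include lib; do
        if [ ! -d "${root}/magma/${sub}" ]; then
            echo "Error: ${root}/magma/${sub} not found" >&2
            return 1
        fi
    done
    return 0
}
```

Calling it right after `tar -xf` and before any `mv` would let the script stop before `/opt/rocm/magma` is partially populated.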

37 changes: 37 additions & 0 deletions common/install_rocm_miopen_rpm.sh
@@ -0,0 +1,37 @@
#!/bin/bash

set -ex

# Create the tmp dir to extract into
EXTRACTDIR_ROOT=/extract_miopen_rpm
mkdir -p ${EXTRACTDIR_ROOT}
echo "Creating temporary directory for rpm download..."

# Fail if rpm source is not available
if ! wget -P ${EXTRACTDIR_ROOT} ${MIOPEN_RPM_SOURCE}; then
echo 'ERROR: Failed to download MIOpen package.'
exit 1
fi
echo "MIOpen package download complete..."

# Extract rpm in EXTRACTDIR_ROOT
cd ${EXTRACTDIR_ROOT}
miopen_rpm=$(ls *.rpm)
rpm2cpio ${miopen_rpm} | cpio -idmv

# Copy libMIOpen.so.1 over existing
source_file=$(ls opt/rocm-*/lib/libMIOpen.so.1.0*)
dest_file=$(ls /opt/rocm-${ROCM_VERSION}*/lib/libMIOpen.so.1.0*)
if [ -e "${source_file}" ] && [ -e "${dest_file}" ]; then
echo "Source .so: ${source_file}"
echo "Dest .so: ${dest_file}"
cp "$source_file" "$dest_file"
else
echo 'ERROR: either the source or destination path for libMIOpen.so.1.0 does not exist'
exit 1
fi
echo "libMIOpen so file from RPM copied to existing MIOpen install..."

# Clean up extracted dir
rm -rf ${EXTRACTDIR_ROOT}
echo "Removed temporary directory..."
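
The script above relies on `ls` with a glob returning exactly one path, both for the downloaded rpm and for `libMIOpen.so.1.0*`. A minimal sketch of a helper that makes that assumption explicit follows; `single_match` is a hypothetical name and is not part of this PR.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): print the single existing path among
# the arguments, or fail if zero or more than one of them exists.
# An unmatched glob is passed through literally and is rejected by the -e test.
single_match() {
    local matches=()
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            matches+=("$f")
        fi
    done
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "Error: expected exactly one match, found ${#matches[@]}" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}
```

Under these assumptions, `miopen_rpm=$(single_match ./*.rpm)` would fail with a clear message when the extract directory holds several rpm files, rather than passing a multi-line value to `rpm2cpio`.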
14 changes: 7 additions & 7 deletions libtorch/Dockerfile
@@ -62,22 +62,22 @@ ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ENV MKLROOT /opt/intel
ADD ./common/install_rocm.sh install_rocm.sh
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
#ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
# gfortran and python needed for building magma from source for ROCm
RUN apt-get update -y && \
apt-get install gfortran -y && \
apt-get install python -y && \
apt-get clean

FROM rocm as rocm5.3
RUN ROCM_VERSION=5.3 bash ./install_rocm.sh && rm install_rocm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
#RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh

FROM rocm as rocm5.4.2
RUN ROCM_VERSION=5.4.2 bash ./install_rocm.sh && rm install_rocm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
#RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh

FROM rocm as rocm5.3
RUN ROCM_VERSION=5.3 bash ./install_rocm.sh && rm install_rocm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh

FROM ${BASE_TARGET} as final
# Install LLVM
2 changes: 1 addition & 1 deletion libtorch/build_docker.sh
@@ -27,7 +27,7 @@ case ${GPU_ARCH_TYPE} in
rocm)
BASE_TARGET=rocm${GPU_ARCH_VERSION}
DOCKER_TAG=rocm${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}-magma
GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908"
ROCM_REGEX="([0-9]+)\.([0-9]+)[\.]?([0-9]*)"
if [[ $GPU_ARCH_VERSION =~ $ROCM_REGEX ]]; then
9 changes: 4 additions & 5 deletions manywheel/Dockerfile
@@ -157,15 +157,14 @@ FROM cpu_final as rocm_final
ARG ROCM_VERSION=3.7
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ARG MAGMA_PACKAGE_SOURCE
ENV MAGMA_PACKAGE_SOURCE ${MAGMA_PACKAGE_SOURCE}
# Install ROCm
ADD ./common/install_rocm.sh install_rocm.sh
RUN ROCM_VERSION=${ROCM_VERSION} bash ./install_rocm.sh && rm install_rocm.sh
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
# cmake3 is needed for the MIOpen build
RUN ln -sf /usr/local/bin/cmake /usr/bin/cmake3
### The following is now performed beforehand in a new GPU_IMAGE with magma and miopen preinstalled
#ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
#RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
#ADD ./common/install_miopen.sh install_miopen.sh
#RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
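
Because `MAGMA_PACKAGE_SOURCE` is declared as an `ARG` in the `rocm_final` stage, it has to be supplied at build time with `--build-arg`. A minimal sketch of assembling those arguments follows; `rocm_build_args` is a hypothetical helper, the URL is a placeholder, and the `docker build` invocation in the comment assumes the repository root as build context, which this diff does not show.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): print `docker build` arguments for the
# rocm_final target, one per line. MAGMA_PACKAGE_SOURCE is added only when it
# is set, so an unset variable keeps the build-from-source path.
rocm_build_args() {
    local rocm_version="$1"
    local args=(--target rocm_final --build-arg "ROCM_VERSION=${rocm_version}")
    if [ -n "${MAGMA_PACKAGE_SOURCE:-}" ]; then
        args+=(--build-arg "MAGMA_PACKAGE_SOURCE=${MAGMA_PACKAGE_SOURCE}")
    fi
    printf '%s\n' "${args[@]}"
}

# Example use (placeholder URL, assumed build context):
#   MAGMA_PACKAGE_SOURCE=https://example.invalid/magma.tar.bz2
#   mapfile -t args < <(rocm_build_args 5.4.2)
#   docker build "${args[@]}" -f manywheel/Dockerfile .
```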
2 changes: 1 addition & 1 deletion manywheel/build_docker.sh
@@ -44,7 +44,7 @@ case ${GPU_ARCH_TYPE} in
TARGET=rocm_final
DOCKER_TAG=rocm${GPU_ARCH_VERSION}
LEGACY_DOCKER_IMAGE=${DOCKER_REGISTRY}/pytorch/manylinux-rocm:${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-centos-7:${GPU_ARCH_VERSION}-magma-miopen-staging
GPU_IMAGE=rocm/dev-centos-7:${GPU_ARCH_VERSION}
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908"
ROCM_REGEX="([0-9]+)\.([0-9]+)[\.]?([0-9]*)"
if [[ $GPU_ARCH_VERSION =~ $ROCM_REGEX ]]; then
22 changes: 18 additions & 4 deletions manywheel/build_rocm.sh
@@ -84,10 +84,6 @@ ROCM_SO_FILES=(
"libmagma.so"
"librccl.so"
"librocblas.so"
"librocfft-device-0.so"
"librocfft-device-1.so"
"librocfft-device-2.so"
"librocfft-device-3.so"
"librocfft.so"
"librocm_smi64.so"
"librocrand.so"
@@ -97,6 +93,13 @@ ROCM_SO_FILES=(
"libroctx64.so"
)

if [[ $ROCM_INT -lt 50500 ]]; then
ROCM_SO_FILES+=("librocfft-device-0.so")
ROCM_SO_FILES+=("librocfft-device-1.so")
ROCM_SO_FILES+=("librocfft-device-2.so")
ROCM_SO_FILES+=("librocfft-device-3.so")
fi

if [[ $ROCM_INT -ge 50400 ]]; then
ROCM_SO_FILES+=("libhiprtc.so")
fi
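
The thresholds `50400` and `50500` are consistent with `ROCM_INT` being encoded as major * 10000 + minor * 100 + patch. The computation itself is outside the lines shown in this diff, so the following is only a sketch under that assumption; `rocm_int` is a hypothetical function name, and the regex is the `ROCM_REGEX` pattern that appears in `build_docker.sh`.

```shell
#!/bin/bash
# Hypothetical sketch (the real ROCM_INT computation is not shown in this diff):
# encode a ROCm version string as major*10000 + minor*100 + patch.
rocm_int() {
    local regex='([0-9]+)\.([0-9]+)[\.]?([0-9]*)'
    if [[ $1 =~ $regex ]]; then
        local major=${BASH_REMATCH[1]}
        local minor=${BASH_REMATCH[2]}
        local patch=${BASH_REMATCH[3]:-0}   # a missing patch level counts as 0
        # 10# forces base 10 so a component like "08" is not read as octal
        echo $(( 10#$major * 10000 + 10#$minor * 100 + 10#$patch ))
    else
        echo "Error: cannot parse ROCm version '$1'" >&2
        return 1
    fi
}
```

With this encoding, 5.4.2 maps to 50402 and takes the `-ge 50400` branch, while 5.5 maps to 50500 and skips the `librocfft-device-*` entries.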
@@ -146,6 +149,11 @@ ARCH_SPECIFIC_FILES=$(ls $ROCBLAS_LIB_SRC | grep -E $ARCH)
OTHER_FILES=$(ls $ROCBLAS_LIB_SRC | grep -v gfx)
ROCBLAS_LIB_FILES=($ARCH_SPECIFIC_FILES $OTHER_FILES)

# MIOpen database files (share/miopen/db)
MIOPEN_SHARE_SRC=$ROCM_HOME/share/miopen/db
MIOPEN_SHARE_DST=share/miopen/db
MIOPEN_SHARE_FILES=($(ls $MIOPEN_SHARE_SRC | grep -E $ARCH))

# ROCm library files
ROCM_SO_PATHS=()
for lib in "${ROCM_SO_FILES[@]}"
@@ -159,6 +167,10 @@ do
if [[ -z $file_path ]]; then
file_path=($(find $ROCM_HOME/ -name "$lib")) # Then search in ROCM_HOME
fi
if [[ -z $file_path ]]; then
echo "Error: Library file $lib is not found." >&2
exit 1
fi
ROCM_SO_PATHS[${#ROCM_SO_PATHS[@]}]="$file_path" # Append lib to array
done

@@ -174,11 +186,13 @@ DEPS_SONAME=(

DEPS_AUX_SRCLIST=(
"${ROCBLAS_LIB_FILES[@]/#/$ROCBLAS_LIB_SRC/}"
"${MIOPEN_SHARE_FILES[@]/#/$MIOPEN_SHARE_SRC/}"
"/opt/amdgpu/share/libdrm/amdgpu.ids"
)

DEPS_AUX_DSTLIST=(
"${ROCBLAS_LIB_FILES[@]/#/$ROCBLAS_LIB_DST/}"
"${MIOPEN_SHARE_FILES[@]/#/$MIOPEN_SHARE_DST/}"
"share/libdrm/amdgpu.ids"
)
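
The `"${ARRAY[@]/#/$PREFIX/}"` entries in both lists use bash pattern substitution anchored at the start of each element: `#` with an empty pattern matches the beginning of the string, so every file name gets the directory prepended. A minimal sketch of the same expansion in isolation follows; `prefix_each` is a hypothetical helper and is not part of this PR.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): print each remaining argument with
# "<prefix>/" prepended, using the same expansion as DEPS_AUX_SRCLIST.
prefix_each() {
    local prefix="$1"
    shift
    # Nothing to print for an empty list (e.g. no files matched $ARCH)
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    local items=("$@")
    printf '%s\n' "${items[@]/#/$prefix/}"
}

# Example: prefix_each /opt/rocm/share/miopen/db gfx900.db gfx906.db
```

Since the source and destination lists are built from the same `MIOPEN_SHARE_FILES` array, their entries stay paired by index as long as both lists add the new line at the same position.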
