Add CUDA/HIP support and GPU builds #184

Merged: @sebastiangrimberg merged 20 commits into main from sjg/gpu-build-system-dev on Mar 4, 2024.

@sebastiangrimberg (Contributor) commented Feb 2, 2024

Adds support for CUDA and HIP in the build system and makes corresponding changes to the code to run on GPUs.

Accompanying documentation update in #185

Resolves #3

TODO (for Spack support):

@sebastiangrimberg added the draft (Work in progress) and performance (Related to performance) labels Feb 2, 2024
@sebastiangrimberg force-pushed the sjg/openmp-improvements branch 2 times, most recently from 490dbc0 to 3d45711, February 2, 2024 02:20
@sebastiangrimberg changed the title from "Add CUDA/HIP support for GPU builds" to "Add CUDA/HIP support and GPU builds" Feb 2, 2024
@sebastiangrimberg force-pushed the sjg/openmp-improvements branch 3 times, most recently from 38f7229 to f1bfece, February 7, 2024 18:53
Base automatically changed from sjg/openmp-improvements to main February 7, 2024 19:33
@sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch 5 times, most recently from bf79e54 to dd42790, February 16, 2024 21:27
@sebastiangrimberg marked this pull request as ready for review February 16, 2024 23:55
@sebastiangrimberg changed the base branch from main to sjg/cpw-waveport-mesh February 17, 2024 02:01
Base automatically changed from sjg/cpw-waveport-mesh to main February 17, 2024 04:15
@sebastiangrimberg force-pushed the sjg/gpu-build-system-dev branch 3 times, most recently from a228960 to 9e002ef, February 28, 2024 00:03

@hughcars (Collaborator) left a comment

Generally looks good, mainly just clarification questions. The build process is clearly fiddly, so I wonder whether the documentation would be sufficient as is, but that can be tackled later.

The actual issues seen so far:

  • Running the examples tests on a g5.48xlarge, the spheres case fails with NaNs for the indicator norm.
  • Running the examples with nt > 1, SuperLU seems to cause the following failure on the cpw_lumped_uniform case, whereas using STRUMPACK is fine:

```
  Residual norms for GMRES solve
 ** On entry to DGEMM  parameter number  8 had an illegal value
--------------------------------------------------------------------------
prterun noticed that process rank 1 with PID 10743 on node c889f3baddb0 exited on
signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------
```

It might be as simple as noting in the docs that SuperLU does not work well with threading right now.

palace/fem/errorindicator.cpp (resolved)
palace/fem/fespace.cpp (outdated, resolved)
palace/fem/coefficient.hpp (resolved)

```cpp
  double norm = Norml2(comm, x, B, Bx);
  MFEM_ASSERT(norm > 0.0, "Zero vector norm in normalization!");
  x *= 1.0 / norm;
```

@hughcars (Collaborator): Nit: `/= norm`.
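The nit concerns style: `x /= norm` expresses the same normalization as `x *= 1.0 / norm` more directly. Below is a minimal serial sketch of that normalization. It assumes `Norml2(comm, x, B, Bx)` computes the B-weighted norm sqrt(x^T B x), using `Bx` as scratch space. Here a dense, row-major `std::vector<double>` stands in for `B`, and MPI is not used. `NormB` and `NormalizeB` are hypothetical helpers for illustration, not Palace APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a B-weighted norm ||x||_B = sqrt(x^T B x).
// B is a dense n x n matrix stored row-major; Bx is a work vector resized here.
double NormB(const std::vector<double> &x, const std::vector<double> &B,
             std::vector<double> &Bx)
{
  const std::size_t n = x.size();
  Bx.assign(n, 0.0);
  for (std::size_t i = 0; i < n; i++)
  {
    for (std::size_t j = 0; j < n; j++)
    {
      Bx[i] += B[i * n + j] * x[j];  // Bx = B * x
    }
  }
  double dot = 0.0;
  for (std::size_t i = 0; i < n; i++)
  {
    dot += x[i] * Bx[i];  // x^T (B x)
  }
  return std::sqrt(dot);
}

// Hypothetical helper: scale x in place so that ||x||_B = 1, dividing each
// entry by the norm as the review nit suggests.
void NormalizeB(std::vector<double> &x, const std::vector<double> &B,
                std::vector<double> &Bx)
{
  const double norm = NormB(x, B, Bx);
  assert(norm > 0.0 && "Zero vector norm in normalization!");
  for (double &xi : x)
  {
    xi /= norm;
  }
}
```

For example, with `B` as the 2x2 identity and `x = {3.0, 4.0}`, `NormB` returns 5, and `NormalizeB` leaves `x` as `{0.6, 0.8}`. The choice is mostly stylistic. Dividing each entry rounds once, whereas multiplying by a precomputed reciprocal rounds twice but is usually cheaper.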

palace/linalg/slepc.cpp (outdated, resolved)
palace/linalg/slepc.cpp (outdated, resolved)
palace/linalg/slepc.cpp (resolved)
palace/linalg/vector.hpp (resolved)
palace/linalg/vector.cpp (resolved)
palace/linalg/vector.cpp (outdated, resolved)

@sebastiangrimberg (Contributor, Author) commented Feb 28, 2024

> The actual issues seen so far:
>
> • Running the examples tests on a g5.48xlarge, the spheres case is failing with NaNs for the indicator norm.

This is fixed in a8d0cc9.

> • Running the examples with nt > 1, SuperLU seems to cause [...] on the cpw_lumped_uniform case, whereas using STRUMPACK is fine.

As far as I can tell, this only happens on M1 macOS with SuperLU_DIST. I'm strongly inclined to think it is a bug in SuperLU_DIST, but at least for now it is unrelated to this PR. One option worth exploring is to build SuperLU_DIST and STRUMPACK without OpenMP in OpenMP builds and rely only on a threaded BLAS/LAPACK.

EDIT: SuperLU_DIST issue: xiaoyeli/superlu_dist#159

@sebastiangrimberg (Contributor, Author) commented Feb 29, 2024

Once this PR, #193, and #194 are approved, I will rebase them on main and merge all three into main together.

@sebastiangrimberg removed the draft (Work in progress) label Feb 29, 2024

@hughcars (Collaborator) left a comment

Approved as part of testing on #204

@sebastiangrimberg merged commit 71b3813 into main Mar 4, 2024 (17 checks passed)
@sebastiangrimberg deleted the sjg/gpu-build-system-dev branch March 4, 2024 21:01

Labels: performance (Related to performance)
Linked issue: Operator partial assembly, GPU support
2 participants