Add CUDA/HIP support and GPU builds #184
Generally looks good, mainly just clarification questions. The build process is clearly fiddly, so I wonder whether the documentation would be sufficient as is, but that can be tackled later.
The actual issues seen so far:
- Running the example tests on a g5.48xlarge, the spheres case is failing with NaNs for the indicator norm.
- Running the examples with nt > 1, SuperLU seems to cause the following on the cpw_lumped_uniform case, whereas using STRUMPACK is fine:

      Residual norms for GMRES solve
      ** On entry to DGEMM parameter number 8 had an illegal value
      --------------------------------------------------------------------------
      prterun noticed that process rank 1 with PID 10743 on node c889f3baddb0 exited on
      signal 11 (Segmentation fault: 11).
      --------------------------------------------------------------------------

  Might be as easy as noting in the docs that SuperLU doesn't play nice with threading right now.
{
  double norm = Norml2(comm, x, B, Bx);
  MFEM_ASSERT(norm > 0.0, "Zero vector norm in normalization!");
  x *= 1.0 / norm;
Nit: /= norm.
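To illustrate the nit, here is a minimal serial sketch. It assumes a plain std::vector in place of the PR's vector type and uses a hypothetical Norm2 helper in place of Norml2(comm, x, B, Bx), so there is no MPI reduction and no B-weighted inner product. It only shows the direct-division form of the normalization, not the PR's actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for Norml2(comm, x, B, Bx): a serial Euclidean norm
// (assumption: no MPI communicator, no weighting operator B).
double Norm2(const std::vector<double> &x)
{
  double sum = 0.0;
  for (double v : x)
  {
    sum += v * v;
  }
  return std::sqrt(sum);
}

// Normalize x in place by dividing each entry by the norm, which is the
// element-wise equivalent of writing `x /= norm` on a vector type.
void Normalize(std::vector<double> &x)
{
  const double norm = Norm2(x);
  // A NaN norm also fails this check, since any comparison with NaN is false.
  assert(norm > 0.0 && "Zero vector norm in normalization!");
  for (double &v : x)
  {
    v /= norm;
  }
}
```

Dividing directly states the intent more plainly than multiplying by 1.0 / norm. The two forms can differ in the last bit of each entry, which rarely matters for a normalization.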
This is fixed in a8d0cc9.
As far as I can tell, this is only happening on M1 macOS with SuperLU_DIST. I'm strongly inclined to say it is a bug there, but at least for now it is not related to this PR. One thing to probably explore is for OpenMP builds to build SuperLU_DIST and STRUMPACK without OpenMP and just rely on a threaded BLAS/LAPACK. EDIT: SuperLU_DIST issue: xiaoyeli/superlu_dist#159
Approved as part of testing on #204
…which support it. Also forward CUDA/HIP compiler + flags to dependency builds, and add a configure option for GPU-aware MPI (passed to Hypre; PETSc has its own detection).
Adds support for CUDA and HIP in the build system and makes corresponding changes to the code to run on GPUs.
Accompanying documentation update in #185
Resolves #3
TODO (for Spack support):
- main-2023-11
- spack/spack#42726 (Resolved by using libxsmm@=main dependency)