Releases: wu-kan/HPL-AI
HPL-AI Version 2.3d
-- [M] testing/ptest/HPLAI_pdinfo.cc
Kan Wu added support for AscendCL, and added a new strategy
to enable GEMM to use CPU and GPU/NPU at the same time:
-- [M] src/pgesv/HPLAI_blas.cc
Kan Wu added added a compilation option BLAS_ILP64 to better
support ILP64/32 BLAS:
-- [M] src/pgesv/HPLAI_pdgesv.cc
HPL-AI Version 2.3c
Kan Wu fixed the bug when n > 65536.
-- [M] src/pgesv/HPLAI_pdgesv.cc
Kan Wu added an option to use the half-precision computeType
in cublasGemmEx (cuda@11: is required):
-- [M] src/blas/HPLAI_blas.cc
Kan Wu renamed the executable "xhplai" to "xhpl_ai", to be
synced to HPL-AI-NVIDIA v1.0.0 in nvidia:hpc-benchmarks:
-- [M] src/Makefile.am
-- [M] testing/Makefile.am
Kan Wu release the software at
https://github.com/SYSU-SCC/sysu-scc-spack-repo:
-- [M] testing/ptest/HPLAI_pdinfo.cc
HPL-AI Version 2.3b
Kan Wu optimized the device blaspp routines through the
strategy of buffering memory: the program will check whether
the buffer size is sufficient at runtime, free and reallocate
the buffer when necessary. Therefore, it is recommended to
run the benchmark twice with the same parameters to obtain
higher computation rate:
-- [M] src/blas/HPLAI_blas.cc
-- [M] testing/ptest/HPLAI_pdinfo.cc
HPL-AI Version 2.3a
Junkang Huang added the parallel GMRES and matrix construction
part (https://github.com/schuangs/hpl-ai-with-IR) for HPL-AI.
Note that the GMRES did not check the backward-error, which may
cause the result to be invalid.
Kan Wu reimplemented HPL-AI with some bug fixes and some minor
optimizations. The C souce code of hpl-2.3 remained the same,
while took advantages of modern C++ features such as templates,
overloading, and GPU backends via blaspp.