converted to C++ BLAS/LAPACK interface #237
Conversation
…inear algebra bindings
…valeev/feature/linalgpp
…compatible with ninja
…/LAPACK++) properly
# Conflicts:
#	.gitlab-ci.yml
#	external/versions.cmake
Overall I think it looks good, but I'm a bit confused as to why you chose to only move blas over to blaspp and not lapack in e.g. potrf, trtri, etc.?
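(For reference only, not part of this PR: a LAPACK++ call such as the following is roughly what moving potrf over to lapackpp would look like; the function name and error handling here are purely illustrative.)

```cpp
#include <lapack.hh>

#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative sketch: Cholesky factorization of an n-by-n SPD matrix
// (column-major, lower triangle) through the LAPACK++ interface.
void potrf_example(std::vector<double>& A, std::int64_t n) {
  const std::int64_t lda = n;
  const std::int64_t info = lapack::potrf(lapack::Uplo::Lower, n, A.data(), lda);
  if (info != 0) throw std::runtime_error("lapack::potrf failed: info != 0");
}
```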
@@ -695,20 +696,21 @@ inline void gemm(btas::Tensor<T, Range, Storage>& result,
      std::cbegin(left.range().extent()), std::cbegin(right.range().extent())));

  // Compute gemm dimensions
  using integer = TiledArray::math::blas::integer;
Not sure if this type of logic is used elsewhere, but using blaspp/lapackpp generally precludes the need to explicitly specify an integer type: all of their interfaces take int64_t. So you can pretty much move all integer types over to 64-bit, and the conversions to the appropriate BLAS integer are handled internally by the BLAS/LAPACK wrappers (they do introspection to make sure things are correct).
@wavefunction91 The TA::math::blas::integer alias is used elsewhere (MPQC). We could deprecate it, but I think it's relatively harmless.
src/TiledArray/math/blas.h (Outdated)
/// converts Op to ints in manner useful for bit manipulations
/// NoTranspose -> 0, Transpose -> 1, ConjTranspose -> 2
inline auto to_int(Op op) {
If you're using these for bit manipulation, you should probably impose 32/64-bit width and signedness here as opposed to auto. I know ISO dictates this is int32_t here, but it might be good to enforce it, if only for readability.
OK, will do
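For illustration, one way the explicitly-typed version could look (the Op enum below is a hypothetical stand-in, not the actual TiledArray type):

```cpp
#include <cstdint>

// Hypothetical stand-in for the Op enumeration used in blas.h.
enum class Op { NoTranspose, Transpose, ConjTranspose };

/// NoTranspose -> 0, Transpose -> 1, ConjTranspose -> 2,
/// with a fixed-width signed return type instead of auto
inline std::int32_t to_int(Op op) {
  switch (op) {
    case Op::NoTranspose:   return 0;
    case Op::Transpose:     return 1;
    case Op::ConjTranspose: return 2;
  }
  return -1;  // unreachable for valid enumerators
}
```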
@@ -76,17 +76,16 @@ class cuBLASHandlePool {
 };
 // thread_local cublasHandle_t *cuBLASHandlePool::handle_;

-inline cublasOperation_t to_cublas_op(
-    madness::cblas::CBLAS_TRANSPOSE cblas_op) {
+inline cublasOperation_t to_cublas_op(math::blas::Op cblas_op) {
Don't the blaspp cuBLAS bindings handle this?
Indeed.
This points out the cuBLAS-in-BLAS++ issue. Having looked at it, I don't think there is a way to fully control how the work gets assigned to streams through the BLAS++ API, so switching to their device interface and removing all of the CUDA stuff is not an option, as it would break TA contractions. Without looking more deeply I don't see how to work around this.
I propose we defer the device BLAS work to another PR.
I agree that this can get deferred, but FWIW they do have stream management: https://bitbucket.org/icl/blaspp/src/4cae6e956825496d13e087c4620dd67fe40d75bb/include/blas/device.hh#lines-79
It still needs a bit of work (i.e. they don't allow for attaching a queue onto existing streams, which is something that would be beneficial in this use case).
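(For context only, a rough sketch of the BLAS++ device interface being discussed; constructor details vary between blaspp revisions, and the key limitation is that the Queue creates and owns its CUDA stream rather than wrapping one TiledArray already manages.)

```cpp
#include <blas.hh>

#include <cstdint>

// Sketch only: assumes blaspp was built with CUDA support; dA, dB, dC are
// device pointers to column-major m x k, k x n, m x n matrices.
void device_gemm_sketch(const double* dA, const double* dB, double* dC,
                        std::int64_t m, std::int64_t n, std::int64_t k) {
  blas::Queue queue(/*device=*/0, /*batch_size=*/0);  // Queue owns its own stream
  blas::gemm(blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans,
             m, n, k, 1.0, dA, m, dB, k, 0.0, dC, m, queue);
  queue.sync();  // no API to enqueue onto a pre-existing, externally managed stream
}
```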
#include <blas/dot.hh>
#include <blas/gemm.hh>
#include <blas/scal.hh>
#include <blas/util.hh>
I get the point, but I'm not sure this really changes the coupling. It might just be verbosity with no net benefit; we should look into this.
This is in line with the IWYU (include-what-you-use) philosophy, which I mostly agree with.
static constexpr auto ConjTranspose = madness::cblas::ConjTrans;
/// the integer type used by C++ BLAS/LAPACK interface, same as that used by
/// BLAS++/LAPACK++
using integer = int64_t;
See previous comment on integer types
using TiledArray::math::linalg::cholesky_solve;
using TiledArray::math::linalg::cholesky_lsolve;
}
using TiledArray::math::linalg::cholesky;
Don't these aliases defeat the purpose of nesting math::linalg entirely?
   auto* a = A.data();
-  lapack_int lda = n;
+  integer lda = n;
   TA_LAPACK(potrf, uplo, n, a, lda);
Shouldn't we just be using lapack::potrf here?
TA_LAPACK(potrf, ...) resolves to that, but it also checks the return value and, if needed, throws an exception with embedded __FILE__ and __LINE__.
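A rough sketch of how such a wrapper macro could be structured (the TA_LAPACK_EXAMPLE name and std::runtime_error are hypothetical stand-ins for the actual TiledArray macro and exception type; assumes the LAPACK++ routines return an integer info code):

```cpp
#include <lapack.hh>

#include <sstream>
#include <stdexcept>

// Hypothetical illustration: forward to the LAPACK++ free function and throw
// with file/line information if the returned info code is nonzero.
#define TA_LAPACK_EXAMPLE(routine, ...)                                  \
  do {                                                                   \
    const auto info = lapack::routine(__VA_ARGS__);                      \
    if (info != 0) {                                                     \
      std::ostringstream oss;                                            \
      oss << "lapack::" #routine " returned info=" << info << " at "     \
          << __FILE__ << ":" << __LINE__;                                \
      throw std::runtime_error(oss.str());                               \
    }                                                                    \
  } while (0)
```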
# Conflicts:
#	INSTALL.md
#	external/versions.cmake
Yes, I saw that. It appears to be pretty simple, i.e. there is no way to specify which one of the "parallel" streams to launch into. That's a showstopper for us.
no longer depend on MADNESS linear algebra bindings