converted to C++ BLAS/LAPACK interface #237
Conversation
…inear algebra bindings
…valeev/feature/linalgpp
…compatible with ninja
…/LAPACK++) properly
# Conflicts:
#	.gitlab-ci.yml
#	external/versions.cmake
Overall I think it looks good, but I'm a bit confused as to why you chose to only move blas over to blaspp and not lapack in e.g. potrf, trtri, etc.?
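(For reference only, not part of this PR: a LAPACK++ call such as the following is roughly what moving potrf over to lapackpp would look like; the function name and error handling here are purely illustrative.)

```cpp
#include <lapack.hh>

#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative sketch: Cholesky factorization of an n-by-n SPD matrix
// (column-major, lower triangle) through the LAPACK++ interface.
void potrf_example(std::vector<double>& A, std::int64_t n) {
  const std::int64_t lda = n;
  const std::int64_t info = lapack::potrf(lapack::Uplo::Lower, n, A.data(), lda);
  if (info != 0) throw std::runtime_error("lapack::potrf failed: info != 0");
}
```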
@@ -695,20 +696,21 @@ inline void gemm(btas::Tensor<T, Range, Storage>& result,
      std::cbegin(left.range().extent()), std::cbegin(right.range().extent())));

  // Compute gemm dimensions
  using integer = TiledArray::math::blas::integer;
Not sure if this type of logic is used elsewhere, but using blaspp/lapackpp generally precludes the need to explicitly specify an integer type: all of their interfaces take int64_t. So you can pretty much move all integer types over to 64-bit, and the conversions to the appropriate BLAS integer are handled internally by the BLAS/LAPACK wrappers (they do introspection to make sure things are correct).
@wavefunction91 The TA::math::blas::integer alias is used elsewhere (MPQC). We could deprecate it, but I think it's relatively harmless.
src/TiledArray/math/blas.h (Outdated)
/// converts Op to ints in manner useful for bit manipulations
/// NoTranspose -> 0, Transpose -> 1, ConjTranspose -> 2
inline auto to_int(Op op) {
If you're using these for bit manipulation, you should probably impose 32/64-bit width and signedness here as opposed to auto. I know ISO dictates this is int32_t here, but it might be good to enforce it, if only for readability.
OK, will do
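For illustration, one way the explicitly-typed version could look (the Op enum below is a hypothetical stand-in, not the actual TiledArray type):

```cpp
#include <cstdint>

// Hypothetical stand-in for the Op enumeration used in blas.h.
enum class Op { NoTranspose, Transpose, ConjTranspose };

/// NoTranspose -> 0, Transpose -> 1, ConjTranspose -> 2,
/// with a fixed-width signed return type instead of auto
inline std::int32_t to_int(Op op) {
  switch (op) {
    case Op::NoTranspose:   return 0;
    case Op::Transpose:     return 1;
    case Op::ConjTranspose: return 2;
  }
  return -1;  // unreachable for valid enumerators
}
```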
@@ -76,17 +76,16 @@ class cuBLASHandlePool {
 };
 // thread_local cublasHandle_t *cuBLASHandlePool::handle_;

-inline cublasOperation_t to_cublas_op(
-    madness::cblas::CBLAS_TRANSPOSE cblas_op) {
+inline cublasOperation_t to_cublas_op(math::blas::Op cblas_op) {
Don't the blaspp cuBLAS bindings handle this?
Indeed.
This points out the cuBLAS-in-BLAS++ issue. Having looked at it, I don't think there is a way to fully control how the work gets assigned to streams through the BLAS++ API, so switching to their device interface and removing all of the CUDA stuff is not an option, as it would break TA contractions. Without looking more deeply I don't see how to work around this.
I propose we defer the device BLAS work to another PR.
I agree that this can get deferred, but FWIW they do have stream management: https://bitbucket.org/icl/blaspp/src/4cae6e956825496d13e087c4620dd67fe40d75bb/include/blas/device.hh#lines-79
It still needs a bit of work (i.e. they don't allow for attaching a queue onto existing streams, which is something that would be beneficial in this use case).
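(For context only, a rough sketch of the BLAS++ device interface being discussed; constructor details vary between blaspp revisions, and the key limitation is that the Queue creates and owns its CUDA stream rather than wrapping one TiledArray already manages.)

```cpp
#include <blas.hh>

#include <cstdint>

// Sketch only: assumes blaspp was built with CUDA support; dA, dB, dC are
// device pointers to column-major m x k, k x n, m x n matrices.
void device_gemm_sketch(const double* dA, const double* dB, double* dC,
                        std::int64_t m, std::int64_t n, std::int64_t k) {
  blas::Queue queue(/*device=*/0, /*batch_size=*/0);  // Queue owns its own stream
  blas::gemm(blas::Layout::ColMajor, blas::Op::NoTrans, blas::Op::NoTrans,
             m, n, k, 1.0, dA, m, dB, k, 0.0, dC, m, queue);
  queue.sync();  // no API to enqueue onto a pre-existing, externally managed stream
}
```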
#include <blas/dot.hh>
#include <blas/gemm.hh>
#include <blas/scal.hh>
#include <blas/util.hh>
I get the point, but I'm not sure this really changes the coupling. It might just be verbosity with no net benefit; we should look into this.
This is in line with the IWYU (include-what-you-use) philosophy, which I mostly agree with.
static constexpr auto ConjTranspose = madness::cblas::ConjTrans;
/// the integer type used by C++ BLAS/LAPACK interface, same as that used by
/// BLAS++/LAPACK++
using integer = int64_t;
See previous comment on integer types
using TiledArray::math::linalg::cholesky_solve;
using TiledArray::math::linalg::cholesky_lsolve;
}
using TiledArray::math::linalg::cholesky;
Don't these aliases defeat the purpose of nesting math::linalg entirely?
   auto* a = A.data();
-  lapack_int lda = n;
+  integer lda = n;
   TA_LAPACK(potrf, uplo, n, a, lda);
Shouldn't we just be using lapack::potrf here?
TA_LAPACK(potrf, ...) resolves to that, but it also checks the return value and, if needed, throws an exception with embedded __FILE__ and __LINE__.
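A rough sketch of how such a wrapper macro could be structured (the TA_LAPACK_EXAMPLE name and std::runtime_error are hypothetical stand-ins for the actual TiledArray macro and exception type; assumes the LAPACK++ routines return an integer info code):

```cpp
#include <lapack.hh>

#include <sstream>
#include <stdexcept>

// Hypothetical illustration: forward to the LAPACK++ free function and throw
// with file/line information if the returned info code is nonzero.
#define TA_LAPACK_EXAMPLE(routine, ...)                                  \
  do {                                                                   \
    const auto info = lapack::routine(__VA_ARGS__);                      \
    if (info != 0) {                                                     \
      std::ostringstream oss;                                            \
      oss << "lapack::" #routine " returned info=" << info << " at "     \
          << __FILE__ << ":" << __LINE__;                                \
      throw std::runtime_error(oss.str());                               \
    }                                                                    \
  } while (0)
```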
# Conflicts:
#	INSTALL.md
#	external/versions.cmake
Yes, I saw that. It appears to be pretty simple, i.e. there is no way to specify which one of the "parallel" streams to launch into. That's a showstopper for us.
no longer depend on MADNESS linear algebra bindings