Implemented chol, trinv, ttmm, hpdinv.

Details:
- Implemented an initial set of "level-4" operations:
  - 'chol': Cholesky factorization
  - 'trinv': Triangular matrix inversion
  - 'ttmm': Triangular-transpose matrix multiply (that is, either
    L^H * L or U * U^H, where the diagonal of L or U is real)
  - 'hpdinv': Hermitian-positive definite matrix inversion (also known
    as spdinv, or symmetric-positive definite matrix inversion for
    real-domain matrices)
  The first three operations each contain three kinds of algorithmic
  variants:
  - blocked ("blk"): blocked algorithms expressed in terms of object
    APIs.
  - unblocked ("unb"): unblocked algorithms expressed in terms of
    object APIs.
  - optimized unblocked ("opt"): optimized unblocked algorithms
    expressed in terms of typed APIs.
  except for ttmm, which omits the unblocked ("unb") implementations.
  (In contrast to the first three operations, 'hpdinv' is implemented
  as a composite operation in terms of chol, trinv, and ttmm, and so it
  does not have any algorithmic variants of its own.) For every variant
  that is implemented, there are two separate functions, one each to
  handle lower- and upper-triangular matrices. In the case of 'trinv',
  unit and non-unit diagonals are also supported, albeit via
  conditional statements in a unified set of variants that work for
  both cases. Each of 'chol', 'trinv', and 'ttmm' employs an extra
  level of recursion for the self-similar subproblem, with 4*KC and KC
  used for the outer and inner algorithmic blocksizes, respectively.
  All four operations provide object and typed APIs. (NOTE: The
  variants added by this commit were inspired by and modeled after
  those present in libflame.)
- Added testsuite modules to test the chol, trinv, ttmm, and hpdinv
  operations for correctness and updated the input.operations files
  accordingly.
- Changed invertsc operation to be a non-destructive operation; that
  is, it now takes separate input and output operands. This change
  applies to both the object and typed APIs.
- Defined an alternative square root operation, sqrtrsc, which, when
  operating on complex scalars, assumes the imaginary part of the input
  to be zero.
- Changed the semantics of addm, subm, copym, axpym, scal2m, and xpbym
  so that when the source matrix has an implicit unit diagonal, the
  operation leaves the diagonal of the destination matrix untouched.
  Previously, the operations would interpret an implicit unit diagonal
  on the source matrix as a request to manifest the unit diagonal
  *explicitly* on output (either as something to copy in the case of
  copym, or something to compute with in the cases of addm, subm,
  axpym, scal2m, and xpbym). It turns out that this behavior was too
  cute by half and could cause unintended headaches for practical use
  cases. (This change in behavior also required small modifications to
  the trmv and trsv testsuite modules so that they would properly test
  matrices with unit diagonals.)
- Added missing dependencies for copym to gemv, ger, hemv, trmv, and
  trsv testsuite modules.
- Implemented level-0-like ltsc, ltesc, gtsc, gtesc operations in
  frame/util, which use lt, lte, gt, and gte level-0 scalar macros.
- Implemented bli_acquire_mparts_tl2br() in bli_part.c, which provides
  selected subpartitions of a larger matrix. Also made a trivial
  variable rename in bli_part.c to harmonize with variable naming
  conventions elsewhere in BLIS.
- Because this code was developed against a more recent commit of BLIS
  (bce86b1) which employs const correctness, this commit adds
  -Wno-discarded-qualifiers for gcc, or
  -Wno-incompatible-pointer-types-discards-qualifiers for clang, to the
  list of compiler flags used for all source code. In the case of
  clang, -Wno-unused-but-set-variable is also thrown in just to pacify
  clang's protest of some unused variables in select files.
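
The commit message describes 'hpdinv' as a composite of chol, trinv, and ttmm. As a rough illustration of that composition (not the BLIS implementation), the sketch below inverts a small real symmetric positive definite matrix in place, touching only its lower triangle. Every name here (`chol_lower`, `trinv_lower`, `ttmm_lower`, `hpdinv_lower`, `sqrt_newton`), the row-major layout, and the Newton-iteration square root (used only so the sketch needs no math library) are assumptions made for this example. The code in this commit operates on BLIS objects, also handles complex domains and upper-triangular storage, and adds blocked variants.

```c
#include <stddef.h>

/* Element (i,j) of a row-major n x n matrix. */
#define AT(a, n, i, j) ((a)[(size_t)(i) * (size_t)(n) + (size_t)(j)])

/* Newton iteration for sqrt(x), x > 0; a stand-in for sqrt() from <math.h>. */
double sqrt_newton(double x)
{
    double r = x > 1.0 ? x : 1.0;          /* start at or above the root */
    for (int it = 0; it < 100; ++it) {
        double next = 0.5 * (r + x / r);
        if (next >= r) break;              /* iterates stopped decreasing */
        r = next;
    }
    return r;
}

/* chol: overwrite the lower triangle of A with L, where A = L * L^T.
   Returns 0 on success, -1 if a non-positive pivot is encountered. */
int chol_lower(double *a, int n)
{
    for (int j = 0; j < n; ++j) {
        double d = AT(a, n, j, j);
        for (int k = 0; k < j; ++k)
            d -= AT(a, n, j, k) * AT(a, n, j, k);
        if (d <= 0.0) return -1;
        d = sqrt_newton(d);
        AT(a, n, j, j) = d;
        for (int i = j + 1; i < n; ++i) {
            double s = AT(a, n, i, j);
            for (int k = 0; k < j; ++k)
                s -= AT(a, n, i, k) * AT(a, n, j, k);
            AT(a, n, i, j) = s / d;
        }
    }
    return 0;
}

/* trinv: overwrite lower-triangular L (non-unit diagonal) with inv(L),
   one column at a time. */
void trinv_lower(double *a, int n)
{
    for (int j = 0; j < n; ++j) {
        AT(a, n, j, j) = 1.0 / AT(a, n, j, j);
        for (int i = j + 1; i < n; ++i) {
            double s = 0.0;
            for (int k = j; k < i; ++k)
                s += AT(a, n, i, k) * AT(a, n, k, j);
            AT(a, n, i, j) = -s / AT(a, n, i, i);
        }
    }
}

/* ttmm: overwrite lower-triangular L with the lower triangle of L^T * L. */
void ttmm_lower(double *a, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {     /* diagonal entry is written last */
            double s = 0.0;
            for (int k = i; k < n; ++k)
                s += AT(a, n, k, i) * AT(a, n, k, j);
            AT(a, n, i, j) = s;
        }
    }
}

/* hpdinv (real case): inv(A) = inv(L)^T * inv(L), with A = L * L^T. */
int hpdinv_lower(double *a, int n)
{
    if (chol_lower(a, n) != 0) return -1;
    trinv_lower(a, n);
    ttmm_lower(a, n);
    return 0;
}
```

For A = [ 4 2; 2 3 ], calling `hpdinv_lower(a, 2)` leaves 0.375, -0.25, and 0.5 (up to rounding) in the lower triangle, which matches the lower triangle of the inverse; the strictly upper triangle is never read or written.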
flame · Jan 14, 2023 · 02b5acd · 02b5acd
1 parent 0e4491d
commit 02b5acd
Showing 146 changed files with 11,960 additions and 118 deletions.
diff --git a/CREDITS b/CREDITS
@@ -3,12 +3,13 @@ BLIS framework
 Acknowledgements
 ---
 
-The BLIS framework was primarily authored by
+The BLIS framework was originally authored by
 
   Field Van Zee            @fgvanzee           (The University of Texas at Austin)
 
 but many others have contributed code and feedback, including
 
+  Jay Acosta               @jay-acosta         (Oracle)
   Sameer Agarwal           @sandwichmaker      (Google)
   Murtaza Ali                                  (Texas Instruments)
   Sajid Ali                @s-sajid-ali        (Northwestern University)

diff --git a/common.mk b/common.mk
@@ -678,6 +678,18 @@ ifeq ($(CC_VENDOR),clang)
 CWARNFLAGS += -Wno-tautological-compare -Wno-pass-failed
 endif
 
+# Disable discarded qualifier warnings.
+# NOTE: This is a temporary hack until the 'ampere' branch can catch up to the
+# point in the 'master' branch lineage where const correctness is implemented
+# throughout BLIS's higher-level APIs.
+ifeq ($(CC_VENDOR),gcc)
+CWARNFLAGS     += -Wno-discarded-qualifiers
+else
+ifeq ($(CC_VENDOR),clang)
+CWARNFLAGS     += -Wno-incompatible-pointer-types-discards-qualifiers -Wno-unused-but-set-variable
+endif
+endif
+
 $(foreach c, $(CONFIG_LIST_FAM), $(eval $(call append-var-for,CWARNFLAGS,$(c))))
 
 # --- Position-independent code flags (shared libraries only) ---

diff --git a/examples/oapi/04level0.c b/examples/oapi/04level0.c
@@ -166,7 +166,7 @@ int main( int argc, char** argv )
 	bli_normfsc( &zeta, &alpha );
 	bli_printm( "alpha := normf( zeta )  # normf() = complex modulus in complex domain.", &alpha, "%4.1f", "" );
 
-	bli_invertsc( &gamma );
+	bli_invertsc( &gamma, &gamma );
 	bli_printm( "gamma := 1.0 / gamma", &gamma, "%4.2f", "" );
 
 

diff --git a/frame/0/bli_l0_check.c b/frame/0/bli_l0_check.c
@@ -55,20 +55,8 @@ GENFRONT( copysc )
 GENFRONT( divsc )
 GENFRONT( mulsc )
 GENFRONT( sqrtsc )
+GENFRONT( sqrtrsc )
 GENFRONT( subsc )
-
-
-#undef  GENFRONT
-#define GENFRONT( opname ) \
-\
-void PASTEMAC(opname,_check) \
-     ( \
-       obj_t*  chi  \
-     ) \
-{ \
-	bli_l0_xsc_check( chi ); \
-}
-
 GENFRONT( invertsc )
 
 
@@ -357,7 +345,7 @@ void bli_l0_xxbsc_check
      (
        obj_t*  chi,
        obj_t*  psi,
-       bool*   is_eq
+       bool*   is
      )
 {
 	err_t e_val;

diff --git a/frame/0/bli_l0_check.h b/frame/0/bli_l0_check.h
@@ -51,17 +51,8 @@ GENTPROT( copysc )
 GENTPROT( divsc )
 GENTPROT( mulsc )
 GENTPROT( sqrtsc )
+GENTPROT( sqrtrsc )
 GENTPROT( subsc )
-
-
-#undef  GENTPROT
-#define GENTPROT( opname ) \
-\
-void PASTEMAC(opname,_check) \
-     ( \
-       obj_t*  chi  \
-     );
-
 GENTPROT( invertsc )
 
 
@@ -152,5 +143,5 @@ void bli_l0_xxbsc_check
      (
        obj_t*  chi,
        obj_t*  psi,
-       bool*   is_eq
+       bool*   is
      );
diff --git a/frame/0/bli_l0_fpa.c b/frame/0/bli_l0_fpa.c
@@ -56,6 +56,7 @@ GENFRONT( mulsc )
 GENFRONT( subsc )
 GENFRONT( invertsc )
 GENFRONT( sqrtsc )
+GENFRONT( sqrtrsc )
 GENFRONT( unzipsc )
 GENFRONT( zipsc )
 

diff --git a/frame/0/bli_l0_fpa.h b/frame/0/bli_l0_fpa.h
@@ -50,6 +50,7 @@ GENPROT( mulsc )
 GENPROT( subsc )
 GENPROT( invertsc )
 GENPROT( sqrtsc )
+GENPROT( sqrtrsc )
 GENPROT( unzipsc )
 GENPROT( zipsc )
 

diff --git a/frame/0/bli_l0_ft.h b/frame/0/bli_l0_ft.h
@@ -37,7 +37,7 @@
 // -- Level-0 function types ---------------------------------------------------
 //
 
-// addsc, divsc, subsc
+// addsc, divsc, subsc, invertsc
 
 #undef  GENTDEF
 #define GENTDEF( ctype, ch, opname, tsuf ) \
@@ -52,18 +52,6 @@ typedef void (*PASTECH2(ch,opname,tsuf)) \
 INSERT_GENTDEF( addsc )
 INSERT_GENTDEF( divsc )
 INSERT_GENTDEF( subsc )
-
-// invertsc
-
-#undef  GENTDEF
-#define GENTDEF( ctype, ch, opname, tsuf ) \
-\
-typedef void (*PASTECH2(ch,opname,tsuf)) \
-     ( \
-       conj_t  conjchi, \
-       ctype*  chi  \
-     );
-
 INSERT_GENTDEF( invertsc )
 
 // mulsc
@@ -118,6 +106,7 @@ typedef void (*PASTECH2(ch,opname,tsuf)) \
      );
 
 INSERT_GENTDEF( sqrtsc )
+INSERT_GENTDEF( sqrtrsc )
 
 // getsc
 

diff --git a/frame/0/bli_l0_oapi.c b/frame/0/bli_l0_oapi.c
@@ -115,38 +115,6 @@ GENFRONT( addsc )
 GENFRONT( divsc )
 GENFRONT( mulsc )
 GENFRONT( subsc )
-
-
-#undef  GENFRONT
-#define GENFRONT( opname ) \
-\
-void PASTEMAC0(opname) \
-     ( \
-       obj_t*  chi  \
-     ) \
-{ \
-	bli_init_once(); \
-\
-	num_t     dt        = bli_obj_dt( chi ); \
-\
-	conj_t    conjchi   = bli_obj_conj_status( chi ); \
-\
-	void*     buf_chi   = bli_obj_buffer_for_1x1( dt, chi ); \
-\
-	if ( bli_error_checking_is_enabled() ) \
-	    PASTEMAC(opname,_check)( chi ); \
-\
-	/* Query a type-specific function pointer, except one that uses
-	   void* for function arguments instead of typed pointers. */ \
-	PASTECH(opname,_vft) f = PASTEMAC(opname,_qfp)( dt ); \
-\
-	f \
-	( \
-	  conjchi, \
-	  buf_chi  \
-	); \
-}
-
 GENFRONT( invertsc )
 
 
@@ -181,6 +149,7 @@ void PASTEMAC0(opname) \
 }
 
 GENFRONT( sqrtsc )
+GENFRONT( sqrtrsc )
 
 
 #undef  GENFRONT

diff --git a/frame/0/bli_l0_oapi.h b/frame/0/bli_l0_oapi.h
@@ -63,17 +63,8 @@ GENPROT( addsc )
 GENPROT( divsc )
 GENPROT( mulsc )
 GENPROT( sqrtsc )
+GENPROT( sqrtrsc )
 GENPROT( subsc )
-
-
-#undef  GENPROT
-#define GENPROT( opname ) \
-\
-BLIS_EXPORT_BLIS void PASTEMAC0(opname) \
-     ( \
-       obj_t*  chi  \
-     );
-
 GENPROT( invertsc )
 
 

diff --git a/frame/0/bli_l0_tapi.c b/frame/0/bli_l0_tapi.c
@@ -67,7 +67,8 @@ INSERT_GENTFUNC_BASIC( subsc, subs )
 void PASTEMAC(ch,opname) \
      ( \
        conj_t  conjchi, \
-       ctype*  chi  \
+       ctype*  chi, \
+       ctype*  psi  \
      ) \
 { \
 	bli_init_once(); \
@@ -76,7 +77,7 @@ void PASTEMAC(ch,opname) \
 \
 	PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
 	PASTEMAC(ch,kername)( chi_conj ); \
-	PASTEMAC(ch,copys)( chi_conj, *chi ); \
+	PASTEMAC(ch,copys)( chi_conj, *psi ); \
 }
 
 INSERT_GENTFUNC_BASIC( invertsc, inverts )
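
The hunk above makes the typed invertsc read from `chi` and write to `psi`, going through a local conjugated copy. A minimal standalone sketch of that pattern follows. The `zscalar` struct and the function name `invertsc_sketch` are hypothetical stand-ins, not BLIS types or APIs, and the reciprocal is computed naively rather than with whatever scaling the BLIS scalar macros may apply.

```c
/* Hypothetical stand-in for a double-precision complex scalar. */
typedef struct { double real, imag; } zscalar;

/* psi := 1 / chi, or 1 / conj(chi) when conj_chi is nonzero.
   chi is only read; psi may point to the same scalar as chi. */
void invertsc_sketch(int conj_chi, const zscalar *chi, zscalar *psi)
{
    /* Work on a local copy, so the input is never modified and
       aliasing chi with psi is safe. */
    zscalar t = *chi;
    if (conj_chi) t.imag = -t.imag;

    /* 1 / (a + bi) = (a - bi) / (a^2 + b^2); no overflow protection. */
    double denom = t.real * t.real + t.imag * t.imag;
    psi->real =  t.real / denom;
    psi->imag = -t.imag / denom;
}
```

Passing the same pointer for both operands, as the updated `bli_invertsc( &gamma, &gamma )` call in examples/oapi/04level0.c does, reproduces the old in-place behavior.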
@@ -176,6 +177,25 @@ void PASTEMAC(ch,opname) \
 INSERT_GENTFUNC_BASIC0( sqrtsc )
 
 
+#undef  GENTFUNCR
+#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
+\
+void PASTEMAC(ch,opname) \
+     ( \
+       ctype* chi, \
+       ctype* psi  \
+     ) \
+{ \
+	bli_init_once(); \
+\
+	const ctype_r chi_r = PASTEMAC(ch,real)( *chi ); \
+\
+	PASTEMAC2(chr,ch,sqrt2s)( chi_r, *psi ); \
+}
+
+INSERT_GENTFUNCR_BASIC0( sqrtrsc )
+
+
 #undef  GENTFUNC
 #define GENTFUNC( ctype, ch, opname ) \
 \

diff --git a/frame/0/bli_l0_tapi.h b/frame/0/bli_l0_tapi.h
@@ -51,17 +51,6 @@ INSERT_GENTPROT_BASIC0( addsc )
 INSERT_GENTPROT_BASIC0( divsc )
 INSERT_GENTPROT_BASIC0( mulsc )
 INSERT_GENTPROT_BASIC0( subsc )
-
-
-#undef  GENTPROT
-#define GENTPROT( ctype, ch, opname ) \
-\
-BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
-     ( \
-       conj_t  conjchi, \
-       ctype*  chi  \
-     );
-
 INSERT_GENTPROT_BASIC0( invertsc )
 
 
@@ -88,6 +77,7 @@ BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
      );
 
 INSERT_GENTPROT_BASIC0( sqrtsc )
+INSERT_GENTPROT_BASIC0( sqrtrsc )
 
 
 #undef  GENTPROT

diff --git a/frame/1m/bli_l1m_tapi.c b/frame/1m/bli_l1m_tapi.c
@@ -83,6 +83,11 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 \
 	/* When the diagonal of an upper- or lower-stored matrix is unit,
 	   we handle it with a separate post-processing step. */ \
+	/* NOTE: This code was disabled after I realized that when matrix A has the
+	   properties of having a unit diagonal (and being lower or upper stored),
+	   the operation should only read the strictly lower/upper triangle and
+	   leave the diagonal of B untouched. */ \
+/*
 	if ( bli_is_upper_or_lower( uplox ) && \
 	     bli_is_unit_diag( diagx ) ) \
 	{ \
@@ -99,6 +104,7 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 		  rntm  \
 		); \
 	} \
+*/ \
 }
 
 INSERT_GENTFUNC_BASIC( addm, addd )
@@ -148,6 +154,11 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 \
 	/* When the diagonal of an upper- or lower-stored matrix is unit,
 	   we handle it with a separate post-processing step. */ \
+	/* NOTE: This code was disabled after I realized that when matrix A has the
+	   properties of having a unit diagonal (and being lower or upper stored),
+	   the operation should only read the strictly lower/upper triangle and
+	   leave the diagonal of B untouched. */ \
+/*
 	if ( bli_is_upper_or_lower( uplox ) && \
 	     bli_is_unit_diag( diagx ) ) \
 	{ \
@@ -169,6 +180,7 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 		  rntm  \
 		); \
 	} \
+*/ \
 }
 
 INSERT_GENTFUNC_BASIC0( copym )
@@ -222,6 +234,11 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 \
 	/* When the diagonal of an upper- or lower-stored matrix is unit,
 	   we handle it with a separate post-processing step. */ \
+	/* NOTE: This code was disabled after I realized that when matrix A has the
+	   properties of having a unit diagonal (and being lower or upper stored),
+	   the operation should only read the strictly lower/upper triangle and
+	   leave the diagonal of B untouched. */ \
+/*
 	if ( bli_is_upper_or_lower( uplox ) && \
 	     bli_is_unit_diag( diagx ) ) \
 	{ \
@@ -239,6 +256,7 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 		  rntm  \
 		); \
 	} \
+*/ \
 }
 
 INSERT_GENTFUNC_BASIC0( axpym )
@@ -311,6 +329,11 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 \
 	/* When the diagonal of an upper- or lower-stored matrix is unit,
 	   we handle it with a separate post-processing step. */ \
+	/* NOTE: This code was disabled after I realized that when matrix A has the
+	   properties of having a unit diagonal (and being lower or upper stored),
+	   the operation should only read the strictly lower/upper triangle and
+	   leave the diagonal of B untouched. */ \
+/*
 	if ( bli_is_upper_or_lower( uplox ) && \
 	     bli_is_unit_diag( diagx ) ) \
 	{ \
@@ -331,6 +354,7 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 		  rntm  \
 		); \
 	} \
+*/ \
 }
 
 INSERT_GENTFUNC_BASIC0( scal2m )
@@ -448,6 +472,11 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 \
 	/* When the diagonal of an upper- or lower-stored matrix is unit,
 	   we handle it with a separate post-processing step. */ \
+	/* NOTE: This code was disabled after I realized that when matrix A has the
+	   properties of having a unit diagonal (and being lower or upper stored),
+	   the operation should only read the strictly lower/upper triangle and
+	   leave the diagonal of B untouched. */ \
+/*
 	if ( bli_is_upper_or_lower( uplox ) && \
 	     bli_is_unit_diag( diagx ) ) \
 	{ \
@@ -465,6 +494,7 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 		  rntm  \
 		); \
 	} \
+*/ \
 }
 
 INSERT_GENTFUNC_BASIC0( xpbym )
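
The hunks above disable the post-processing step that used to write an explicit unit diagonal into the destination. To make the new semantics concrete, here is a small sketch of a lower-triangular copy that follows them. The function name, the `unit_diag` flag, and the row-major layout are assumptions for illustration only; the BLIS operations take uplo/diag parameters and general strides.

```c
#include <stddef.h>

/* B := lower triangle of A, for row-major n x n matrices.
   When unit_diag is nonzero, the diagonal of A is implicitly 1 and is
   not read, and the diagonal of B is left untouched (new semantics).
   The strictly upper triangle of B is never written. */
void copym_lower_sketch(int unit_diag, int n, const double *a, double *b)
{
    for (int i = 0; i < n; ++i) {
        /* Last column to copy in row i: stop before the diagonal
           when the diagonal is implicit. */
        int last = unit_diag ? i - 1 : i;
        for (int j = 0; j <= last; ++j)
            b[(size_t)i * n + j] = a[(size_t)i * n + j];
    }
}
```

Under the previous behavior, the unit-diagonal case would additionally have stored 1.0 on the diagonal of B; a caller who still wants that can set the diagonal explicitly after the copy.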