forked from flame/blis
Commit
Merge commit '81e10346' into amd-main
* commit '81e10346':
  Alloc at least 1 elem in pool_t block_ptrs. (flame#560)
  Fix insufficient pool-growing logic in bli_pool.c. (flame#559)
  Arm SVE C/ZGEMM Fix FMOV 0 Mistake
  SH Kernel Unused Eigher
  Arm SVE C/ZGEMM Support *beta==0
  Arm SVE Config armsve Use ZGEMM/CGEMM
  Arm SVE: Update Perf. Graph
  Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0
  Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0
  A64FX Config Use ZGEMM/CGEMM
  Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg
  Arm SVE Add SGEMM 2Vx10 Unindexed
  Arm SVE ZGEMM Support Gather Load / Scatt. St.
  Arm SVE Add ZGEMM 2Vx10 Unindexed
  Arm SVE Add ZGEMM 2Vx7 Unindexed
  Arm SVE Add ZGEMM 2Vx8 Unindexed
  Update Travis CI badge
  Armv8 Trash New Bulk Kernels
  Enable testing 1m in `make check`.
  Config ArmSVE Unregister 12xk. Move 12xk to Old
  Revert __has_include(). Distinguish w/ BLIS_FAMILY_**
  Register firestorm into arm64 Metaconfig
  Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo
  Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo
  Add test for Apple M1 (firestorm)
  Firestorm CPUID Dispatcher
  Armv8 GEMMSUP Edge Cases Require Signed Ints
  Make error checking level a thread-local variable.
  Fix data race in testsuite.
  Update .appveyor.yml
  Firestorm Block Size Fixes
  Armv8 Handle *beta == 0 for GEMMSUP ??r Case.
  Move unused ARM SVE kernels to "old" directory.
  Add an option to control whether or not to use @rpath.
  Fix $ORIGIN usage on linux.
  Arm micro-architecture dispatch (flame#344)
  Use @path-based install name on MacOS and use relocatable RPATH entries for testsuite inaries.
  Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.
  Armv8 Fix 6x8 Row-Maj Ukr
  Apply patch from @xrq-phys.
  Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
  bli_error: more cleanup on the error strings array
  Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9
  Arm SVE: Correct PACKM Ker Name: Intrinsic Kers
  Fix config_name in bli_arch.c
  Arm Whole GEMMSUP Call Route is Asm/Int Optimized
  Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref
  Header Typo
  Arm: DGEMMSUP ??r(rv) Invoke Edge Size
  Arm: DGEMMSUP ?rc(rd) Invoke Edge Size
  Arm: Implement GEMMSUP Fallback Method
  Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin
  Added Apple Firestorm (A14/M1) Subconfig
  Arm64 8x4 Kernel Use Less Regs
  Armv8-A Supplimentary GEMMSUP Sizes for RD
  Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm
  Armv8-A Adjust Types for PACKM Kernels
  Armv8-A GEMMSUP-RD 6x8m
  Armv8-A GEMMSUP-RD 6x8n
  Armv8-A s/d Packing Kernels Fix Typo
  Armv8-A Introduced s/d Packing Kernels
  Armv8-A DGEMMSUP 6x8m Kernel
  Armv8-A DGEMMSUP Adjustments
  Armv8-A Add More DGEMMSUP
  Armv8-A Add GEMMSUP 4x8n Kernel
  Armv8-A Add Part of GEMMSUP 8x4m Kernel
  Armv8A DGEMM 4x4 Kernel WIP. Slow
  Armv8-A Add 8x4 Kernel WIP

AMD-Internal: [CPUPL-2698]
Change-Id: I194ff69356740bb36ca189fd1bf9fef02eec3803
Showing 80 changed files with 11,427 additions and 283 deletions.
@@ -1,3 +1,5 @@
skip_branch_with_pr: true

environment:
  matrix:
  - LIB_TYPE: shared
@@ -0,0 +1,144 @@
/*
   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.
   Copyright (C) 2014, The University of Texas at Austin
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name(s) of the copyright holder(s) nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.
   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "blis.h"

void bli_cntx_init_firestorm( cntx_t* cntx )
{
    blksz_t blkszs[ BLIS_NUM_BLKSZS ];
    blksz_t thresh[ BLIS_NUM_THRESH ];

    // Set default kernel blocksizes and functions.
    bli_cntx_init_firestorm_ref( cntx );

    // -------------------------------------------------------------------------

    // Update the context with optimized native gemm micro-kernels and
    // their storage preferences.
    bli_cntx_set_l3_nat_ukrs
    (
      2,
      BLIS_GEMM_UKR, BLIS_FLOAT,  bli_sgemm_armv8a_asm_8x12, FALSE,
      BLIS_GEMM_UKR, BLIS_DOUBLE, bli_dgemm_armv8a_asm_6x8,  FALSE,
      cntx
    );

    // Update the context with optimized packm kernels.
    bli_cntx_set_packm_kers
    (
      4,
      BLIS_PACKM_8XK_KER,  BLIS_FLOAT,  bli_spackm_armv8a_int_8xk,
      BLIS_PACKM_12XK_KER, BLIS_FLOAT,  bli_spackm_armv8a_int_12xk,
      BLIS_PACKM_6XK_KER,  BLIS_DOUBLE, bli_dpackm_armv8a_int_6xk,
      BLIS_PACKM_8XK_KER,  BLIS_DOUBLE, bli_dpackm_armv8a_int_8xk,
      cntx
    );

    // Initialize level-3 blocksize objects with architecture-specific values.
    //                                           s      d      c      z
    bli_blksz_init_easy( &blkszs[ BLIS_MR ],     8,     6,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_NR ],    12,     8,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_MC ],   120,   252,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_KC ],   640,  3072,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_NC ],  3072,  8192,    -1,    -1 );

    // Update the context with the current architecture's register and cache
    // blocksizes (and multiples) for native execution.
    bli_cntx_set_blkszs
    (
      BLIS_NAT, 5,
      BLIS_NC, &blkszs[ BLIS_NC ], BLIS_NR,
      BLIS_KC, &blkszs[ BLIS_KC ], BLIS_KR,
      BLIS_MC, &blkszs[ BLIS_MC ], BLIS_MR,
      BLIS_NR, &blkszs[ BLIS_NR ], BLIS_NR,
      BLIS_MR, &blkszs[ BLIS_MR ], BLIS_MR,
      cntx
    );

    // -------------------------------------------------------------------------

    // Initialize sup thresholds with architecture-appropriate values.
    //                                           s      d      c      z
    bli_blksz_init_easy( &thresh[ BLIS_MT ],    -1,    99,    -1,    -1 );
    bli_blksz_init_easy( &thresh[ BLIS_NT ],    -1,    99,    -1,    -1 );
    bli_blksz_init_easy( &thresh[ BLIS_KT ],    -1,    99,    -1,    -1 );

    // Initialize the context with the sup thresholds.
    bli_cntx_set_l3_sup_thresh
    (
      3,
      BLIS_MT, &thresh[ BLIS_MT ],
      BLIS_NT, &thresh[ BLIS_NT ],
      BLIS_KT, &thresh[ BLIS_KT ],
      cntx
    );

    // Update the context with optimized small/unpacked gemm kernels.
    bli_cntx_set_l3_sup_kers
    (
      8,
      BLIS_RRR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
      BLIS_RRC, BLIS_DOUBLE, bli_dgemmsup_rd_armv8a_asm_6x8m, TRUE,
      BLIS_RCR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
      BLIS_RCC, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
      BLIS_CRR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8m, TRUE,
      BLIS_CRC, BLIS_DOUBLE, bli_dgemmsup_rd_armv8a_asm_6x8n, TRUE,
      BLIS_CCR, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
      BLIS_CCC, BLIS_DOUBLE, bli_dgemmsup_rv_armv8a_asm_6x8n, TRUE,
      cntx
    );

    // Initialize level-3 sup blocksize objects with architecture-specific
    // values.
    //                                           s      d      c      z
    bli_blksz_init_easy( &blkszs[ BLIS_MR ],    -1,     6,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_NR ],    -1,     8,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_MC ],    -1,   240,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_KC ],    -1,  1024,    -1,    -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_NC ],    -1,  3072,    -1,    -1 );

    // Update the context with the current architecture's register and cache
    // blocksizes for small/unpacked level-3 problems.
    bli_cntx_set_l3_sup_blkszs
    (
      5,
      BLIS_NC, &blkszs[ BLIS_NC ],
      BLIS_KC, &blkszs[ BLIS_KC ],
      BLIS_MC, &blkszs[ BLIS_MC ],
      BLIS_NR, &blkszs[ BLIS_NR ],
      BLIS_MR, &blkszs[ BLIS_MR ],
      cntx
    );
}
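
Review note: the `bli_cntx_set_blkszs` call above pairs each cache blocksize with a register blocksize as its multiple (`BLIS_MC` with `BLIS_MR`, `BLIS_NC` with `BLIS_NR`). The following standalone sketch is one way to sanity-check the numbers in this file without building BLIS. Everything in it is hypothetical and not part of the BLIS API: the `blk_params` struct, the `firestorm_params` table, and both helper functions are names made up for illustration, and only the integer values are copied from the diff. `BLIS_KR` is not set in this file, so the KC/KR relationship is left out. Element sizes assume 4-byte `float` and 8-byte `double`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for one datatype's blocksizes (not a BLIS type). */
typedef struct {
    const char *label;      /* which row of the config this mirrors      */
    size_t      elem_size;  /* bytes per element                         */
    long        mr, nr;     /* register blocksizes                       */
    long        mc, kc, nc; /* cache blocksizes                          */
} blk_params;

/* Values copied from bli_cntx_init_firestorm() in the diff above. */
static const blk_params firestorm_params[] = {
    { "s native", sizeof(float),  8, 12, 120,  640, 3072 },
    { "d native", sizeof(double), 6,  8, 252, 3072, 8192 },
    { "d sup",    sizeof(double), 6,  8, 240, 1024, 3072 },
};

/* Returns 1 when MC is a whole multiple of MR and NC of NR, else 0. */
static int blk_params_consistent(const blk_params *p)
{
    assert(p->mr > 0 && p->nr > 0); /* guard the modulo operations */
    return (p->mc % p->mr == 0) && (p->nc % p->nr == 0);
}

/* Bytes occupied by an MC x KC block of matrix A for this datatype. */
static size_t a_block_bytes(const blk_params *p)
{
    return (size_t)p->mc * (size_t)p->kc * p->elem_size;
}

/* Prints one summary line per entry; returns the number of bad entries. */
static int report_blocksizes(void)
{
    int bad = 0;
    size_t n = sizeof(firestorm_params) / sizeof(firestorm_params[0]);
    for (size_t i = 0; i < n; ++i) {
        const blk_params *p = &firestorm_params[i];
        int ok = blk_params_consistent(p);
        printf("%-9s MC/MR=%ld NC/NR=%ld A block=%zu bytes %s\n",
               p->label, p->mc / p->mr, p->nc / p->nr,
               a_block_bytes(p), ok ? "ok" : "MISMATCH");
        if (!ok) ++bad;
    }
    return bad;
}
```

Calling `report_blocksizes()` from a small `main` prints one line per configuration and returns the number of rows whose cache blocksize is not a multiple of its register blocksize. The A block footprint is included only as a rough aid when comparing `MC` and `KC` choices against cache capacity; it says nothing about how BLIS actually allocates its packing buffers.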