-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default BLIS_[MNK]T values never actually set #781
Comments
I'm not seeing where Am I missing something? |
The middle step is just in my branch but the end is the same. |
@devinamatthews Do you agree that this fixes the issue? ...
if ( v_s >= 0 ) bli_blksz_set_def( v_s, BLIS_FLOAT, b_dst );
if ( v_d >= 0 ) bli_blksz_set_def( v_d, BLIS_DOUBLE, b_dst );
if ( v_c >= 0 ) bli_blksz_set_def( v_c, BLIS_SCOMPLEX, b_dst );
if ( v_z >= 0 ) bli_blksz_set_def( v_z, BLIS_DCOMPLEX, b_dst );
if ( e_s >= 0 ) bli_blksz_set_max( e_s, BLIS_FLOAT, b_dst );
if ( e_d >= 0 ) bli_blksz_set_max( e_d, BLIS_DOUBLE, b_dst );
if ( e_c >= 0 ) bli_blksz_set_max( e_c, BLIS_SCOMPLEX, b_dst );
if ( e_z >= 0 ) bli_blksz_set_max( e_z, BLIS_DCOMPLEX, b_dst );
} |
Yes, or setting the default blocksizes to 1. |
OR, zeroing out all blocksizes and kernel pointers initially. |
Wait, now I'm doubting myself that the above code snippet is even a fix. 🤔 |
Hmm, my concern is that with default thresholds of 1, problem sizes of 0 would cause the sup handler to be invoked, which I would prefer not to happen. |
In #710 the zero-dim cases are handled earlier. |
Okay, after studying the code a bit more, I'm back on board with this fix. |
Details: - Fixed a bug that resulted in BLIS non-deterministically calling the gemmsup handler, irrespective of the thresholds that are registered via bli_cntx_set_blkszs(). - Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup thresholds (BLIS_[MNK]T blocksizes) wre being set to zero so that no operation ever matched the criteria for gemmsup (unless specific sup thresholds are registered). HOWEVER, these thresholds are set via bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was only coping the thresholds into the gks' cntx_t if the values were strictly positive. Thus, the zero values passed into bli_cntx_set_blkszs() were being ignored and those threshold slots within the gks were left uninitialized. The upshot of this is that the reference gemmsup handler was being called for gemm problems essentially at random (and as it turns out, very rarely the reference gemmsup implementation would encounter a divide-by-zero error). - The problem was fixed by changing bli_blksz_copy_if_pos() so that it copies values that are non-negative (values >= 0 instead of > 0). The function was also renamed to bli_blksz_copy_if_nonneg() - Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and proposing a fix for this issue.
Details: - Fixed a bug that resulted in BLIS non-deterministically calling the gemmsup handler, irrespective of the thresholds that are registered via bli_cntx_set_blkszs(). - Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup thresholds (BLIS_[MNK]T blocksizes) wre being set to zero so that no operation ever matched the criteria for gemmsup (unless specific sup thresholds are registered). HOWEVER, these thresholds are set via bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was only coping the thresholds into the gks' cntx_t if the values were strictly positive. Thus, the zero values passed into bli_cntx_set_blkszs() were being ignored and those threshold slots within the gks were left uninitialized. The upshot of this is that the reference gemmsup handler was being called for gemm problems essentially at random (and as it turns out, very rarely the reference gemmsup implementation would encounter a divide-by-zero error). - The problem was fixed by changing bli_blksz_copy_if_pos() so that it copies values that are non-negative (values >= 0 instead of > 0). The function was also renamed to bli_blksz_copy_if_nonneg() - Also needed to standardize use of -1 as the sole value to embed into blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register a value for that slot (and instead let whatever existing values remain). This required updates to the bli_cntx_init_*() functions for bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some of these codes were using 0 instead of -1. - Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and proposing a fix for this issue.
Details: - Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V instruction set architecture. The subconfig registers kernels from a correspondingly new kernel set, also named 'sifive_x280'. - Added the aforementioned kernel set, which includes intrinsics- and assembly-based implementations of most level-1v kernels along with level-1f kernels axpy2v dotaxpyv, packm kernels, and level-3 gemm, gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files). - Registered the 'sifive_x280' subconfig as belonging to a singleton family by the same name. - Added an entry to '.travis.yml' to test the new subconfig via qemu. - Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280' subconfig and to reflect updated tarball names. - Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz for their engagement on this commit. - (cherry picked from commit 05388dd) Fixed HPX barrier synchronization (#783) Details: - Fixed hpx barrier synchronization. HPX was hanging on larger cores because blis was using non-hpx synchronization primitives. But when using hpx-runtime only hpx-synchronization primitives should be used. Hence, a C style wrapper hpx_barrier_t is introduced to perform hpx barrier operations. - Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with hpx::barrier on n_threads greater than actual hardware thread count causes synchronization issues making hpx hanging. This can be avoided by using hpx::futures, which are relatively very lightweight, robust and scalable. - (cherry picked from 7a87e57) Fixed bug in sup threshold registration. (#782) Details: - Fixed a bug that resulted in BLIS non-deterministically calling the gemmsup handler, irrespective of the thresholds that are registered via bli_cntx_set_blkszs(). - Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup thresholds (BLIS_[MNK]T blocksizes) wre being set to zero so that no operation ever matched the criteria for gemmsup (unless specific sup thresholds are registered). HOWEVER, these thresholds are set via bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was only coping the thresholds into the gks' cntx_t if the values were strictly positive. Thus, the zero values passed into bli_cntx_set_blkszs() were being ignored and those threshold slots within the gks were left uninitialized. The upshot of this is that the reference gemmsup handler was being called for gemm problems essentially at random (and as it turns out, very rarely the reference gemmsup implementation would encounter a divide-by-zero error). - The problem was fixed by changing bli_blksz_copy_if_pos() so that it copies values that are non-negative (values >= 0 instead of > 0). The function was also renamed to bli_blksz_copy_if_nonneg() - Also needed to standardize use of -1 as the sole value to embed into blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register a value for that slot (and instead let whatever existing values remain). This required updates to the bli_cntx_init_*() functions for bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some of these codes were using 0 instead of -1. - Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and proposing a fix for this issue. - (cherry picked from 8fff1e3) Update zen3 subconfig to support NVHPC compilers. (#779) Details: - Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file. - Minor refactor to accommodate above edit. - CREDITS file update. - (cherry picked from 1e264a4) Fixed brokenness when sba is disabled. (#777) Details: - Previously, disabling the sba via --disable-sba-pools resulted in a segfault due to a sanity-check-triggering abort(). The problem was that the sba, as currently used in the l3 thread decorators, did not yet (fully) support pools being disabled. The solution entailed creating wrapper function, bli_sba_array_elem(), which either calls bli_apool_array_elem() (when sba pools are enabled at configure time) or returns a NULL sba_pool pointer (when sba pools are disabled), and calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note that the NULL pointer returned by bli_sba_array_elem() when the sba pools are disabled does no harm since in that situation the pointer goes unreferenced when acquiring and releasing small blocks. Thanks to John Mather for reporting this bug. - Guarded the bodies of bli_sba_init() and bli_sba_finalize() with #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary to fix the aforementioned bug, but it seems like good practice. - Moved the code in bli_l3_thrinfo_create() that checked that the array* pointer is non-NULL before calling bli_sba_array_elem() (previously bli_apool_array_elem()) into the definition of bli_sba_array_elem(). - Renamed various instances of 'pool' variables and function parameters to 'sba_pool' to emphasize what kind of pool it represents. - Whitespace changes. - (cherry picked from c2099ed) Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778) Details: - Expanded existing BLAS compatibility APIs to provide interfaces to [cz]symv_(), [cz]syr_(). This was easy since those operations were already implemented natively in BLIS; the APIs were previously omitted only because they were not formally part of the BLAS. - Implemented [cz]rot_() by feeding code from LAPACK 3.11 through f2c. - Thanks to James Foster for pointing out that LAPACK contains these additional symbols, which prompted these additions, as well as for testing the [cz]rot_() functions from Julia's test infrastructure. - CREDITS file update. - (cherry picked from 37ca4fd) Fixes to HPC runtime code path. (#773) Details: - Fixed hpx::for_each invocation and replace with hpx::for_loop. The HPX runtime was initialized using hpx::start, but the hpx::for_each function was being called on a non-hpx runtime (i.e standard BLIS runtime - single main thread). To run hpx::for_each on HPX runtime correctly, the code now uses hpx::run_as_hpx_thread(func, args...). - Replaced hpx::for_each with hpx::for_loop, which eliminates use of hpx::util::counting_iterator. - Employ hpx::execution::chunk_size(1) to make sure that a thread resides on a particular core. - Replaced hpx::apply() with updated version hpx::post(). - Initialize tdata->id = 0 in libblis.c to 0, as it is the main thread and is needed for writing results to output file. - By default, if not specified, the HPX runtime uses all N threads/cores available in the system. But, if we want to only specify n_threads out N threads, we use hpx::execution::experimental::num_cores(n_threads). - (cherry picked from a4a6329) Fixed broken link in Multithreading.md. (#774) Details: - Replaced 404'd link in docs/Multithreading.md with an archive from The Wayback Machine. - CREDITS file update. - (cherry picked from c6546c1)
In
bli_cntx_init_ref.c
, the default values for the gemmsup thresholds (BLIS_[MNK]T
blocksizes) are set to zero, so that no operation ever matches the criteria for gemmsup. HOWEVER, these are set viabli_cntx_set_blkszs
which callsbli_cntx_set_blksz
which callsbli_blksz_copy_if_pos
. Thus, the zero values are simply discarded and those blocksizes remain uninitialized. The upshot of this is that the reference gemmsup handler is called for gemm problems essentially at random (and as it turns out, very rarely the reference gemmsup implementation encounters a divide-by-zero error). Becausebli_gemmt?sup
checks if m, n, and k are less than the thresholds, setting the default values to 1 should be safe and effective.The text was updated successfully, but these errors were encountered: