Rework bli_packm_blk_var1. #707

devinamatthews · 2023-01-05T23:30:28Z

Separate the dense code (general/hermitian/symmetric/triangular fully-stored part) from the triangular, diagonal-intersecting code. This allows a more consistent usage of round-robin thread scheduling (even for the dense micro-panels) for the latter.
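For readers unfamiliar with the two scheduling schemes, the following is a minimal sketch of why round-robin tends to balance a diagonal-intersecting block better than slab partitioning. It is not taken from BLIS: the helper names (`panel_cost`, `slab_load`, `rr_load`, `max_load`) are hypothetical, and so is the cost model, which assumes the work to pack micropanel `i` grows linearly with `i`, as it roughly would for the stored part of a lower-triangular block.

```c
#include <stddef.h>

/* Assumed cost model (illustration only): packing micropanel i of a
   lower-triangular block costs i + 1 units, since the stored region
   grows with the panel index. */
static long panel_cost(int i) { return (long)i + 1; }

/* Slab: thread t is assigned one contiguous range of micropanels. */
static long slab_load(int n_panels, int n_threads, int t)
{
    int chunk = (n_panels + n_threads - 1) / n_threads;
    int start = t * chunk;
    int end   = (start + chunk < n_panels) ? start + chunk : n_panels;
    long load = 0;
    for (int i = start; i < end; ++i) load += panel_cost(i);
    return load;
}

/* Round-robin: thread t is assigned micropanels t, t + n_threads, ... */
static long rr_load(int n_panels, int n_threads, int t)
{
    long load = 0;
    for (int i = t; i < n_panels; i += n_threads) load += panel_cost(i);
    return load;
}

/* Largest per-thread load under the given assignment function; this is
   what bounds the time spent in the packing step. */
static long max_load(long (*load_fn)(int, int, int),
                     int n_panels, int n_threads)
{
    long max = 0;
    for (int t = 0; t < n_threads; ++t) {
        long l = load_fn(n_panels, n_threads, t);
        if (l > max) max = l;
    }
    return max;
}
```

With 16 micropanels and 4 threads under this assumed cost model, `max_load(slab_load, 16, 4)` is 58 units while `max_load(rr_load, 16, 4)` is 40, against an ideal of 34 per thread.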

devinamatthews · 2023-01-06T05:17:24Z

@fgvanzee let me know when you've checked it.

fgvanzee · 2023-01-06T19:35:39Z

@devinamatthews Looks good. Unless there's more you want to do here, I'll squash with a new commit log entry once CI finishes.

Details:
- Factored some of the structure awareness out of the loop in bli_packm_blk_var1(). So instead of having a single loop with conditionals in the body to handle various kinds of structure (and stored/unstored submatrix placement), we now have a conditional branch to handle various structure/storage scenarios with a loop in each section. This change was originally motivated by the need to choose slab or round-robin partitioning (in the context of triangular matrices) based on the structure of the entire block (or panel) being packed rather than each micropanel individually. Previously, the code would attempt to limit rr to the portion of the block that intersects the diagonal and use slab for the remainder. However, that approach was not well thought out, and in many situations it would lead to inferior load balancing when compared to using round-robin for the entire block (or panel). This commit has the added benefit of incurring less overhead during the packing process now that each of the new loops is simpler.
- (cherry picked from commit b6735ca)

Switch to l3 sup decorator in gemmlike sandbox. (#704)

Details:
- Modified the gemmlike sandbox to call bli_l3_sup_thread_decorator() rather than a local analogue of that code. This reduces redundant logic and makes it easier for the sandbox to inherit future improvements to the framework's threading code.
- Moved addon/gemmd to addon/old/gemmd. This code has fallen out of date and is taking too much effort to maintain. We will very likely reimplement it completely once future changes are made to the framework proper.
- (cherry picked from f956b79)
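The restructuring described in the first commit log above (one branch on the block's structure, with a simple loop in each arm) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual body of bli_packm_blk_var1(): the type `block_kind_t`, the helpers `slab_owns` and `rr_owns`, and the function `pack_block` are all hypothetical, and the "packing" is reduced to recording which thread owns each micropanel.

```c
#include <stdbool.h>

/* Hypothetical classification of the whole block being packed. */
typedef enum { BLOCK_DENSE, BLOCK_TRI_DIAG } block_kind_t;

/* Slab ownership: contiguous chunks of micropanels per thread. */
static bool slab_owns(int i, int n_panels, int tid, int n_threads)
{
    int chunk = (n_panels + n_threads - 1) / n_threads;
    return i / chunk == tid;
}

/* Round-robin ownership: micropanels dealt out cyclically. */
static bool rr_owns(int i, int tid, int n_threads)
{
    return i % n_threads == tid;
}

/* Branch once on the structure of the entire block, then run a simple
   loop in each arm. Writes tid into owner[i] for every micropanel this
   thread would pack and returns how many it packed. */
static int pack_block(block_kind_t kind, int n_panels,
                      int tid, int n_threads, int *owner)
{
    int packed = 0;

    if (kind == BLOCK_TRI_DIAG) {
        /* Diagonal-intersecting triangular block: round-robin for all
           micropanels, including any that happen to be dense. */
        for (int i = 0; i < n_panels; ++i) {
            if (!rr_owns(i, tid, n_threads)) continue;
            owner[i] = tid; /* stand-in for packing a triangular micropanel */
            ++packed;
        }
    } else {
        /* Dense block: slab partitioning, with no per-panel structure checks. */
        for (int i = 0; i < n_panels; ++i) {
            if (!slab_owns(i, n_panels, tid, n_threads)) continue;
            owner[i] = tid; /* stand-in for packing a dense micropanel */
            ++packed;
        }
    }
    return packed;
}
```

The point of the sketch is that the choice between slab and round-robin is made once per block rather than once per micropanel, so each loop body has no structure-related conditionals left in it.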

Rework bli_packm_blk_var1.

fb5b4eb


devinamatthews requested a review from fgvanzee January 5, 2023 23:30

Whitespace/comment tweaks.

ccc8da6

fgvanzee merged commit b6735ca into master Jan 6, 2023

fgvanzee deleted the fix-thread-mapping-in-packm branch January 6, 2023 20:10
