Rework bli_packm_blk_var1. #707

Merged
fgvanzee merged 2 commits into master from fix-thread-mapping-in-packm
Jan 6, 2023

Conversation

devinamatthews
Member

Separate the dense code (general/hermitian/symmetric/triangular fully-stored part) from the triangular, diagonal-intersecting code. This allows round-robin thread scheduling to be used more consistently for the latter, even for its dense micro-panels.

@devinamatthews
Member Author

@fgvanzee let me know when you've checked it.

@fgvanzee
Member

fgvanzee commented Jan 6, 2023

@devinamatthews Looks good. Unless there's more you want to do here, I'll squash with a new commit log entry once CI finishes.

@fgvanzee fgvanzee merged commit b6735ca into master Jan 6, 2023
@fgvanzee fgvanzee deleted the fix-thread-mapping-in-packm branch January 6, 2023 20:10
fgvanzee added a commit that referenced this pull request May 20, 2024
Details:
- Factored some of the structure awareness out of the loop in
  bli_packm_blk_var1(). So instead of having a single loop with
  conditionals in the body to handle various kinds of structure (and
  stored/unstored submatrix placement), we now have a conditional branch
  to handle various structure/storage scenarios with a loop in each
  section. This change was originally motivated to choose slab or round-
  robin partitioning (in the context of triangular matrices) based on
  the structure of the entire block (or panel) being packed rather than
  each micropanel individually. Previously, the code would attempt to
  limit rr to the portion of the block that intersects the diagonal and
  use slab for the remainder. However, that approach was not well thought
  out, and in many situations it led to inferior load balancing
  when compared to using round-robin for the entire block (or panel).
  This commit has the added benefit of incurring less overhead during
  the packing process now that each of the new loops is simpler.
- (cherry picked from commit b6735ca)
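
To make the load-balancing argument above concrete, here is a minimal sketch in C that compares slab and round-robin assignment of micropanels to threads. It is not BLIS code: the function names (`panel_cost`, `slab_work`, `rr_work`, `max_work`) are hypothetical, and the cost model is an assumption, namely that micropanel `i` of a triangular block costs `i + 1` units to pack because its stored region grows with distance from the first micropanel.

```c
#include <assert.h>

/* ASSUMPTION (not from BLIS): micropanel i of a triangular block has a
   packing cost proportional to i + 1, since its stored part grows with i. */
long panel_cost(int i)
{
    return (long)(i + 1);
}

/* Slab partitioning: each thread receives one contiguous range of
   micropanels. Leftover micropanels go to the lowest-numbered threads. */
void slab_work(int n_panels, int n_threads, long *work)
{
    assert(n_threads > 0);
    int base  = n_panels / n_threads;
    int rem   = n_panels % n_threads;
    int start = 0;
    for (int t = 0; t < n_threads; ++t) {
        int count = base + (t < rem ? 1 : 0);
        work[t] = 0;
        for (int i = start; i < start + count; ++i)
            work[t] += panel_cost(i);
        start += count;
    }
}

/* Round-robin partitioning: thread t receives micropanels
   t, t + n_threads, t + 2*n_threads, and so on. */
void rr_work(int n_panels, int n_threads, long *work)
{
    assert(n_threads > 0);
    for (int t = 0; t < n_threads; ++t) {
        work[t] = 0;
        for (int i = t; i < n_panels; i += n_threads)
            work[t] += panel_cost(i);
    }
}

/* The busiest thread determines how long the packing step takes. */
long max_work(const long *work, int n_threads)
{
    long m = 0;
    for (int t = 0; t < n_threads; ++t)
        if (work[t] > m)
            m = work[t];
    return m;
}
```

Under this assumed cost model, 16 micropanels split across 4 threads give per-thread costs of 10, 26, 42, 58 with slab and 28, 32, 36, 40 with round-robin, so the busiest thread does 58 units of work versus 40 (a perfect split would be 34). Real packing costs depend on the micropanel dimensions and storage, so the numbers are illustrative only.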

Switch to l3 sup decorator in gemmlike sandbox. (#704)

Details:
- Modified the gemmlike sandbox to call bli_l3_sup_thread_decorator()
  rather than a local analogue of that code. This reduces redundant
  logic and makes it easier for the sandbox to inherit future
  improvements to the framework's threading code.
- Moved addon/gemmd to addon/old/gemmd. This code has fallen out of date
  and is taking too much effort to maintain. We will very likely
  reimplement it completely once future changes are made to the
  framework proper.
- (cherry picked from f956b79)