Arm64 DGEMMSUP with Extended MR&NR #655

Merged: 9 commits, Aug 30, 2022


@xrq-phys (Collaborator) commented Aug 19, 2022

Added extended MR/NR support for kernel/armv8a/3/sup.

NEON has many registers, but each one is short, so this PR extends both MR and NR.
The approach is to optionally hold the C microtile in registers by columns, so for sizes like 6x7m, crr is the kernel-level default and rrr is supported through an in-register transpose.

  • A few asm kernels are written for rv to complete this extended size support.
  • For rd I'm still relying heavily on C-intrinsic kernels with branching, so the performance might not be that good. Sorry for that.
  • As always, the changes are adopted into the firestorm config.
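
To illustrate the column-held microtile idea in plain terms, here is a minimal scalar sketch in C. It is not the PR's assembly kernel: the function name `ref_gemmsup_6x7`, its parameters, and the use of a local array in place of NEON registers are all assumptions made for illustration. The accumulator is laid out column by column, and the final loop plays the role of the in-register transpose by writing it back to a row-stored C.

```c
#include <stddef.h>

enum { MR = 6, NR = 7 };  /* tile shape taken from the "6x7m" example */

/* Hypothetical reference kernel (not part of BLIS): C += A * B for one
 * MR x NR tile. A is addressed with row stride rs_a and column stride
 * cs_a, B with row stride rs_b (unit column stride), and C is row-stored
 * with row stride rs_c. */
void ref_gemmsup_6x7(size_t k,
                     const double *a, size_t rs_a, size_t cs_a,
                     const double *b, size_t rs_b,
                     double *c, size_t rs_c)
{
    /* acc[j][i] holds element (i, j): the tile is kept column by column,
     * standing in for column-oriented vector registers. */
    double acc[NR][MR] = {{0.0}};

    for (size_t p = 0; p < k; ++p) {
        for (size_t j = 0; j < NR; ++j) {
            const double bpj = b[p * rs_b + j];
            for (size_t i = 0; i < MR; ++i)
                acc[j][i] += a[i * rs_a + p * cs_a] * bpj;
        }
    }

    /* Transposed write-back: the column-held accumulators are stored
     * into the row-stored C tile. */
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j)
            c[i * rs_c + j] += acc[j][i];
}
```

With a row-stored 6x2 A (`rs_a = 2`, `cs_a = 1`) and a row-stored 2x7 B (`rs_b = 7`), calling `ref_gemmsup_6x7(2, a, 2, 1, b, 7, c, 7)` updates a contiguous row-major 6x7 C. A real kernel would do the transpose with vector shuffles instead of a scalar loop.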

@fgvanzee
Member

Sorry for the delay in reviewing this, @xrq-phys. Looks like these changes are only active in the firestorm subconfig, so I see no harm in merging it. I assume you did some testing on your end?

@xrq-phys
Collaborator Author

Yes. I've done tests on my laptop.

Haven't tuned sup's block size/threshold for other Arm64 subconfigs. Maybe another PR for Gravitons :D

Btw this PR contains #650's change.

@fgvanzee
Member

Thanks @xrq-phys!

And apologies for forgetting about #650. I've been so occupied with other BLIS-related projects.

So once I merge this, will #650 be moot? Or do we still need to merge it?

@xrq-phys
Collaborator Author

No problem!

Yes, #650 would be completely outdated once this is merged.

@fgvanzee fgvanzee merged commit dfa5413 into flame:master Aug 30, 2022
@fgvanzee fgvanzee mentioned this pull request Aug 30, 2022
fgvanzee added a commit that referenced this pull request Oct 26, 2023
Details:
- Updated Makefile and common.mk so that the targeted configuration's
  kernel CFLAGS are applied to source files that are found in a
  'kernels' subdirectory within an enabled addon. For now, this
  behavior only applies when the 'kernels' directory is at the top
  level of the addon directory structure. For example, if there is an
  addon named 'foobar', the source code must be located in
  addon/foobar/kernels/ in order for it to be compiled with the target
  configuration's kernel CFLAGS. Any other source code within
  addon/foobar/ will be compiled with general-purpose CFLAGS (the same
  ones that were used on all addon code prior to this commit). Thanks
  to AMD (esp. Mithun Mohan) for suggesting this change and catching an
  intermediate bug in the PR.
- Comment/whitespace updates.
- (cherry picked from commit fd885cf)

Fix line number issue in flattened blis.h. (#660)

Details:
- Updated the top-level Makefile so that it invokes flatten-headers.py
  without the -c option, which was requesting that comments be stripped
  (since comment stripping is disabled by default).
- Updated flatten-headers.py to accept a new option (-l) to enable
  insertion of #line directives into the output file. This new option
  is enabled by default.
- Also added logic to flatten-headers.py that outputs a warning if both
  comment stripping and line numbers are requested since the comment
  stripping will cause the line numbers to become inaccurate.
- (cherry picked from commit 6e5431e)
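
For readers unfamiliar with `#line`, the following small C sketch shows what the directive does to the line and file name the compiler reports. It does not reproduce the output of flatten-headers.py; the file name `orig_header.h` and both function names are hypothetical.

```c
/* Pretend the code below was inlined from line 100 of a header named
 * "orig_header.h" (a made-up name for this example). */
#line 100 "orig_header.h"
int line_after_directive(void) { return __LINE__; }
const char *file_after_directive(void) { return __FILE__; }
```

The line immediately following the directive is treated as line 100 of `orig_header.h`, so diagnostics for inlined code can point back to the original header rather than to a position in the flattened file.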

Defined invscalv, invscalm, invscald operations. (#661)

Details:
- Defined invert-scale (invscal) operation on vectors (level-1v),
  matrices (level-1m), and diagonals (level-1d).
- Added test modules for invscalv and invscalm to the testsuite.
- Updated BLISObjectAPI.md and BLISTypedAPI.md API documentation to
  reflect the new operations. Also updated KernelsHowTo.md accordingly.
- Renamed 'beta' to 'alpha' in scalv and scalm testsuite modules (and
  input.operations files) so that the parameter name matches the
  parameter used in the documentation.
- (cherry picked from commit 4afe0cf)
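
As a rough illustration of the invert-scale operation on a vector, the sketch below divides each element by a scalar. It covers only the real double-precision case and is not the BLIS typed API: the name `ref_dinvscalv` and its parameter list are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical reference routine (not the BLIS API): x := x / alpha
 * for n elements of x spaced incx apart. */
void ref_dinvscalv(size_t n, double alpha, double *x, size_t incx)
{
    for (size_t i = 0; i < n; ++i)
        x[i * incx] /= alpha;
}
```

The matrix and diagonal variants named in the commit apply the same element-wise division over the corresponding set of elements.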

Added '-q' quiet mode option to testsuite. (#657)

Details:
- Added support for a '-q' command line option to the testsuite. This
  option suppresses most informational output that would normally
  clutter up the screen. By default, verbose mode (the previous
  status quo) will be operative, and so quiet mode must be requested.
- (cherry picked from commit a87eae2)

Arm64 dgemmsup with extended MR&NR (#655)

Details:
- Since the number of registers in NEON is large but their lengths are
  short, I'm here extending both MR and NR.
- The approach is to represent the C microtile in registers optionally
  in columns, so for sizes like 6x7m, the 'crr' kernel is the default
  with 'rrr' supported through an in-register transpose.
- A few asm kernels are crafted for 'rv' to complete this extended size
  support.
- For 'rd' I'm still relying heavily on C99 intrinsic kernels with
  branching so the performance might not be optimal. (Sorry for that.)
- So far, these changes only affect the 'firestorm' subconfig.
- This commit also contains row-preferential s12x8 and d6x8 gemm
  ukernels. These microkernels are templatized versions of the existing
  s8x12 and d6x8 ukernels defined in bli_gemm_armv8a_asm_d6x8.c.
- (cherry picked from commit dfa5413)

Temporarily disabled #line directives from 6826c1c.

Details:
- Commented out the inclusion of #line preprocessor directives in the
  flattened header output provided by build/flatten-headers.py. This
  output was added recently in 6826c1c, but was later found to have
  thrown off the line numbering referenced by compiler warnings and
  errors (possibly due to license comment blocks, which are stripped
  from source headers as they are inlined into the monolithic header).
- (cherry picked from commit 9e5594a)