Arm64 DGEMMSUP with Extended MR&NR #655

xrq-phys · 2022-08-19T05:18:03Z

Added extended MR/NR support for kernel/armv8a/3/sup.

Since NEON has many registers but each one is short, I'm extending both MR and NR here.
The approach is to optionally hold the C microtile in registers by columns, so for sizes like 6x7m, crr is the kernel-level default, with rrr supported through an in-register transpose.

A few asm kernels are written for rv to complete this extended size support.
For rd I'm still relying heavily on C intrinsic kernels with branching, so the performance might not be that good. Sorry for that.
As always, the changes are adopted into the firestorm config.
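
To illustrate the column-held microtile idea, here is a minimal plain-C sketch. It is not the asm or intrinsic code from this PR: the `vec2d` type, the helper names, and the 4x2 tile size are all hypothetical, and `vec2d` only models a 128-bit NEON register holding two doubles. It shows a tile accumulated by columns being written to row-major memory through a 2x2 in-register transpose.

```c
#include <assert.h>

#define MR 4  /* rows of the illustrative microtile (assumption) */
#define NR 2  /* columns of the illustrative microtile (assumption) */

/* Hypothetical stand-in for one 128-bit NEON register holding two doubles. */
typedef struct { double lane[2]; } vec2d;

/* Transpose a 2x2 block held in two registers. On AArch64 this is the
   kind of work a zip1/zip2 instruction pair does. */
static void transpose_2x2(vec2d *r0, vec2d *r1)
{
    vec2d lo = { { r0->lane[0], r1->lane[0] } };
    vec2d hi = { { r0->lane[1], r1->lane[1] } };
    *r0 = lo;
    *r1 = hi;
}

/* cols[j][p] holds rows 2p and 2p+1 of column j, i.e. the tile is kept
   column by column. Write it to row-major memory with row stride rs. */
static void store_tile_row_major(vec2d cols[NR][MR / 2], double *c, int rs)
{
    assert(rs >= NR);
    for (int p = 0; p < MR / 2; ++p) {
        vec2d a = cols[0][p];   /* (C[2p][0], C[2p+1][0]) */
        vec2d b = cols[1][p];   /* (C[2p][1], C[2p+1][1]) */
        transpose_2x2(&a, &b);  /* now a = row 2p, b = row 2p+1 */
        c[(2 * p) * rs + 0]     = a.lane[0];
        c[(2 * p) * rs + 1]     = a.lane[1];
        c[(2 * p + 1) * rs + 0] = b.lane[0];
        c[(2 * p + 1) * rs + 1] = b.lane[1];
    }
}
```

The same accumulation loop can then serve both storage formats: a column-major store writes the registers directly, and a row-major store goes through the transpose first.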

fgvanzee · 2022-08-27T22:50:12Z

Sorry for the delay in reviewing this, @xrq-phys. Looks like these changes are only active in the firestorm subconfig, so I see no harm in merging it. I assume you did some testing on your end?

xrq-phys · 2022-08-28T01:32:42Z

Yes. I've done tests on my laptop.

Haven't tuned sup's block size/threshold for other Arm64 subconfigs. Maybe another PR for Gravitons :D

Btw this PR contains #650's change.

fgvanzee · 2022-08-28T18:47:15Z

Thanks @xrq-phys!

And apologies for forgetting about #650. I've been so occupied with other BLIS-related projects.

So once I merge this, will #650 be moot? Or do we still need to merge it?

xrq-phys · 2022-08-29T01:41:05Z

No problem!

Yes #650 would be completely outdated once this is merged.

Details:
- Updated Makefile and common.mk so that the targeted configuration's kernel CFLAGS are applied to source files that are found in a 'kernels' subdirectory within an enabled addon. For now, this behavior only applies when the 'kernels' directory is at the top level of the addon directory structure. For example, if there is an addon named 'foobar', the source code must be located in addon/foobar/kernels/ in order for it to be compiled with the target configuration's kernel CFLAGS. Any other source code within addon/foobar/ will be compiled with general-purpose CFLAGS (the same ones that were used on all addon code prior to this commit). Thanks to AMD (esp. Mithun Mohan) for suggesting this change and catching an intermediate bug in the PR.
- Comment/whitespace updates.
- (cherry picked from commit fd885cf)

Fix line number issue in flattened blis.h. (#660)

Details:
- Updated the top-level Makefile so that it invokes flatten-headers.py without the -c option, which was requesting that comments be stripped (since comment stripping is disabled by default).
- Updated flatten-headers.py to accept a new option (-l) to enable insertion of #line directives into the output file. This new option is enabled by default.
- Also added logic to flatten-headers.py that outputs a warning if both comment stripping and line numbers are requested since the comment stripping will cause the line numbers to become inaccurate.
- (cherry picked from commit 6e5431e)

Defined invscalv, invscalm, invscald operations. (#661)

Details:
- Defined invert-scale (invscal) operation on vectors (level-1v), matrices (level-1m), and diagonals (level-1d).
- Added test modules for invscalv and invscalm to the testsuite.
- Updated BLISObjectAPI.md and BLISTypedAPI.md API documentation to reflect the new operations. Also updated KernelsHowTo.md accordingly.
- Renamed 'beta' to 'alpha' in scalv and scalm testsuite modules (and input.operations files) so that the parameter name matches the parameter used in the documentation.
- (cherry picked from commit 4afe0cf)

Added '-q' quiet mode option to testsuite. (#657)

Details:
- Added support for a '-q' command line option to the testsuite. This option suppresses most informational output that would normally clutter up the screen. By default, verbose mode (the previous status quo) will be operative, and so quiet mode must be requested.
- (cherry picked from commit a87eae2)

Arm64 dgemmsup with extended MR&NR (#655)

Details:
- Since the number of registers in NEON is large but their lengths are short, I'm here extending both MR and NR.
- The approach is to represent the C microtile in registers optionally in columns, so for sizes like 6x7m, the 'crr' kernel is the default with 'rrr' supported through an in-register transpose.
- A few asm kernels are crafted for 'rv' to complete this extended size support.
- For 'rd' I'm still relying heavily on C99 intrinsic kernels with branching so the performance might not be optimal. (Sorry for that.)
- So far, these changes only affect the 'firestorm' subconfig.
- This commit also contains row-preferential s12x8 and d6x8 gemm ukernels. These microkernels are templatized versions of the existing s8x12 and d6x8 ukernels defined in bli_gemm_armv8a_asm_d6x8.c.
- (cherry picked from commit dfa5413)

Temporarily disabled #line directives from 6826c1c.

Details:
- Commented out the inclusion of #line preprocessor directives in the flattened header output provided by build/flatten-headers.py. This output was added recently in 6826c1c, but was later found to have thrown off the line numbering referenced by compiler warnings and errors (possibly due to license comment blocks, which are stripped from source headers as they are inlined into the monolithic header).
- (cherry picked from commit 9e5594a)
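
For context on the #line entries above: a `#line` directive tells the C compiler which line number and file name to report from that point on, so a directive carrying a stale number shifts every later diagnostic. The following is a minimal sketch of that behavior only, not output from flatten-headers.py; the file name `hypothetical_header.h` and both helper names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* From the next line on, the compiler reports positions as if this were
   line 100 of hypothetical_header.h (an invented name). */
#line 100 "hypothetical_header.h"
static int reported_line(void) { return __LINE__; }

/* Returns nonzero if the compiler currently reports `name` as the file. */
static int reported_from(const char *name)
{
    assert(name != NULL);
    return strcmp(__FILE__, name) == 0;
}
```

Here `reported_line()` yields 100 no matter where the function physically sits in the file, and `reported_from("hypothetical_header.h")` is true. If lines are removed from the inlined text after the directive is computed, the reported numbers no longer match the original header.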

xrq-phys added 9 commits August 1, 2022 22:24

- Armv8A Add Row-Major Kernels (3e259b6)
  Basically it's a templated re-implementation of current 6x8's transpose.
- Firestorm Use Row-Maj Kernel (29c6f6e)
- Armv8a GEMMSUP No Longer Needs Ref (414c68e)
  Fallback cases are completely handled by intrinsic-based kernels.
- Armv8a dgemmsup_6x8n Support Extended M (8c9ec74)
- Remove Debug Output (97d25cc)
- Armv8a NEON gemmsup_rv_d6x5m (db588eb)
- Armv8a dgemmsup Extend N-dim (c1e57a9)
- Armv8a dgemmsup Extend MR/NR for RD Case Also (56673b4)
- Update bli_gemmsup_rv_armv8a_asm_d6x6m.c (8847d8f)
  Typo fix: branch flag compiled by GCC

fgvanzee merged commit dfa5413 into flame:master Aug 30, 2022

fgvanzee mentioned this pull request Aug 30, 2022

Arm64 Row-Major d8x6 #650

Closed
