[REQUEST] Add missing SVE instructions to database for V2 #110

Open
stefandesouza opened this issue Dec 18, 2024 · 0 comments
stefandesouza commented Dec 18, 2024

Is your feature request related to a problem? Please describe.
Yes. I'm trying to analyze the following kernel, which uses SVE instructions, with OSACA 0.6.1.

ld1w	{z1.s}, p1/z, [x8, x4,lsl #2]
sub	z1.s, z16.s, z1.s
sunpklo	z4.d, z1
sunpkhi	z1.d, z1
mad	z4.d, p0/m, z13.d, z12.d
mad	z1.d, p0/m, z13.d, z12.d
ld1w	{z0.s}, p1/z, [x7, x4,lsl #2]
sub	z0.s, z31.s, z0.s
sunpklo	z20.d, z0
sunpkhi	z0.d, z0
mad	z20.d, p0/m, z11.d, z4.d
movprfx	z4, z1
mla	z4.d, p0/m, z11.d, z0.d
ld1w	{z1.s}, p1/z, [x10, x4,lsl #2]
sub	z1.s, z15.s, z1.s
sunpklo	z19.d, z1
sunpkhi	z1.d, z1
mad	z1.d, p0/m, z10.d, z4.d
punpklo	p2.h, p1.b
sunpklo	z17.d, z7
punpkhi	p1.h, p1.b
sunpkhi	z0.d, z7
mad	z19.d, p0/m, z10.d, z20.d
mad	z0.d, p0/m, z9.d, z1.d
movprfx	z4, z19
mla	z4.d, p0/m, z9.d, z17.d
mul	z0.d, p0/m, z0.d, z8.d
mul	z4.d, p0/m, z4.d, z8.d
ld1d	{z0.d}, p1/z, [x14, z0.d]
ld1d	{z1.d}, p2/z, [x14, z4.d]
st1d	{z1.d}, p2, [x6, x4,lsl #3]
st1d	{z0.d}, p1, [x9, x4,lsl #3]
incw	z7.s, all
incw	x4, all
whilelo	p1.s, w4, w5
b.ne	412164

This is the output I get from OSACA using the Neoverse V2 architecture flag:

Open Source Architecture Code Analyzer (OSACA) - 0.6.1
Analyzed file:      fixed.asm
Architecture:       V2
Timestamp:          2024-12-18 12:49:41


 P - Throughput of LOAD operation can be hidden behind a past or future STORE instruction
 * - Instruction micro-ops not bound to a port
 X - No throughput/latency information for this instruction in data file


Combined Analysis Report
------------------------
                                                                        Port pressure in cycles                                                                         
     |  0   |  1   |  2   |  3   |  4   |  5   |  6   - 6DV  |  7   - 7DV  |  8   - 8DV  |  9   |  10  - 10DV |  11  |  12  |  13  |  14  |  15  |  16  ||  CP  | LCD  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   1 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X ld1w {z1.s}, p1/z, [x8, x4,lsl #2]
   2 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sub z1.s, z16.s, z1.s
   3 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpklo z4.d, z1
   4 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpkhi z1.d, z1
   5 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||  0.0 |      | X mad z4.d, p0/m, z13.d, z12.d
   6 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X mad z1.d, p0/m, z13.d, z12.d
   7 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X ld1w {z0.s}, p1/z, [x7, x4,lsl #2]
   8 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sub z0.s, z31.s, z0.s
   9 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpklo z20.d, z0
  10 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpkhi z0.d, z0
  11 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||  0.0 |      | X mad z20.d, p0/m, z11.d, z4.d
  12 |      |      |      |      |      |      |             |             | 0.25        | 0.25 | 0.25        | 0.25 |      |      |      |      |      ||      |      |   movprfx z4, z1
  13 |      |      |      |      |      |      |             |             | 0.25        | 0.25 | 0.25        | 0.25 |      |      |      |      |      ||      |      |   mla z4.d, p0/m, z11.d, z0.d
  14 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X ld1w {z1.s}, p1/z, [x10, x4,lsl #2]
  15 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sub z1.s, z15.s, z1.s
  16 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpklo z19.d, z1
  17 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpkhi z1.d, z1
  18 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X mad z1.d, p0/m, z10.d, z4.d
  19 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X punpklo p2.h, p1.b
  20 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpklo z17.d, z7
  21 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X punpkhi p1.h, p1.b
  22 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X sunpkhi z0.d, z7
  23 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||  0.0 |      | X mad z19.d, p0/m, z10.d, z20.d
  24 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X mad z0.d, p0/m, z9.d, z1.d
  25 |      |      |      |      |      |      |             |             | 0.25        | 0.25 | 0.25        | 0.25 |      |      |      |      |      ||  2.0 |      |   movprfx z4, z19
  26 |      |      |      |      |      |      |             |             | 0.25        | 0.25 | 0.25        | 0.25 |      |      |      |      |      ||  4.0 |      |   mla z4.d, p0/m, z9.d, z17.d
  27 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X mul z0.d, p0/m, z0.d, z8.d
  28 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||  0.0 |      | X mul z4.d, p0/m, z4.d, z8.d
  29 |      |      |      |      |      |      |             |             |             |      |             |      | 0.33 | 0.33 | 0.33 |      |      ||      |      |   ld1d {z0.d}, p1/z, [x14, z0.d]
  30 |      |      |      |      |      |      |             |             |             |      |             |      | 0.00 | 0.00 | 1.00 |      |      ||      |      |   ld1d {z1.d}, p2/z, [x14, z4.d]
  31 |      |      |      |      |      |      |             |             |             |      |             |      | 0.50 | 0.50 |      | 1.00 | 1.00 ||      |      |   st1d {z1.d}, p2, [x6, x4,lsl #3]
  32 |      |      |      |      |      |      |             |             |             |      |             |      | 0.50 | 0.50 |      | 1.00 | 1.00 ||      |      |   st1d {z0.d}, p1, [x9, x4,lsl #3]
  33 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X incw z7.s, all
  34 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X incw x4, all
  35 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X whilelo p1.s, w4, w5
  36 |      |      |      |      |      |      |             |             |             |      |             |      |      |      |      |      |      ||      |      | X b.ne 412164

------------------ WARNING: The performance data for 28 instructions is missing.------------------
                     No final analysis is given. If you want to ignore this
                     warning and run the analysis anyway, start osaca with
                                       --ignore-unknown flag.
--------------------------------------------------------------------------------------------------




Loop-Carried Dependencies Analysis Report
-----------------------------------------

Performance data is missing for 28 of the 36 instructions, and they fall into three groups. Some, like the SVE signed unpacks (sunpklo, sunpkhi), aren't in the database for V2 at all. Some, like sub and mul, are present, but not for SVE z-registers. Finally, the load ld1w is present but doesn't seem to match (maybe because of the predicate?).
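To make the list of affected instructions easier to collect, here is a minimal sketch (not part of OSACA) that pulls the rows flagged with `X` out of a saved report and counts them per mnemonic. The function names `missing_instructions` and `count_by_mnemonic` and the shortened `SAMPLE` text are made up for illustration. The only assumption about the report is the layout shown above: numbered rows, with the flag and the instruction after the last `|`.

```python
from collections import Counter


def missing_instructions(report):
    """Return the instructions OSACA flagged with 'X' (no data in the data file)."""
    missing = []
    for line in report.splitlines():
        # The flag and instruction text follow the last '|' of each table row.
        head, sep, tail = line.rpartition("|")
        # Skip anything that is not a numbered table row (header, rules, warning).
        if not sep or not head.lstrip()[:1].isdigit():
            continue
        tail = tail.strip()
        if tail.startswith("X "):
            missing.append(tail[2:].strip())
    return missing


def count_by_mnemonic(instructions):
    """Count flagged instructions by their mnemonic (first token)."""
    return Counter(instr.split()[0] for instr in instructions)


# Hypothetical, shortened excerpt in the same row layout as the report above.
SAMPLE = """\
   3 |      |      ||      |      | X sunpklo z4.d, z1
  12 | 0.25 | 0.25 ||      |      |   movprfx z4, z1
  27 |      |      ||      |      | X mul z0.d, p0/m, z0.d, z8.d
  28 |      |      ||  0.0 |      | X mul z4.d, p0/m, z4.d, z8.d
"""

if __name__ == "__main__":
    counts = count_by_mnemonic(missing_instructions(SAMPLE))
    for mnemonic, n in counts.most_common():
        print(mnemonic, n)
```

Run on the full report above instead of `SAMPLE`, this should list 28 instructions, matching the count in OSACA's warning. It only reports what is flagged; it cannot tell whether an instruction is absent from the database or present with non-matching operands.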

Describe the solution you'd like
The 'Code Quality Analyzer' in the MAQAO tools seems to provide this performance data. Apparently, for ARM64 the latencies are taken from a developer manual rather than benchmarked. The currently released version of MAQAO doesn't support the V2 architecture, so this analysis was done for V1, hence the different port model, but something similar would be great.

[Image: maqao]

@stefandesouza changed the title from "[REQUEST] Add missing SVE instructions to database" to "[REQUEST] Add missing SVE instructions to database for V2" on Dec 18, 2024
@JanLJL self-assigned this on Dec 31, 2024