Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port System.Runtime.Extensions.Performance.Tests #64

Closed
adamsitnik opened this issue Jun 26, 2018 · 0 comments
Closed

Port System.Runtime.Extensions.Performance.Tests #64

adamsitnik opened this issue Jun 26, 2018 · 0 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@adamsitnik
Copy link
Member

No description provided.

@adamsitnik adamsitnik added the good first issue Good for newcomers label Jun 26, 2018
@ViktorHofer ViktorHofer self-assigned this Jul 11, 2018
@adamsitnik adamsitnik assigned adamsitnik and unassigned ViktorHofer Aug 15, 2018
kunalspathak added a commit to kunalspathak/performance that referenced this issue Apr 13, 2020
We were consuming the values returned from `GetPinnedReference()` API, however
`.Consume()` uses `volatile` and that introduces memory barriers for ARM64.
Since we just want to measure the performance of `GetPinnedReference()` it is
unnecessary to introduce a differentiating factor for 1 architecture and not the
other. Hence I have removed the `.Consume()` and instead just storing the returned
result in a variable.

Before:
```asm
...
G_M52573_IG23:
        79400063          ldrh    w3, [x3]
        D5033BBF          dmb     ish
        79008023          strh    w3, [x1,dotnet#64]
        D2800003          mov     x3, #0
        34000040          cbz     w0, G_M52573_IG25
                                                ;; bbWeight=1    PerfScore 15.50
G_M52573_IG24:
        AA0203E3          mov     x3, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
G_M52573_IG25:
        79400063          ldrh    w3, [x3]
        D5033BBF          dmb     ish
        79008023          strh    w3, [x1,dotnet#64]
        D2800003          mov     x3, #0
        34000040          cbz     w0, G_M52573_IG27
                                                ;; bbWeight=1    PerfScore 15.50
G_M52573_IG26:
        AA0203E3          mov     x3, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
...

```

After
```asm
...
G_M51552_IG23:
        79400021          ldrh    w1, [x1]
        D2800001          mov     x1, #0
        34000040          cbz     w0, G_M51552_IG25
                                                ;; bbWeight=1    PerfScore 4.50
G_M51552_IG24:
        AA0203E1          mov     x1, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
G_M51552_IG25:
        79400021          ldrh    w1, [x1]
        D2800001          mov     x1, #0
        34000040          cbz     w0, G_M51552_IG27
                                                ;; bbWeight=1    PerfScore 4.50
G_M51552_IG26:
        AA0203E1          mov     x1, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
...

```

This change reduces the ARM64 numbers for this benchmark from 10ns to 1ns. I am not sure if
we should just add `Consume()` method and mark it as `NoInline`. That's what is done for
`System.Memory.Span<T>.GetPinnedReference()` benchmark.
adamsitnik pushed a commit that referenced this issue Apr 15, 2020
* Stop consuming pinned references

We were consuming the values returned from `GetPinnedReference()` API, however
`.Consume()` uses `volatile` and that introduces memory barriers for ARM64.
Since we just want to measure the performance of `GetPinnedReference()` it is
unnecessary to introduce a differentiating factor for 1 architecture and not the
other. Hence I have removed the `.Consume()` and instead just storing the returned
result in a variable.

Before:
```asm
...
G_M52573_IG23:
        79400063          ldrh    w3, [x3]
        D5033BBF          dmb     ish
        79008023          strh    w3, [x1,#64]
        D2800003          mov     x3, #0
        34000040          cbz     w0, G_M52573_IG25
                                                ;; bbWeight=1    PerfScore 15.50
G_M52573_IG24:
        AA0203E3          mov     x3, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
G_M52573_IG25:
        79400063          ldrh    w3, [x3]
        D5033BBF          dmb     ish
        79008023          strh    w3, [x1,#64]
        D2800003          mov     x3, #0
        34000040          cbz     w0, G_M52573_IG27
                                                ;; bbWeight=1    PerfScore 15.50
G_M52573_IG26:
        AA0203E3          mov     x3, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
...

```

After
```asm
...
G_M51552_IG23:
        79400021          ldrh    w1, [x1]
        D2800001          mov     x1, #0
        34000040          cbz     w0, G_M51552_IG25
                                                ;; bbWeight=1    PerfScore 4.50
G_M51552_IG24:
        AA0203E1          mov     x1, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
G_M51552_IG25:
        79400021          ldrh    w1, [x1]
        D2800001          mov     x1, #0
        34000040          cbz     w0, G_M51552_IG27
                                                ;; bbWeight=1    PerfScore 4.50
G_M51552_IG26:
        AA0203E1          mov     x1, x2
                                                ;; bbWeight=0.25 PerfScore 0.13
...

```

This change reduces the ARM64 numbers for this benchmark from 10ns to 1ns. I am not sure if
we should just add `Consume()` method and mark it as `NoInline`. That's what is done for
`System.Memory.Span<T>.GetPinnedReference()` benchmark.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants