Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unroll SequenceEqual for u8 literals #82474

Closed
3 tasks
EgorBo opened this issue Feb 22, 2023 · 6 comments
Closed
3 tasks

Unroll SequenceEqual for u8 literals #82474

EgorBo opened this issue Feb 22, 2023 · 6 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Feb 22, 2023

bool IsValue(ReadOnlySpan<byte> a) => a.SequenceEqual("value"u8);
; Method P:IsValue(System.ReadOnlySpan`1[ubyte]):bool:this
       sub      rsp, 40
       mov      r8, 0xD1FFAB1E ;; address of RVA field
       mov      rcx, bword ptr [rdx]
       mov      edx, dword ptr [rdx+08H]
       mov      bword ptr [rsp+20H], r8
       cmp      edx, 5
       jne      SHORT G_M29601_IG04
       mov      edx, 5
       mov      r8, rdx
       mov      rdx, bword ptr [rsp+20H]
       call     [System.SpanHelpers:SequenceEqual(byref,byref,ulong):bool]
       jmp      SHORT G_M29601_IG05
G_M29601_IG04:
       xor      eax, eax
       add      rsp, 40
       ret      
; Total bytes of code: 58

Should be unrolled similar to UTF-16:

bool IsValue(ReadOnlySpan<char> a) => a.SequenceEqual("value");
; Method P:IsValue(System.ReadOnlySpan`1[ushort]):bool:this
       mov      rax, bword ptr [rdx]
       cmp      dword ptr [rdx+08H], 5
       jne      SHORT G_M10905_IG04
       mov      rdx, 0x75006C00610076
       xor      rdx, qword ptr [rax]
       mov      eax, dword ptr [rax+06H]
       xor      eax, 0x650075
       or       rax, rdx
       sete     al
       movzx    rax, al
       jmp      SHORT G_M10905_IG05
G_M10905_IG04:
       xor      eax, eax
G_M10905_IG05:
       ret      
; Total bytes of code: 44

Motivation

To clean up various places like:

  1. JsonSerializer.Read.HandleMetadata.cs
  2. aspnetcore's HttpHeaders.Generated.cs
  3. ^ similar to that in runtime: HeaderDescriptor.cs - this currently picks a candidate and then does non-optimized comparison while it could do comparison against u8 literal in each case
  4. Should optimize TE benchmarks as well e.g. here BenchmarkApplication.cs

Work items

  • SequenceEqual
  • OrdinalIgnoreCase
  • StartsWith
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 22, 2023
@EgorBo EgorBo self-assigned this Feb 22, 2023
@EgorBo EgorBo added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Feb 22, 2023
@EgorBo EgorBo added this to the Future milestone Feb 22, 2023
@ghost
Copy link

ghost commented Feb 22, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details
bool IsValue(ReadOnlySpan<byte> a) => a.SequenceEqual("value"u8);
Current codegen:
; Method P:IsValue(System.ReadOnlySpan`1[ubyte]):bool:this
G_M29601_IG01:
       sub      rsp, 40
G_M29601_IG02:
       mov      r8, 0xD1FFAB1E
       mov      rcx, bword ptr [rdx]
       mov      edx, dword ptr [rdx+08H]
       mov      bword ptr [rsp+20H], r8
       cmp      edx, 5
       jne      SHORT G_M29601_IG04
G_M29601_IG03:
       mov      edx, 5
       mov      r8, rdx
       mov      rdx, bword ptr [rsp+20H]
       call     [System.SpanHelpers:SequenceEqual(byref,byref,ulong):bool]
       jmp      SHORT G_M29601_IG05
G_M29601_IG04:
       xor      eax, eax
G_M29601_IG05:
       add      rsp, 40
       ret      
; Total bytes of code: 58

Should be unrolled similar to UTF-16:

bool IsValue(ReadOnlySpan<char> a) => a.SequenceEqual("value");
Current codegen:
; Method P:IsValue(System.ReadOnlySpan`1[ushort]):bool:this
G_M10905_IG01:
G_M10905_IG02:
       mov      rax, bword ptr [rdx]
       cmp      dword ptr [rdx+08H], 5
       jne      SHORT G_M10905_IG04
G_M10905_IG03:
       mov      rdx, 0x75006C00610076
       xor      rdx, qword ptr [rax]
       mov      eax, dword ptr [rax+06H]
       xor      eax, 0x650075
       or       rax, rdx
       sete     al
       movzx    rax, al
       jmp      SHORT G_M10905_IG05
G_M10905_IG04:
       xor      eax, eax
G_M10905_IG05:
       ret      
; Total bytes of code: 44

Motivation

To clean up various places like:

  1. JsonSerializer.Read.HandleMetadata.cs
  2. aspnetcore's HttpHeaders.Generated.cs
  3. ^ similar to that in runtime: HeaderDescriptor.cs - this currently picks a candidate and then does non-optimized comparison while it could do comparison against u8 literal in each case
  4. Should optimize TE benchmarks as well e.g. here BenchmarkApplication.cs
Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr

Milestone: -

@stephentoub
Copy link
Member

I thought we already did this :( Thanks for opening the issue. We don't do any such handling for bytes today, just chars?

@EgorBo
Copy link
Member Author

EgorBo commented Feb 22, 2023

I thought we already did this :( Thanks for opening the issue. We don't do any such handling for bytes today, just chars?

Yep, chars only at the moment. RVA didn't fit well into that importer-level optimization, but I have a quick prototype to do this expansion later in ValueNumbering (or Early Prop). We also can then expand it for other patterns like "false"u8.CopyTo(span)

ulong fals_val = BitConverter.IsLittleEndian ? 0x73006C00610046ul : 0x460061006C0073ul; // "Fals"
MemoryMarshal.Write<ulong>(MemoryMarshal.AsBytes(destination), ref fals_val);
destination[4] = 'e';

@EgorBo EgorBo modified the milestones: Future, 8.0.0 Apr 1, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Apr 1, 2023

Closed via #83945

bool IsValue(ReadOnlySpan<byte> a) => a.SequenceEqual("value"u8);
; Method Prog:IsValue(System.ReadOnlySpan`1[ubyte]):bool:this
G_M18139_IG01:
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M18139_IG02:
       mov      rax, 0xD1FFAB1E      ; data for <PrivateImplementationDetails>:2AC3E771679DFE9EC82B8748F6F7718F3AD0E4EAAFABC9F68F4BC25187A5F230
       mov      rcx, bword ptr [rdx]
       mov      edx, dword ptr [rdx+08H]
       cmp      edx, 5
       jne      SHORT G_M18139_IG04
						;; size=21 bbWeight=1 PerfScore 5.50

G_M18139_IG03:
       mov      edx, dword ptr [rcx]
       mov      r8d, dword ptr [rax]
       mov      ecx, dword ptr [rcx+01H]
       mov      eax, dword ptr [rax+01H]
       xor      edx, r8d
       xor      eax, ecx
       or       eax, edx
       sete     al
       movzx    rax, al
       jmp      SHORT G_M18139_IG05
						;; size=26 bbWeight=0.50 PerfScore 6.00

G_M18139_IG04:
       xor      eax, eax
						;; size=2 bbWeight=0.50 PerfScore 0.12

G_M18139_IG05:
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 58

@EgorBo EgorBo closed this as completed Apr 1, 2023
@gfoidl
Copy link
Member

gfoidl commented Apr 2, 2023

@EgorBo for char the xor is done against two constants, saving two memory loads. The JIT is able to do this when written manually.

Is it possible to have this codegen here too? I.e. avoid the memory loads for the constant parts, and use only constant values.

@EgorBo
Copy link
Member Author

EgorBo commented Apr 2, 2023

@EgorBo for char the xor is done against two constants, saving two memory loads. The JIT is able to do this when written manually.

Is it possible to have this codegen here too? I.e. avoid the memory loads for the constant parts, and use only constant values.

A bit more complicated fix since I'll have to do that in an earlier phase. I just wanted to land a quick fix and see impact from it, so far not much

@ghost ghost locked as resolved and limited conversation to collaborators May 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

3 participants