Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NativeAOT: Partially expand static initialization #83911

Merged
merged 40 commits into from
Apr 8, 2023

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 24, 2023

Closes #80954 and contributes to #64242

In the JIT world we successfully get rid of static class initializations (either beforefieldinit or normal cctors) naturally or via tiered compilation (Tier1 means that we already initialized eveyrything we needed on hot path in the previous tier). However, it's not the case for AOT. The plan is to help NativeAOT to avoid hitting call overhead in this case by partially inline "is class already initialized?" check with a fast path, e.g.:

// non-GC static
static int s_Int = int.Parse("42");
int GetInt() => s_Int;

// GC static
static string s_Str = (42).ToString();
string GetStr() => s_Str;

Current codegen on NativeAOT:

; Method Prog:GetStr():System.String:this
G_M45136_IG01:              
       4883EC28             sub      rsp, 40
G_M45136_IG02:              
       E800000000           call     CORINFO_HELP_READYTORUN_GCSTATIC_BASE
       488B4008             mov      rax, gword ptr [rax+08H]
G_M45136_IG03:              
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code: 18


; Method Prog:GetInt():int:this
G_M29019_IG01:              
       4883EC28             sub      rsp, 40
G_M29019_IG02:              
       E800000000           call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       8B00                 mov      eax, dword ptr [rax]
G_M29019_IG03:              
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code: 16

New codegen (NativeAOT):

; Method Prog:GetStr():System.String:this
       4883EC28             sub      rsp, 40
       488D0500000000       lea      rax, [(reloc 0x4000000000421128)]
       8378F801             cmp      dword ptr [rax-08H], 1
       7510                 jne      SHORT G_M25482_IG05
G_M25482_IG03:              
       488B0500000000       mov      rax, qword ptr [(reloc 0x4000000000421130)]
       488B4008             mov      rax, gword ptr [rax+08H]
       4883C428             add      rsp, 40
       C3                   ret      
G_M25482_IG05:              
       E800000000           call     CORINFO_HELP_READYTORUN_GCSTATIC_BASE
       EBE9                 jmp      SHORT G_M25482_IG03
; Total bytes of code: 40


; Method Prog:GetInt():int:this
       56                   push     rsi
       4883EC20             sub      rsp, 32
       488D3500000000       lea      rsi, [(reloc 0x4000000000421008)]
       837EF801             cmp      dword ptr [rsi-08H], 1
       7508                 jne      SHORT G_M56961_IG05
G_M56961_IG03:             
       8B06                 mov      eax, dword ptr [rsi]
       4883C420             add      rsp, 32
       5E                   pop      rsi
       C3                   ret      
G_M56961_IG05:              
       E800000000           call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       EBF1                 jmp      SHORT G_M56961_IG03
; Total bytes of code: 33

It also works for JIT (mainly, to test it, but could be useful for TC=0).

Benchmarks

I was using JIT with TieredCompilation=0

// case 1: normal cctor. JIT doesn't hoist such initializations
// from loops (needs loop peeling)

public class TestClassWithCctor
{
    public static int field;

    static TestClassWithCctor()
    {
        field = 42;
    }
}

[Benchmark]
public int NonHoistableStaticInit()
{
    int sum = 0;
    for (int i = 0; i < 10000; i++)
    {
        sum += TestClassWithCctor.field;
    }
    return sum;
}

// case 2: beforefieldinit

public class TestClassWithBeforefieldinit
{
    public static int field = 42;
}

[Benchmark]
public int SimpleStaticInit()
{
    return TestClassWithBeforefieldinit.field;
}
Method Toolchain Mean
NonHoistableStaticInit \runtime-base\corerun.exe 11,123.0890 ns
NonHoistableStaticInit \runtime\corerun.exe 1,893.7071 ns
SimpleStaticInit \runtime-base\corerun.exe 0.9206 ns
SimpleStaticInit \runtime\corerun.exe 0.0302 ns

codegen diff: https://www.diffchecker.com/87bWA1er/

Can't say yet the total size overhead for a hello world, in the worst case we can leave this optimization only for "prefer speed" mode (-Os)

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 24, 2023
@ghost ghost assigned EgorBo Mar 24, 2023
@ghost
Copy link

ghost commented Mar 24, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #80954 and contributes to #64242

In the JIT world we successfully get rid of static class initializations (either beforefieldinit or normal cctors) naturally or via tiered compilation (Tier1 means that we already initialized eveyrything we needed on hot path in the previous tier). However, it's not the case for AOT. The plan is to help NativeAOT to avoid hitting call overhead in this case by partially inline "is class already initialized?" check with a fast path, e.g.:

static int field = int.Parse("42"); // mimic a complex cctor

int Test()
{
    return field;
}

Current codegen on NativeAOT:

; Method Prog:Test():int:this
       sub      rsp, 40
       call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       mov      eax, dword ptr [rax]
       add      rsp, 40
       ret      

Expected codegen:

       sub      rsp, 40
       test     byte  ptr [(reloc)], 1 ;; is already initialized?
       jne      SHORT G_M3272_IG04
       call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE ;; it's not -- fallback
SHORT G_M3272_IG04:
       mov      eax, dword ptr [(reloc)] ;; just access field's value
       add      rsp, 40
       ret

The implementation in this PR only handles JIT for now just to test it, while I'm trying to figure out how to implement getIsClassInitedFieldAddress API on NativeAOT side, might need @MichalStrehovsky help

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jkotas
Copy link
Member

jkotas commented Mar 24, 2023

getIsClassInitedFieldAddress API on NativeAOT side,

factory.TypeNonGCStaticsSymbol(type) and subtract NonGCStaticsNode.GetClassConstructorContextSize(factory.Target) offset from it.

It should be compared with 1 (CoreCLR does bit test, native aot uses comparison).

@EgorBo
Copy link
Member Author

EgorBo commented Mar 25, 2023

Managed to get NAOT working locally (not yet pushed), so for

static int field = int.Parse("42"); // mimic a complex cctor

int Test()
{
    return field;
}

Was:

; Method Prog:Test():int:this
       sub      rsp, 40
       call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       mov      eax, dword ptr [rax]
       add      rsp, 40
       ret
; Total bytes of code 16 

Now I get:

; Assembly listing for method Prog:Test():int:this
       sub      rsp, 40
       lea      rax, [(reloc)]
       cmp      dword ptr [rax-08H], 0
       je       SHORT G_M3272_IG05
G_M3272_IG03:
       mov      eax, dword ptr [(reloc)]
       add      rsp, 40
       ret
G_M3272_IG05:
       call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       jmp      SHORT G_M3272_IG03
; Total bytes of code 35

(nongc statics)

@EgorBo
Copy link
Member Author

EgorBo commented Mar 25, 2023

Same but for JIT:

Was:

; Assembly listing for method Prog:Test():int:this
G_M3272_IG01:
       sub      rsp, 40
       mov      rcx, 0xD1FFAB1E
       mov      edx, 3
       call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       mov      eax, dword ptr [(reloc)]
       add      rsp, 40
       ret      
; Total bytes of code 35

New:

; Assembly listing for method Prog:Test():int:this
       sub      rsp, 40
       test     byte  ptr [(reloc)], 1      
       je       SHORT G_M3272_IG05
G_M3272_IG03:
       mov      eax, dword ptr [(reloc)]
       add      rsp, 40
       ret      
G_M3272_IG05:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 3
       call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       jmp      SHORT G_M3272_IG03
; Total bytes of code 46

Presumably, JIT version can be made 5 bytes smaller if we replace the cold part with "call INITCLASS(cls)".

@EgorBo
Copy link
Member Author

EgorBo commented Mar 25, 2023

GC statics come with a bigger size increase on NAOT:

static string field = (42).ToString();

[MethodImpl(MethodImplOptions.NoInlining)]
string Test()
{
    return field;
}

Was:

; Method Prog:Test():System.String:this
       sub      rsp, 40
       call     CORINFO_HELP_READYTORUN_GCSTATIC_BASE
       mov      rax, gword ptr [rax+08H]
       add      rsp, 40
       ret      
; Total bytes of code: 18

Now:

; Method Prog:Test():System.String:this
       sub      rsp, 40
       lea      rax, [(reloc)] ;; NonGCStaticsBase
       cmp      dword ptr [rax-08H], 0
       je       SHORT G_M48517_IG05
G_M48517_IG03:
       mov      rax, qword ptr [(reloc)] ;; this is a different reloc (GCStaticBase)
       mov      rax, gword ptr [rax+08H]
       add      rsp, 40
       ret      
G_M48517_IG05:
       call     CORINFO_HELP_READYTORUN_GCSTATIC_BASE
       jmp      SHORT G_M48517_IG03
; Total bytes of code: 40

due to double-indirect for field access

@EgorBo
Copy link
Member Author

EgorBo commented Mar 25, 2023

Benchmarks (JIT, TieredCompilation=0):

// case 1: normal cctor. JIT doesn't hoist such initializations
// from loops (needs loop peeling)

public class TestClassWithCctor
{
    public static int field;

    static TestClassWithCctor()
    {
        field = 42;
    }
}

[Benchmark]
public int NonHoistableStaticInit()
{
    int sum = 0;
    for (int i = 0; i < 10000; i++)
    {
        sum += TestClassWithCctor.field;
    }
    return sum;
}

// case 2: beforefieldinit

public class TestClassWithBeforefieldinit
{
    public static int field = 42;
}

[Benchmark]
public int SimpleStaticInit()
{
    return TestClassWithBeforefieldinit.field;
}
Method Toolchain Mean
NonHoistableStaticInit \runtime-base\corerun.exe 11,123.0890 ns
NonHoistableStaticInit \runtime\corerun.exe 1,893.7071 ns
SimpleStaticInit \runtime-base\corerun.exe 0.9206 ns
SimpleStaticInit \runtime\corerun.exe 0.0302 ns

codegen diff: https://www.diffchecker.com/87bWA1er/

@EgorBo
Copy link
Member Author

EgorBo commented Mar 25, 2023

/azp list

@azure-pipelines

This comment was marked as outdated.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 25, 2023

/azp run runtime-coreclr outerloop, runtime-extra-platforms, runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@EgorBo EgorBo mentioned this pull request Apr 6, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Apr 6, 2023

@kunalspathak @AndyAyersMS @SingleAccretion I've addressed feedback, anything else?

src/coreclr/jit/flowgraph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/flowgraph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/flowgraph.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
EgorBo and others added 2 commits April 6, 2023 18:44
@EgorBo
Copy link
Member Author

EgorBo commented Apr 6, 2023

@SingleAccretion Thanks!

@kunalspathak @AndyAyersMS waiting for a green approve now 🙂

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have some kind of "post phase" check to make sure each possible candidate was either expanded or intentionally not expanded?

src/coreclr/jit/flowgraph.cpp Show resolved Hide resolved
src/coreclr/jit/flowgraph.cpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Apr 6, 2023

Would it make sense to have some kind of "post phase" check to make sure each possible candidate was either expanded or intentionally not expanded?

E.g. for runtime lookups I put an assert in Lower.cpp to make sure all of them are expanded (because it was required). In this case I didn't do that because it's just an optimization and we skip some type of static initializations on JIT already + we skip cold blocks so I didn't do that

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@EgorBo EgorBo merged commit 4ec7db0 into dotnet:main Apr 8, 2023
@EgorBo EgorBo deleted the expand-static-init branch April 8, 2023 08:10
@ghost ghost locked as resolved and limited conversation to collaborators May 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NativeAOT: Inline static initialization checks in codegen
7 participants