NativeAOT: Partially expand static initialization #83911
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details
Closes #80954 and contributes to #64242
In the JIT world we successfully get rid of static class initializations (either beforefieldinit or normal cctors), either naturally or via tiered compilation (by Tier1, everything needed on the hot path has already been initialized in the previous tier). However, that is not the case for AOT. The plan is to help NativeAOT avoid the call overhead in this case by partially inlining an "is class already initialized?" check with a fast path, e.g.:
static int field = int.Parse("42"); // mimic a complex cctor
int Test()
{
return field;
}
Current codegen on NativeAOT:
; Method Prog:Test():int:this
sub rsp, 40
call CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
mov eax, dword ptr [rax]
add rsp, 40
ret
Expected codegen:
sub rsp, 40
test byte ptr [(reloc)], 1 ;; is already initialized?
jne SHORT G_M3272_IG04
call CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE ;; it's not -- fallback
G_M3272_IG04:
mov eax, dword ptr [(reloc)] ;; just access field's value
add rsp, 40
ret
The implementation in this PR only handles JIT for now just to test it, while I'm trying to figure out how to implement
|
It should be compared with |
Managed to get NAOT working locally (not yet pushed), so for:
static int field = int.Parse("42"); // mimic a complex cctor
int Test()
{
return field;
}
Was:
; Method Prog:Test():int:this
sub rsp, 40
call CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
mov eax, dword ptr [rax]
add rsp, 40
ret
; Total bytes of code 16
Now I get:
; Assembly listing for method Prog:Test():int:this
sub rsp, 40
lea rax, [(reloc)]
cmp dword ptr [rax-08H], 0
je SHORT G_M3272_IG05
G_M3272_IG03:
mov eax, dword ptr [(reloc)]
add rsp, 40
ret
G_M3272_IG05:
call CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
jmp SHORT G_M3272_IG03
; Total bytes of code 35 (nongc statics) |
Same but for JIT:
Was:
; Assembly listing for method Prog:Test():int:this
G_M3272_IG01:
sub rsp, 40
mov rcx, 0xD1FFAB1E
mov edx, 3
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
mov eax, dword ptr [(reloc)]
add rsp, 40
ret
; Total bytes of code 35
New:
; Assembly listing for method Prog:Test():int:this
sub rsp, 40
test byte ptr [(reloc)], 1
je SHORT G_M3272_IG05
G_M3272_IG03:
mov eax, dword ptr [(reloc)]
add rsp, 40
ret
G_M3272_IG05:
mov rcx, 0xD1FFAB1E
mov edx, 3
call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
jmp SHORT G_M3272_IG03
; Total bytes of code 46
Presumably, the JIT version can be made 5 bytes smaller if we replace the cold part with "call INITCLASS(cls)". |
GC statics come with a bigger size increase on NAOT:
static string field = (42).ToString();
[MethodImpl(MethodImplOptions.NoInlining)]
string Test()
{
return field;
}
Was:
; Method Prog:Test():System.String:this
sub rsp, 40
call CORINFO_HELP_READYTORUN_GCSTATIC_BASE
mov rax, gword ptr [rax+08H]
add rsp, 40
ret
; Total bytes of code: 18
Now:
; Method Prog:Test():System.String:this
sub rsp, 40
lea rax, [(reloc)] ;; NonGCStaticsBase
cmp dword ptr [rax-08H], 0
je SHORT G_M48517_IG05
G_M48517_IG03:
mov rax, qword ptr [(reloc)] ;; this is a different reloc (GCStaticBase)
mov rax, gword ptr [rax+08H]
add rsp, 40
ret
G_M48517_IG05:
call CORINFO_HELP_READYTORUN_GCSTATIC_BASE
jmp SHORT G_M48517_IG03
; Total bytes of code: 40
The extra size is due to the double indirection needed for field access. |
Benchmarks (JIT, TieredCompilation=0):
// case 1: normal cctor. JIT doesn't hoist such initializations
// from loops (needs loop peeling)
public class TestClassWithCctor
{
public static int field;
static TestClassWithCctor()
{
field = 42;
}
}
[Benchmark]
public int NonHoistableStaticInit()
{
int sum = 0;
for (int i = 0; i < 10000; i++)
{
sum += TestClassWithCctor.field;
}
return sum;
}
// case 2: beforefieldinit
public class TestClassWithBeforefieldinit
{
public static int field = 42;
}
[Benchmark]
public int SimpleStaticInit()
{
return TestClassWithBeforefieldinit.field;
}
codegen diff: https://www.diffchecker.com/87bWA1er/ |
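To make the "needs loop peeling" remark in case 1 concrete, here is a minimal source-level sketch of what peeling the first iteration would buy. This is only an analogy under stated assumptions: the real transformation would happen on JIT IR, and `PeelingModel`, `EnsureInitialized`, `Checks`, `SumChecked` and `SumPeeled` are hypothetical names, not runtime or JIT APIs. The model is single-threaded and ignores the synchronization a real class initializer needs.

```csharp
using System;

// Hypothetical model of a class-init check inside a loop; not actual JIT output.
public static class PeelingModel
{
    private static bool s_initialized;   // stands in for the "class is initialized" flag
    private static int s_field;          // stands in for TestClassWithCctor.field
    public static int Checks;            // counts executed "is initialized?" checks

    // Stands in for the class-init check (single-threaded model only).
    private static void EnsureInitialized()
    {
        Checks++;
        if (!s_initialized)
        {
            s_field = 42;
            s_initialized = true;
        }
    }

    // Loop shape without hoisting: the check runs on every iteration.
    public static int SumChecked(int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
        {
            EnsureInitialized();
            sum += s_field;
        }
        return sum;
    }

    // Loop shape after peeling the first iteration: the check runs once,
    // and the remaining iterations can assume the class is initialized.
    public static int SumPeeled(int n)
    {
        if (n <= 0)
            return 0;
        EnsureInitialized();
        int sum = s_field;
        for (int i = 1; i < n; i++)
        {
            sum += s_field;
        }
        return sum;
    }
}
```

In this model both methods return the same sum, but the peeled version executes the check once instead of once per iteration. The partial expansion in this PR does not remove the per-iteration check; it only makes each check a cheap inline test instead of a helper call.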
/azp list |
/azp run runtime-coreclr outerloop, runtime-extra-platforms, runtime-coreclr jitstress |
Azure Pipelines successfully started running 3 pipeline(s). |
Co-authored-by: Jan Kotas <[email protected]>
… expand-static-init
@kunalspathak @AndyAyersMS @SingleAccretion I've addressed feedback, anything else? |
Co-authored-by: SingleAccretion <[email protected]>
@SingleAccretion Thanks! @kunalspathak @AndyAyersMS waiting for a green approve now 🙂 |
src/coreclr/tools/aot/ILCompiler.RyuJit/JitInterface/CorInfoImpl.RyuJit.cs
Would it make sense to have some kind of "post phase" check to make sure each possible candidate was either expanded or intentionally not expanded?
E.g. for runtime lookups I put an assert in Lower.cpp to make sure all of them are expanded (because that was required). Here I didn't, because this is just an optimization: we already skip some types of static initialization on JIT, and we also skip cold blocks. |
LGTM
Closes #80954 and contributes to #64242
In the JIT world we successfully get rid of static class initializations (either beforefieldinit or normal cctors), either naturally or via tiered compilation (by Tier1, everything needed on the hot path has already been initialized in the previous tier). However, that is not the case for AOT. The plan is to help NativeAOT avoid the call overhead in this case by partially inlining an "is class already initialized?" check with a fast path, e.g.:
Current codegen on NativeAOT:
New codegen (NativeAOT):
It also works for JIT (mainly, to test it, but could be useful for TC=0).
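As a rough source-level analogy of the partial expansion described above (the actual change operates on JIT IR, not on C#), the inline check and the cold helper call can be sketched as follows. All names here are hypothetical: `s_initialized` stands in for the class-init flag, `InitSlow` for the `CORINFO_HELP_*_STATIC_BASE` helper call, and `ReadField` for the `Test()` method from the example. The flag layout and locking are assumptions for illustration, not the runtime's real data structures.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical model of "fast path check + cold helper call"; not the runtime's layout.
public static class StaticsModel
{
    private static int s_initialized;                 // stands in for the "is initialized" flag
    private static int s_field;                       // stands in for the static field
    private static readonly object s_lock = new object();
    public static int SlowPathCalls;                  // counts how often the cold path ran

    // Stands in for the helper call; kept out of line so the hot path stays small.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void InitSlow()
    {
        lock (s_lock)
        {
            SlowPathCalls++;
            if (s_initialized == 0)
            {
                s_field = int.Parse("42");            // mimic a complex cctor
                Volatile.Write(ref s_initialized, 1); // publish only after the field is set
            }
        }
    }

    public static int ReadField()
    {
        // Fast path: one load and a compare, analogous to the inline
        // "cmp/test ... ; je" sequence in the new codegen.
        if (Volatile.Read(ref s_initialized) == 0)
        {
            InitSlow();                               // cold fallback, taken at most a few times
        }
        return s_field;                               // direct field access
    }
}
```

After the first call, `ReadField` never reaches `InitSlow` again, which is the effect the expanded codegen aims for: the helper call moves to a cold block and the common case is a flag test plus a direct load.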
Benchmarks
I was using JIT with TieredCompilation=0
codegen diff: https://www.diffchecker.com/87bWA1er/
I can't yet say what the total size overhead is for a hello world; in the worst case we can leave this optimization enabled only for "prefer speed" mode (-Os).