Optimize field access in types not marked with beforefieldinit #1327
Comments
Thanks; yes this The difference right now is that the
;;; Static
G_M33157_IG03:
48B9B0532B42F97F0000 mov rcx, 0x7FF9422B53B0
BA01000000 mov edx, 1
E8B5E9885F call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
FFC6 inc esi
81FE10270000 cmp esi, 0x2710
7CE2 jl SHORT G_M33157_IG03
;; Field
G_M65279_IG02:
33F6 xor esi, esi
48B9B0532B42F97F0000 mov rcx, 0x7FF9422B53B0
BA02000000 mov edx, 2
E875E9885F call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
G_M65279_IG03:
FFC6 inc esi
81FE10270000 cmp esi, 0x2710
7CF6 jl SHORT G_M65279_IG03
The code in |
@AndyAyersMS If the class has been initialized already, the importer won't create To fix this specific case, we would need to run the static initialization during the first iteration of the loop only. |
If this issue is fixed, are there still going to be significant performance penalties for static field access on non-beforefieldinit types in some situations or has that largely been mitigated in newer .NET Core releases? I haven't seen any recent discussions on the performance implications of having a static constructor. The latest thing I could find is this PR dotnet/corefx#6016 where it seems like people are still trying to avoid static constructors where possible for performance reasons. |
Depends what you mean by significant. Non-beforefieldinit types are more expensive than beforefieldinit types by design. Note that the costs are not just in throughput, but also in code size. |
That's what I thought, but @YairHalberstadt seemed to think that tiered compilation could/should remove the overhead of accessing the field. |
Yes, it does. It takes some time for the tiered compilation to kick in and you pay the costs until that happens. I do not think it kicked in for your test at the top of this issue. |
Hmm, I see. What's the order of magnitude of "some time" / what conditions affect how long it takes? I'm trying to gauge if it is still worth complicating class initialization code, removing I mean, ideally, we just get the attribute I proposed in the related issue so the tradeoff becomes unnecessary, but I wouldn't mind refactoring some nasty code if it ultimately doesn't make a difference after a few second warmup. |
Right, just came to the same realization. Assembly code above is for the case where the .cctors have not yet been run. We only jit these methods once: they have a loop, and by default
Not in this case, though. We jit at Tier1 initially because of the loop, and won't ever rejit. If you set
|
Presumably in real life code if this is a bottleneck we will rejit, since the loop is unlikely to be top level? |
Is there any way to force tiered compilation to run before a test starts so that I can benchmark the post-tiered compilation performance impact of adding static initializers to some of my code? |
Not currently, no. Once we jit at Tier1 we never rejit. So with default settings, methods with loops are jitted just once. The intent of QJFL=0 is to avoid getting trapped in Tier0 code; the downside is you may end up with sub-optimal Tier1 code. The general fix for this is to implement something like on-stack replacement, so we can jit at Tier0 initially and transition (mid-method, if necessary) to Tier1 code if the method is called frequently, or loops around frequently.
BenchmarkDotNet should generally get you to Tier1 code. But as noted above, not all Tier1 code is equally performant; there's a benefit in some cases to running the Tier0 code first. Setting |
I see. Thanks for the info @AndyAyersMS Do you think this is a problem that will be resolved in the future, or would it be worth considering adding a language feature for this case? |
Possibly. I have been looking into on-stack replacement, but mostly thinking of it as something that could help improve startup, not as something that could help improve steady-state. I would be interested in learning about more realistic "real-world" cases where performance improves when running with QJFL=1. Measurements on simple benchmarks can highlight problem areas, but often overstate their impact. |
I do not think that a language feature for this makes sense. Another way to fix this issue is to teach the JIT to inline the "already initialized" check into the code. It would eliminate the call overhead on the fast path. The cost of the well-predicted inlined check would be negligible. FWIW, .NET Native did the inlining of the "already initialized" check optimization. |
What case/language feature are we talking about here? I think a I bet 95%+ of classes that have static constructors don't actually want/need type initialization on first field access, it's just there for convenience (but that's a bit of a wild guess, kind of hard to measure). Even if this gets optimized on a JIT level, letting people just set |
I do not think that |
I understand that, but I respectfully disagree. I see library authors constantly dealing with this and it doesn't really add any complexity to anyone that doesn't care about optimizing performance at that level...it's just an optional attribute that does something really simple, i.e. control whether the compiler emits |
To recap the above: in general, runtime overhead of class init checks should largely be mitigated by tiered compilation. We can still see impact of class init checks in methods that are not yet tiered up; impact of this should be limited to startup phases as the code will eventually get replaced by Tier1 code:
We can also see impact of class init checks in methods that aren't eligible for tiering; in such cases impact of class init checks can persist indefinitely:
Seems like we should look more deeply into implementing support for partially inlined class init checks in the jit, so am marking this as 5.0 for now. |
I currently don't see us getting to this for 5.0, so moving to future. |
We should consider this for 6.0, so updating milestone. |
@jkotas @AndyAyersMS a few questions:
if ((moduleId->m_pDataBlob[classId] & ClassInitFlags::INITIALIZED_FLAG) == 0)
    CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE(moduleId, classId); |
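For readers following along, here is a minimal, self-contained C++ model of the "inlined check" idea in the pseudocode above. It is only a sketch, not the runtime's or the JIT's actual code: `Module`, `dataBlob`, `helperCalls`, `cctorRuns`, `GetSharedNonGcStaticBase`, `EnsureInitialized` and `RunLoop` are hypothetical stand-ins, and the flag value is an assumption. The point it illustrates is that with the flag test at the access site, the out-of-line helper is called only on the first access, and every later iteration pays for a well-predicted compare and branch.

```cpp
#include <cstdint>

// Hypothetical flag value; the real runtime layout may differ.
constexpr std::uint8_t INITIALIZED_FLAG = 0x1;

// Hypothetical stand-in for the per-module data the helper consults.
struct Module {
    std::uint8_t dataBlob[8] = {};  // one flag byte per class id
    int helperCalls = 0;            // how often the slow-path helper was called
    int cctorRuns = 0;              // how often the static constructor ran
};

// Models the out-of-line helper: runs the static constructor once,
// then marks the class as initialized.
void GetSharedNonGcStaticBase(Module* m, unsigned classId) {
    ++m->helperCalls;
    if ((m->dataBlob[classId] & INITIALIZED_FLAG) == 0) {
        ++m->cctorRuns;
        m->dataBlob[classId] |= INITIALIZED_FLAG;
    }
}

// Inlined fast path: test the flag at the access site and call the
// helper only when the class is not yet initialized.
inline void EnsureInitialized(Module* m, unsigned classId) {
    if ((m->dataBlob[classId] & INITIALIZED_FLAG) == 0) {
        GetSharedNonGcStaticBase(m, classId);
    }
}

// Mirrors the shape of the benchmark loops: one static access per iteration.
int RunLoop(Module* m, unsigned classId, int iterations) {
    int sum = 0;
    for (int i = 0; i < iterations; ++i) {
        EnsureInitialized(m, classId);
        sum += 1;
    }
    return sum;
}
```

Calling `RunLoop(&m, 1, 10000)` on a fresh `Module m` leaves `m.helperCalls` at 1. A model of the current codegen, which calls the helper unconditionally in the loop body, would count 10000 helper calls.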
Quick prototype for "inlined check": EgorBo@78436b0 (jit only, doesn't do hoisting opt)
diff for |
In general case, it would be observable behavior change in the timing of static constructors. You can do this in some limited cases like when you can prove the loop will run at least once and the static is accessed as the first thing in the loop, etc.
The offsets and masks should be abstracted via JIT/EE interface. It will also allow enabling this optimization for non-shared generics. |
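To make the first point concrete, here is a small C++ model of why hoisting the check out of the loop can be observable. Everything in it (`Runtime`, `log`, `AccessInLoop`, `AccessHoisted`) is a hypothetical illustration, not JIT code. It records when the simulated static constructor runs relative to the loop.

```cpp
#include <string>
#include <vector>

// Hypothetical runtime model that logs when the static constructor runs.
struct Runtime {
    bool initialized = false;
    std::vector<std::string> log;

    void EnsureInitialized() {
        if (!initialized) {
            initialized = true;
            log.push_back("cctor");
        }
    }
};

// Precise semantics: the check happens at each static access in the loop,
// so the constructor runs only if the loop body executes at least once.
void AccessInLoop(Runtime& rt, int n) {
    rt.log.push_back("enter");
    for (int i = 0; i < n; ++i) {
        rt.EnsureInitialized();
        rt.log.push_back("access");
    }
}

// Hoisted check: the constructor runs before the loop, even when the
// loop executes zero times.
void AccessHoisted(Runtime& rt, int n) {
    rt.log.push_back("enter");
    rt.EnsureInitialized();
    for (int i = 0; i < n; ++i) {
        rt.log.push_back("access");
    }
}
```

With `n == 0` the two versions differ: `AccessInLoop` never runs the constructor, while `AccessHoisted` does. With `n >= 1` and the static access as the first thing in the loop body, both produce the same sequence, which matches the limited cases described above where the transformation is safe.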
This issue is basically fixed by QuickJit being enabled by default for loops in .NET 7.0.
TC=0 .NET 7.0:
TC=0 .NET 8.0:
|
Is it possible for tiered compilation to partially alleviate my issue regarding beforefieldinit by addressing the performance aspect of that issue discussed here: dotnet/csharplang#3080?
I would still like the option of explicit control over class initialization timing as discussed there, but it would be much less of an issue in most cases if tiered compilation could at least solve the performance aspects. For reference, here is a copy-paste of the benchmark code in that thread that shows the big performance discrepancy between static field access in classes with and without beforefieldinit:
Code:
category:cq
theme:optimization
skill-level:intermediate
cost:medium