Optimize field access in types not marked with beforefieldinit #1327

mikernet · 2020-01-06T16:46:07Z

Is it possible for tiered compilation to partially alleviate my issue regarding beforefieldinit by addressing the performance aspect of that issue discussed here: dotnet/csharplang#3080?

I would still like the option of explicit control over class initialization timing as discussed there but it would be much less of an issue in most cases if tiered compilation could at least solve the performance aspects. For reference, here is a copy-paste of the benchmark code in that thread that shows the big performance discrepancy between static field access in classes with and without beforefieldinit:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-3770 CPU 3.40GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
  DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT

| Method |      Mean |     Error |    StdDev |
|------- |----------:|----------:|----------:|
| Static | 18.870 us | 0.1066 us | 0.0997 us |
|  Field |  2.713 us | 0.0099 us | 0.0092 us |

Code:

const int iterations = 10000;

public static class StaticCctor
{
    public static bool Value { get; set; }

    static StaticCctor()
    {
        Value = true;
    }
}

public static class FieldInit
{
    public static bool Value { get; set; } = true;
}

public class BeforeFieldInit
{
    [Benchmark]
    public void Static()
    {
        bool value;

        for (int i = 0; i < iterations; i++)
            value = StaticCctor.Value;

    }

    [Benchmark]
    public void Field()
    {
        bool value;

        for (int i = 0; i < iterations; i++)
            value = FieldInit.Value;
    }
}

category:cq
theme:optimization
skill-level:intermediate
cost:medium

The text was updated successfully, but these errors were encountered:

AndyAyersMS · 2020-01-06T17:05:30Z

Thanks; yes this ~~is something we can fix~~ a consequence of us not re-jitting methods with loops, by default (see note below).

The difference right now is that the Field case hoists the static base accessor out of the loop, and the Static case does not:

;;; Static

G_M33157_IG03:
       48B9B0532B42F97F0000 mov      rcx, 0x7FF9422B53B0
       BA01000000           mov      edx, 1
       E8B5E9885F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       FFC6                 inc      esi
       81FE10270000         cmp      esi, 0x2710
       7CE2                 jl       SHORT G_M33157_IG03

;; Field

G_M65279_IG02:
       33F6                 xor      esi, esi
       48B9B0532B42F97F0000 mov      rcx, 0x7FF9422B53B0
       BA02000000           mov      edx, 2
       E875E9885F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE

G_M65279_IG03:
       FFC6                 inc      esi
       81FE10270000         cmp      esi, 0x2710
       7CF6                 jl       SHORT G_M65279_IG03

The code in optComputeLoopSideEffectsOfBlock is too conservative, it checks if the helper call could cause class initialization, but doesn't also check if the class happens to already have been initialized.

jkotas · 2020-01-06T17:10:21Z

@AndyAyersMS If the class have been initialized already, the importer won't create CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE at all.

To fix this specific case, we would need to run the static initialization during the first iteration of the loop only.

mikernet · 2020-01-06T17:16:52Z

If this issue is fixed, are there still going to be significant performance penalties for static field access on non-beforefieldinit types in some situations or has that largely been mitigated in newer .NET Core releases? I haven't seen any recent discussions on the performance implications of having a static constructor. The latest thing I could find is this PR dotnet/corefx#6016 where it seems like people are still trying to avoid static constructors where possible for performance reasons.

jkotas · 2020-01-06T17:27:01Z

Depends what you mean by significant. Non-beforefieldinit type are more expensive than beforefieldinit types by design. Note that the costs are not just in throughput, but also in code size.

mikernet · 2020-01-06T17:34:27Z

That's what I thought, but @YairHalberstadt seemed to think that tiered compilation could/should remove the overhead of accessing the field.

jkotas · 2020-01-06T18:18:17Z

tiered compilation could/should remove the overhead of accessing the field.

Yes, it does. It takes some time for the tiered compilation to kick in and you pay the costs until that happens. I do not think it kicked in for your test at the top of this issue.

mikernet · 2020-01-06T18:32:52Z

Hmm, I see. What's the order of magnitude of "some time" / what conditions affect how long it takes?

I'm trying to gauge if it is still worth complicating class initialization code, removing readonly field modifiers, making it more cumbersome to work with NRT fields, etc in order to avoid having a static constructor in performance-sensitive code like a DB engine. If it takes on the order of seconds for that to kick in and optimize the code then I guess it usually isn't that big of a deal.

I mean, ideally, we just get the attribute I proposed in the related issue so the tradeoff becomes unnecessary, but I wouldn't mind refactoring some nasty code if it ultimately doesn't make a difference after a few second warmup.

AndyAyersMS · 2020-01-06T18:33:13Z

If the class have been initialized already, the importer won't create CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE at all.

Right, just came to the same realization. Assembly code above is for the case where the .cctors have not yet been run. We only jit these methods once: they have a loop, and by default COMPlus_TC_QuickJitForLoops=0, so they bypass Tier0.

It takes some time for the tiered compilation to kick in

Not in this case, though. We jit at Tier1 initially because of the loop, and won't ever rejit. If you set COMPlus_TC_QuickJitForLoops=1 and run, then tiering can remove the class init overhead:

Method	Mean	Error	StdDev
Static	2.631 us	0.0427 us	0.0378 us
Field	2.574 us	0.0508 us	0.0696 us

YairHalberstadt · 2020-01-06T18:35:09Z

We jit at Tier1 initially because of the loop, and won't ever rejit.

Presumably in real life code if this is a bottleneck we will rejit, since the loop is unlikely to be top level?

mikernet · 2020-01-06T18:39:58Z

Is there any way to force tiered compilation to run before a test starts so that I can benchmark the post-tiered compilation performance impact of adding static initializers to some of my code?

AndyAyersMS · 2020-01-06T18:50:36Z

Presumably in real life code if this is a bottleneck we will rejit, since the loop is unlikely to be top level?

Not currently, no. Once we jit at Tier1 we never rejit. So with default settings, methods with loops are jitted just once.

The intent of QJFL=0 is to avoid getting trapped in Tier0 code; the downside is you may end up with sub-optimal Tier1 code.

The general fix for this is to implement something like on-stack replacement, so we can jit at Tier0 initially and transition (mid-method, if necessary) to Tier1 code if the method is called frequently, or loops around frequently.

Is there any way to force tiered compilation to run before a test starts

BenchmarkDotNet should generally get you to Tier1 code. But as noted above, not all Tier1 code is equally performant; there's a benefit in some cases to running the Tier0 code first. Setting COMPlus_TC_QuickJitForLoops=1 will ensure this.

YairHalberstadt · 2020-01-06T18:53:16Z

I see. Thanks for the info @AndyAyersMS

Do you think this is a problem that will be resolved in the future, or would it be worth considering adding a language feature for this case?

AndyAyersMS · 2020-01-06T19:19:22Z

Do you think this is a problem that will be resolved in the future

Possibly. I have been looking into on-stack replacement, but mostly thinking of it as something that could help improve startup, not as something that could help improve steady-state.

I would be interested in learning about more realistic "real-world" cases where performance improves when running with QJFL=1. Measurements on simple benchmarks can highlight problem areas, but often overstate their impact.

jkotas · 2020-01-06T19:32:48Z

a language feature for this case?

I do not think that a language feature for this makes sense.

Another way to fix this issue is to teach the JIT to inline the "already initialized" check into the code. It would eliminate the call overhead on the fast path. The cost of the well-predicted inlined check would be negligible.

FWIW, .NET Native did the inlining of the "already initialized" check optimization.

mikernet · 2020-01-06T19:34:32Z

What case/language feature are we talking about here? I think a [BeforeFieldInit(bool)] attribute to control the beforefieldinit flag is quite sensible. Is that what you are referring to @jkotas / @YairHalberstadt or something else?

I bet 95%+ of classes that have static constructors don't actually want/need type initialization on first field access, it's just there for convenience (but that's a bit of a wild guess, kind of hard to measure). Even if this gets optimized on a JIT level, letting people just set beforefieldinit will make less work for the JIT and reduce code size.

jkotas · 2020-01-06T19:39:57Z

I do not think that [BeforeFieldInit(bool)] is worth the complexity that it would adds to the language spec. https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100-points/ will give you an idea for how the language team evaluates whether the language feature is worth it.

mikernet · 2020-01-06T19:42:08Z

I understand that, but I respectfully disagree. I see library authors constantly dealing with this and it doesn't really add any complexity to anyone that doesn't care about optimizing performance at that level...it's just an optional attribute that does something really simple, i.e. control whether the compiler emits beforefieldinit on a method or not.

AndyAyersMS · 2020-05-06T18:30:04Z

To recap the above: in general, runtime overhead of class init checks should largely be mitigated by tiered compilation.

We can still see impact of class init checks in methods that are not yet tiered up; impact of this should be limited to startup phases as the code will eventually get replaced by Tier1 code:

prejitted methods (in non-AOT modes)
tier0 jittted methods

We can also see impact of class init checks in methods that aren't eligible for tiering; in such cases impact of class init checks can persist indefinitely:

all methods, in AOT modes
jitted methods with loops which bypass tier0 (unless QJFL=1)
methods marked with AggressiveOptimization which also bypass tier0
dynamic methods and other special cases where we don't allow tiering

Seems like we should look more deeply into implementing support for partially inlined class init checks in the jit, so am marking this as 5.0 for now.

AndyAyersMS · 2020-06-03T23:08:47Z

Seems like we should look more deeply into implementing support for partially inlined class init checks in the jit

I current don't see us getting to this for 5.0, so moving to future.

AndyAyersMS · 2021-01-25T21:59:40Z

We should consider this for 6.0, so updating milestone.

EgorBo · 2021-02-03T21:06:55Z

@jkotas @AndyAyersMS a few questions:

Can we just hoist call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE from the loop body in Static just like it was done for Field? https://www.diffchecker.com/mTHFbDfu (Static is on the left)
Regarding the inlined "is init" check - if I understand it correctly it should be something like this?

if (moduleID->m_pDataBlob[classId] & ClassInitFlags::INITIALIZED_FLAG == 0)
    CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE(moduleId, classId);

EgorBo · 2021-02-04T11:53:40Z

Quick prototype for "inlined check": EgorBo@78436b0 (jit only, doesn't do hoisting opt)

       | Method |     Mean |     Error |
       |------- |---------:|----------:|
master | Static | 8.808 us | 0.1702 us |
    PR | Static | 4.285 us | 0.0163 us |

diff for Static method: https://www.diffchecker.com/9swAJvSl

jkotas · 2021-02-04T15:00:32Z

Can we just hoist call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE from the loop body in Static just like it was done for Field?

In general case, it would be observable behavior change in the timing of static constructors. You can do this in some limited cases like when you can prove the loop will run at least once and the static is accessed as the first thing in the loop, etc.

inlined check

The offsets and masks should be abstracted via JIT/EE interface. It will also allow enabling this optimization for non-shared generics.

EgorBo · 2023-04-27T22:42:20Z

This issue is basically fixed by QuickJit enabled by default for Loops in .NET 7.0
For NativeAOT and TieredCompilation=0 it's fixed in .NET 8.0 via #83911

TC=0 .NET 7.0:

| Method |     Mean |     Error |    StdDev |
|------- |---------:|----------:|----------:|
| Static | 9.212 us | 3.1957 us | 0.1752 us |
|  Field | 1.842 us | 0.0979 us | 0.0054 us |

TC=0 .NET 8.0:


| Method |     Mean |     Error |    StdDev |
|------- |---------:|----------:|----------:|
| Static | 1.837 us | 0.0926 us | 0.0051 us |
|  Field | 1.829 us | 0.4411 us | 0.0242 us |

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jan 6, 2020

jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 6, 2020

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Jan 6, 2020

AndyAyersMS self-assigned this Jan 6, 2020

YairHalberstadt mentioned this issue Jan 6, 2020

Allow explicit control of beforefieldinit (especially with NRTs) dotnet/csharplang#3080

Closed

AndyAyersMS added this to the 5.0 milestone May 6, 2020

AndyAyersMS modified the milestones: 5.0, Future Jun 3, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020

AndyAyersMS modified the milestones: Future, 6.0.0 Jan 25, 2021

JulieLeeMSFT assigned EgorBo and unassigned AndyAyersMS Jan 26, 2021

EgorBo mentioned this issue Feb 5, 2021

RyuJIT: Inlined "is class statically inited" check #47901

Closed

ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 5, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 10, 2021

EgorBo modified the milestones: 6.0.0, Future May 24, 2021

EgorBo modified the milestones: Future, 8.0.0 Apr 27, 2023

EgorBo closed this as completed Apr 27, 2023

ghost locked as resolved and limited conversation to collaborators May 28, 2023

JulieLeeMSFT added this to .NET Core CodeGen Jun 5, 2024

JulieLeeMSFT moved this to Done in .NET Core CodeGen Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize field access in types not marked with beforefieldinit #1327

Optimize field access in types not marked with beforefieldinit #1327

mikernet commented Jan 6, 2020 •

edited by BruceForstall

Loading

AndyAyersMS commented Jan 6, 2020 •

edited

Loading

jkotas commented Jan 6, 2020

mikernet commented Jan 6, 2020 •

edited

Loading

jkotas commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020 •

edited

Loading

jkotas commented Jan 6, 2020

mikernet commented Jan 6, 2020 •

edited

Loading

AndyAyersMS commented Jan 6, 2020

YairHalberstadt commented Jan 6, 2020

mikernet commented Jan 6, 2020 •

edited

Loading

AndyAyersMS commented Jan 6, 2020 •

edited

Loading

YairHalberstadt commented Jan 6, 2020

AndyAyersMS commented Jan 6, 2020

jkotas commented Jan 6, 2020

mikernet commented Jan 6, 2020 •

edited

Loading

jkotas commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020

AndyAyersMS commented May 6, 2020

AndyAyersMS commented Jun 3, 2020

AndyAyersMS commented Jan 25, 2021

EgorBo commented Feb 3, 2021 •

edited

Loading

EgorBo commented Feb 4, 2021 •

edited

Loading

jkotas commented Feb 4, 2021

EgorBo commented Apr 27, 2023

Optimize field access in types not marked with beforefieldinit #1327

Optimize field access in types not marked with beforefieldinit #1327

Comments

mikernet commented Jan 6, 2020 • edited by BruceForstall Loading

AndyAyersMS commented Jan 6, 2020 • edited Loading

jkotas commented Jan 6, 2020

mikernet commented Jan 6, 2020 • edited Loading

jkotas commented Jan 6, 2020 • edited Loading

mikernet commented Jan 6, 2020 • edited Loading

jkotas commented Jan 6, 2020

mikernet commented Jan 6, 2020 • edited Loading

AndyAyersMS commented Jan 6, 2020

YairHalberstadt commented Jan 6, 2020

mikernet commented Jan 6, 2020 • edited Loading

AndyAyersMS commented Jan 6, 2020 • edited Loading

YairHalberstadt commented Jan 6, 2020

AndyAyersMS commented Jan 6, 2020

jkotas commented Jan 6, 2020

mikernet commented Jan 6, 2020 • edited Loading

jkotas commented Jan 6, 2020 • edited Loading

mikernet commented Jan 6, 2020

AndyAyersMS commented May 6, 2020

AndyAyersMS commented Jun 3, 2020

AndyAyersMS commented Jan 25, 2021

EgorBo commented Feb 3, 2021 • edited Loading

EgorBo commented Feb 4, 2021 • edited Loading

jkotas commented Feb 4, 2021

EgorBo commented Apr 27, 2023

mikernet commented Jan 6, 2020 •

edited by BruceForstall

Loading

AndyAyersMS commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020 •

edited

Loading

jkotas commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020 •

edited

Loading

AndyAyersMS commented Jan 6, 2020 •

edited

Loading

mikernet commented Jan 6, 2020 •

edited

Loading

jkotas commented Jan 6, 2020 •

edited

Loading

EgorBo commented Feb 3, 2021 •

edited

Loading

EgorBo commented Feb 4, 2021 •

edited

Loading