[Lightbulb Perf] Async lightbulb performance improvement #66970

Merged
merged 38 commits into from
Apr 3, 2023

Conversation

mavasani
Contributor

@mavasani mavasani commented Feb 21, 2023

Final piece of planned work for #66968
Closes #66968

This PR aims to improve async lightbulb performance by moving the relatively expensive analyzers (those that register SymbolStart/SymbolEnd actions or SemanticModel actions) to the Low priority bucket. We ensure this de-prioritization happens only in extremely rare cases by performing the following additional checks:

  • If the analyzer has already executed in background analysis and has cached diagnostics, we retain the normal-priority lightbulb (LB) order for its code fixes.
  • Otherwise, if the previous snapshot had a diagnostic from the analyzer on the LB span, it is quite likely that the user intends to fix it, and also likely that the analyzer will report the same diagnostic on the edited line. We retain the normal-priority LB order and force-compute such an analyzer. This check essentially gives us an implicit syntax filter based on analyzer execution on the prior document snapshot.
  • Otherwise, if both of the above are false but the analyzer does not register any SymbolStart or SemanticModel actions, we retain the normal-priority LB order and force-complete such analyzers, as they have a smaller analysis scope and are known not to be observably expensive on line-span execution.
  • Finally, if all three of the above are false and the newly added option LightbulbSkipExecutingDeprioritizedAnalyzers = false, we de-prioritize the analyzer to the low-priority bucket. If LightbulbSkipExecutingDeprioritizedAnalyzers = true, we drop this analyzer completely. This case should now be rare enough to have no user-noticeable impact.
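
The four checks above can be read as a short decision chain. The following is only a minimal sketch of that chain and is not code from this PR: `AnalyzerInfo`, `LightbulbDecision`, and `Deprioritization.Decide` are hypothetical names, and the real logic lives in the diagnostic service and works on actual analyzer state.

```csharp
using System;

// Hypothetical model of the four checks above; none of these names are Roslyn APIs.
enum LightbulbDecision { NormalPriority, LowPriority, Skipped }

record AnalyzerInfo(
    bool HasCachedDiagnostics,                  // already ran in background analysis
    bool ReportedOnSpanInPriorSnapshot,         // prior snapshot had its diagnostic on the lightbulb span
    bool HasSymbolStartOrSemanticModelActions); // registers the expensive action kinds

static class Deprioritization
{
    public static LightbulbDecision Decide(AnalyzerInfo analyzer, bool skipDeprioritized)
    {
        // Check 1: cached diagnostics exist, so nothing expensive needs to run.
        if (analyzer.HasCachedDiagnostics) return LightbulbDecision.NormalPriority;

        // Check 2: the user is probably about to fix a diagnostic seen on the prior snapshot.
        if (analyzer.ReportedOnSpanInPriorSnapshot) return LightbulbDecision.NormalPriority;

        // Check 3: analyzers without these actions have a small analysis scope.
        if (!analyzer.HasSymbolStartOrSemanticModelActions) return LightbulbDecision.NormalPriority;

        // Check 4: the option decides between de-prioritizing and dropping the analyzer.
        return skipDeprioritized ? LightbulbDecision.Skipped : LightbulbDecision.LowPriority;
    }
}

static class Program
{
    static void Main()
    {
        var expensive = new AnalyzerInfo(false, false, true);
        Console.WriteLine(Deprioritization.Decide(expensive, skipDeprioritized: false)); // prints "LowPriority"
    }
}
```

Ordering the checks from cheapest and most common to rarest means that an analyzer only reaches the final branch when none of the earlier conditions hold.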

The PR has the following core implementation pieces:

  • Add support for a 'Low' priority bucket for the async lightbulb. Done with "Simplify logic for lightbulb priority classes" (#67554).
  • Introduce CodeActionRequestPriorityProvider, which exposes the current request priority and also tracks analyzers that are de-prioritized to the CodeActionRequestPriority.Low bucket.
  • Add logic in the GetDiagnosticsForSpanAsync API to de-prioritize analyzers based on the above conditions.

Performance measurements on large file

Main branch

1-2 progress bar cycles

LightBulb_MainBranch_13k_LOC

PR branch

0.5-1 progress bar cycle

LightBulb_PR_Branch_13k_LOC

Addresses part of dotnet#66968

This PR aims to improve the async lightbulb performance by moving the relatively expensive SymbolStart/SymbolEnd action analyzers to the `Low` priority bucket. See the 2nd observation in the Summary section of dotnet#66968 (comment) for more details.

> The overhead coming from SymbolStart/End analyzers needing to run on all partial types seems to add a significant overhead. So I played around with moving all the SymbolStart/End analyzers and corresponding code fixes to a separate CodeActionPriorityRequest.Low bucket for async lightbulb, and this gives significant performance improvements as we populate all the code fixes from non-SymbolStart/End analyzers and code refactorings in about half a progress bar cycle for OOP, and almost instantaneously for InProc. The async lightbulb does take the required additional one cycle to then populate the fixes for SymbolStart/End analyzers and the Suppress/Configure actions, but that seems completely reasonable to me. We only have a handful of SymbolStart/End analyzers and it seems reasonable to me to move them below to improve the overall lightbulb performance.
@sharwell

This comment was marked as outdated.

@CyrusNajmabadi
Member

I'm also hesitant, mostly because I don't understand why symbol start/end (and partial types) are causing such a problem.

This demonstrates that an issue does exist... But this change feels more like a workaround than a fix at the best level.

Do we understand why this sort of analysis is so costly? Could it be because the compiler doesn't know which files to end up analyzing, so it analyzes them all?

@mavasani
Contributor Author

> I'm also hesitant, mostly because I don't understand why symbol start/end (and partial types) are causing such a problem.
>
> This demonstrates that an issue does exist... But this change feels more like a workaround than a fix at the best level.
>
> Do we understand why this sort of analysis is so costly? Could it be because the compiler doesn't know which files to end up analyzing, so it analyzes them all?

This is by design and in the nature of SymbolStart/End analyzers. For such an analyzer’s diagnostics to be computed for any given line, the analyzer needs to be executed on all symbols/nodes/operations within the containing type symbol of that line, including all partial declarations. So the analyzer execution time basically increases linearly with the number of partial declarations. This is not true for the rest of the analyzers, as they can compute diagnostics by analyzing just the specific line or, at most, the entire file. These analyzers are expensive by default because of their required analysis scope, and there is nothing we can do to reduce that scope here, just as we can do nothing to reduce the analysis scope of the most expensive analyzers, the CompilationEnd analyzers, which is why we don’t support code fixes for CompilationEnd analyzer diagnostics.
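
To make the scope argument concrete, here is a rough sketch of what a SymbolStart/End analyzer looks like. It is not one of the analyzers discussed in this PR: the rule ID `DEMO001` and the analyzer itself are hypothetical and deliberately simplified, and compiling it assumes a project that references the Microsoft.CodeAnalysis packages. The point is that the end action can only report once the operation callbacks have run over every partial declaration of the type.

```csharp
using System.Collections.Concurrent;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

// Illustrative only: "DEMO001" and this analyzer are hypothetical, not part of Roslyn.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class UnreferencedPrivateFieldAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "DEMO001",
        title: "Private field is never referenced",
        messageFormat: "Private field '{0}' is never referenced",
        category: "Demo",
        defaultSeverity: DiagnosticSeverity.Info,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);

        // SymbolStart fires once per named type and covers all of its partial declarations.
        context.RegisterSymbolStartAction(startContext =>
        {
            var referenced = new ConcurrentDictionary<ISymbol, bool>(SymbolEqualityComparer.Default);

            // Runs for every field reference in every partial declaration of the type.
            startContext.RegisterOperationAction(operationContext =>
            {
                var reference = (IFieldReferenceOperation)operationContext.Operation;
                referenced.TryAdd(reference.Field.OriginalDefinition, true);
            }, OperationKind.FieldReference);

            // SymbolEnd runs only after all of the callbacks above have completed.
            startContext.RegisterSymbolEndAction(endContext =>
            {
                var type = (INamedTypeSymbol)endContext.Symbol;
                foreach (var member in type.GetMembers())
                {
                    if (member is IFieldSymbol field
                        && field.DeclaredAccessibility == Accessibility.Private
                        && !field.IsImplicitlyDeclared
                        && !referenced.ContainsKey(field))
                    {
                        endContext.ReportDiagnostic(
                            Diagnostic.Create(Rule, field.Locations[0], field.Name));
                    }
                }
            });
        }, SymbolKind.NamedType);
    }
}
```

A syntax-node or operation analyzer, by contrast, can report from inside its own callback, which is why it can be run on a single line span.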

IMO moving the expensive lightbulb quick actions, which are relevant to only a very small percentage of lightbulb invocations, down the list doesn’t degrade the experience, especially given the big performance gain here for populating the rest of the quick actions.

@CyrusNajmabadi
Member

> This is by design and in the nature of SymbolStart/End analyzers. For such an analyzer’s diagnostics to be computed for any given line, the analyzer needs to be executed on all symbols/nodes/operations within the containing type symbol of that line, including all partial declarations.

Sorry, i'm not being clear. I'm not asking if we can avoid analyzing those other parts. What i'm asking is:

Is it possible that while we should only be analyzing the other parts, that analysis is not being done smartly? for example, the compiler is analyzing unnecessary files? Or, for example, the compiler keeps creating/throwing-away the semantic models for those other parts?

I only bring that up because it's the type of issue that has caused bad performance in other features in the past when dealing with partials, and i want to distinguish essential cost of doing this work versus wasteful costs that aren't needed.

@arunchndr
Member

I have viewed the after gif about 20 times now and I can pretty much guarantee it will be the highlight of my week. This is great!

@arunchndr
Member

Any telemetry additions that would be needed to minimize risk with taking this change and help with future root causing?

@genlu
Member

genlu commented Feb 21, 2023

Are we taking this change while continuing to investigate how to improve SymbolStart/End analysis in the compiler?

@mavasani
Contributor Author

> Is it possible that while we should only be analyzing the other parts, that analysis is not being done smartly? for example, the compiler is analyzing unnecessary files? Or, for example, the compiler keeps creating/throwing-away the semantic models for those other parts?

Discussed offline with Cyrus. We are not doing any unnecessary work here; it is just the design of SymbolStart/End analyzers that they need to execute on all partial definitions, as well as nested type definitions, before SymbolEnd is executed to report diagnostics. The repro case above actually has a containing type with 20 partial definitions, which is what leads to such a large overhead for SymbolStart/End analyzers. If there were no partial definitions, there would be no such overhead, but on average we would expect at least some overhead on types with partials. I am going to validate the performance overhead for types with a single declaration and share an update.

Additionally, we discussed that currently the analyzer driver supports running only single file analysis or full compilation analysis, and we run analysis for these files sequentially. I am going to experiment enhancing the driver to support multiple file analysis and execute these concurrently and report back the comparisons.
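
As a rough illustration of the difference between the two execution strategies mentioned above, the sketch below analyzes a set of files first one after another and then concurrently. It does not use the analyzer driver at all: `AnalyzeFileAsync` is a hypothetical stand-in for per-file analysis, and the file names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class Program
{
    // Hypothetical stand-in for analyzing one partial-declaration file.
    static Task<int> AnalyzeFileAsync(string file) => Task.Run(() => file.Length);

    // Sequential: each file is analyzed only after the previous one has finished.
    static async Task<int> AnalyzeSequentiallyAsync(IEnumerable<string> files)
    {
        var total = 0;
        foreach (var file in files)
            total += await AnalyzeFileAsync(file);
        return total;
    }

    // Concurrent: all per-file analyses are started up front and awaited together.
    static async Task<int> AnalyzeConcurrentlyAsync(IEnumerable<string> files)
    {
        var results = await Task.WhenAll(files.Select(f => AnalyzeFileAsync(f)));
        return results.Sum();
    }

    static async Task Main()
    {
        var files = new[] { "Part1.cs", "Part2.cs", "Part3.cs" };
        var sequential = await AnalyzeSequentiallyAsync(files);
        var concurrent = await AnalyzeConcurrentlyAsync(files);
        Console.WriteLine(sequential == concurrent); // prints "True"
    }
}
```

Both strategies produce the same result; only the wall-clock time can differ, and only when the per-file work is actually independent and not dominated by shared state.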

We will try to get more performance data here, try some more approaches, and see if the perf benefit can be gained in a way that involves no user experience change.

@mavasani
Contributor Author

mavasani commented Feb 22, 2023

Update

  1. I verified that the improvement from this PR is only seen when invoking the lightbulb within a type that has multiple partial definitions. The greater the number of partials, or the bigger the partial declarations in non-active documents, the bigger the performance improvement we see from this PR. Note that the improvement applies to both OutOfProc and InProc analyzer execution.

  2. I collected performance traces for the lightbulb scenario for both OOP and InProc with bits built out of the main branch, using a source file with no partial declarations. I have shared the traces at \\mlangfs1\public\mavasani\LightbulbPerf\Traces. The primary overhead in OOP seems to be coming from building compilations and skeleton references. We are spending almost double the time on it in OOP and consequently spending a smaller percentage of time executing analyzers and generating compilation events to drive analysis.

OOP Trace

~38% in CompilationTracker and <20% in AnalyzerDriver/CompilationWithAnalyzers

Name                                                                                                                                                                                                                                                                                                                                        Exc %ExcInc %
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<GetOrBuildCompilationInfoAsync>d__30.MoveNext()                                                                                                                                                                                                    0.0  1 37.9
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<FinalizeCompilationAsync>d__38.MoveNext()                                                                                                                                                                                                          0.0  9 36.5
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<BuildCompilationInfoAsync>d__31.MoveNext()                                                                                                                                                                                                         0.0  0 34.9
mscorlib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Microsoft.CodeAnalysis.SolutionState+CompilationTracker+CompilationInfo].Start(!!0&)                                                                                                                                                                                        0.0  0 34.7
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.BuildCompilationInfoAsync(class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                                                                                                0.0  0 34.7
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.GetOrBuildCompilationInfoAsync(class Microsoft.CodeAnalysis.SolutionState,bool,value class System.Threading.CancellationToken)                                                                                                                      0.0  0 34.7
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.FinalizeCompilationAsync(class Microsoft.CodeAnalysis.SolutionState,class Microsoft.CodeAnalysis.Compilation,value class CompilationTrackerGeneratorInfo,class Microsoft.CodeAnalysis.Compilation,value class System.Threading.CancellationToken)  0.0  0 33.8
mscorlib.ni!System.Threading.Tasks.Task.Finish(Boolean)                                                                                                                                                                                                                                                                                       0.0  1 27.9
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<GetCompilationSlowAsync>d__29.MoveNext()                                                                                                                                                                                                           0.0  0 27.2
Name                                                                                                                                                                                                                                                             Exc %  Exc  Inc %
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+<GetMetadataReferenceAsync>d__156.MoveNext()                                                                                                                                                  0.0  0 26.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<GetOrBuildReferenceAsync>d__9.MoveNext()                                                                                                                              0.0  0 26.3
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<CreateSkeletonReferenceSetAsync>d__11.MoveNext()                                                                                                                      0.0  0 26.3
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<TryGetOrCreateReferenceSetAsync>d__10.MoveNext()                                                                                                                      0.0  0 26.3
mscorlib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Microsoft.CodeAnalysis.Diagnostics.AnalyzerActions].SetResult(!0)                                                                                                                                  0.0  3 25.6
mscorlib!System.Threading.Tasks.Task`1[Microsoft.CodeAnalysis.Diagnostics.AnalyzerActions].TrySetResult(!0)                                                                                                                                                          0.0  3 25.6
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerDriver+<GetAnalyzerActionsAsync>d__150.MoveNext()                                                                                                                                                  0.0 14 25.6
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerManager+<GetAnalyzerActionsAsync>d__12.MoveNext()                                                                                                                                                  0.0 20 25.1
Microsoft.CodeAnalysis.Workspaces!Roslyn.Utilities.AsyncLazy`1[System.__Canon].GetValueAsync(value class System.Threading.CancellationToken)                                                                                                                         0.0  1 24.9
Microsoft.CodeAnalysis.Workspaces!Roslyn.Utilities.AsyncLazy`1[System.__Canon].StartAsynchronousComputation(value class AsynchronousComputationToStart,class Request,value class System.Threading.CancellationToken)                                                 0.0  1 24.9
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.GetCompilationAsync(class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                             0.0  1 24.2
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.GetCompilationSlowAsync(class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                         0.0  0 24.1
Microsoft.CodeAnalysis.Workspaces!SolutionState.GetMetadataReferenceAsync                                                                                                                                                                                            0.0  0 23.0
Microsoft.CodeAnalysis.Workspaces!SolutionState.GetMetadataReferenceAsync                                                                                                                                                                                            0.0  0 23.0
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.CreateSkeletonSet(class Microsoft.CodeAnalysis.Host.SolutionServices,class Microsoft.CodeAnalysis.Compilation,value class System.Threading.CancellationToken)          0.0  0 23.0
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<>c__DisplayClass11_1.<CreateSkeletonReferenceSetAsync>b__1(value class System.Threading.CancellationToken)                                                            0.0  0 23.0
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.TryCreateMetadataStorage(class Microsoft.CodeAnalysis.Host.SolutionServices,class Microsoft.CodeAnalysis.Compilation,value class System.Threading.CancellationToken)  0.0  0 23.0

InProc trace

~40% in AnalyzerDriver/CompilationWithAnalyzers, ~20% in CompilationTracker

Name                                                                                                                                                                  Exc %  Exc  Inc %
mscorlib!System.Threading.Tasks.Task`1[Microsoft.CodeAnalysis.Diagnostics.AnalyzerActions].TrySetResult(!0)                                                               0.0  2 41.1
mscorlib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Microsoft.CodeAnalysis.Diagnostics.AnalyzerActions].SetResult(!0)                                       0.0  1 41.1
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerDriver+<GetAnalyzerActionsAsync>d__150.MoveNext()                                                       0.0 25 40.7
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerManager+<GetAnalyzerActionsAsync>d__12.MoveNext()                                                       0.0 25 40.6
mscorlib.ni!System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult].TrySetResult(System.Threading.Tasks.VoidTaskResult)                                      0.0  0 34.5
mscorlib.ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.Threading.Tasks.VoidTaskResult].SetResult(System.Threading.Tasks.VoidTaskResult)              0.0  0 30.5
mscorlib.ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.Threading.Tasks.VoidTaskResult].SetResult(System.Threading.Tasks.Task`1)                      0.0  0 30.5
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerManager+<GetCompilationAnalysisScopeAsync>d__6.MoveNext()                                               0.0  8 27.0
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerManager+<GetCompilationAnalysisScopeCoreAsync>d__7.MoveNext()                                           0.0  6 26.8
mscorlib.ni!System.Threading.Tasks.Task.Execute()                                                                                                                         0.0  3 25.6
mscorlib.ni!System.Threading.Tasks.Task`1[System.__Canon].InnerInvoke()                                                                                                   0.0  1 25.1
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.AnalyzerDriver+<>c__DisplayClass87_0+<<Initialize>b__0>d.MoveNext()                                             0.0  0 24.5
mscorlib!System.Threading.Tasks.Task`1[System.ValueTuple`2[Microsoft.CodeAnalysis.Diagnostics.AnalyzerActions,System.__Canon]].TrySetResult(!0)                           0.0  0 23.8
mscorlib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.ValueTuple`2[Microsoft.CodeAnalysis.Diagnostics.AnalyzerActions,System.__Canon]].SetResult(!0)  0.0  0 23.8
Microsoft.CodeAnalysis!Microsoft.CodeAnalysis.Diagnostics.CompilationWithAnalyzers+<ComputeAnalyzerDiagnosticsAsync>d__59.MoveNext()                                      0.0  3 23.6
Name                                                                                                                                                                                                                                                                                                                                        Exc %   ExcInc %
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<FinalizeCompilationAsync>d__38.MoveNext()                                                                                                                                                                                                          0.0    10 20.9
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<GetOrBuildCompilationInfoAsync>d__30.MoveNext()                                                                                                                                                                                                    0.0     1 20.3
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.ProjectState+ProjectSyntaxTreeOptionsProvider.TryGetDiagnosticValue(class Microsoft.CodeAnalysis.SyntaxTree,class System.String,value class System.Threading.CancellationToken,value class Microsoft.CodeAnalysis.ReportDiagnostic&)                                                 3.5 3,689 19.8
mscorlib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[System.Collections.Immutable.ImmutableArray`1[System.__Canon]].Start(!!0&)                                                                                                                                                                                                  0.0     2 19.8
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker+<BuildCompilationInfoAsync>d__31.MoveNext()                                                                                                                                                                                                         0.0     0 19.7
mscorlib!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Microsoft.CodeAnalysis.SolutionState+CompilationTracker+CompilationInfo].Start(!!0&)                                                                                                                                                                                        0.0     0 19.3
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.BuildCompilationInfoAsync(class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                                                                                                0.0     0 19.3
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.FinalizeCompilationAsync(class Microsoft.CodeAnalysis.SolutionState,class Microsoft.CodeAnalysis.Compilation,value class CompilationTrackerGeneratorInfo,class Microsoft.CodeAnalysis.Compilation,value class System.Threading.CancellationToken)  0.0     0 19.3
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.GetOrBuildCompilationInfoAsync(class Microsoft.CodeAnalysis.SolutionState,bool,value class System.Threading.CancellationToken)                                                                                                                      0.0     0 19.2
Name                                                                                                                                                                                                                                                                                                       Exc %ExcInc %
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<CreateSkeletonReferenceSetAsync>d__11.MoveNext()                                                                                                                                                              0.0  0 13.9
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<TryGetOrCreateReferenceSetAsync>d__10.MoveNext()                                                                                                                                                              0.0  0 13.9
Microsoft.CodeAnalysis.Workspaces!NamedTypeSymbolReferenceFinder.DetermineGlobalAliasesAsync                                                                                                                                                                                                                 0.0  0 13.8
Microsoft.CodeAnalysis.Workspaces!SolutionState.GetMetadataReferenceAsync                                                                                                                                                                                                                                    0.0  0 13.7
Microsoft.CodeAnalysis.Workspaces!SolutionState.GetMetadataReferenceAsync                                                                                                                                                                                                                                    0.0  0 13.7
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache+<>c__DisplayClass11_1.<CreateSkeletonReferenceSetAsync>b__1(value class System.Threading.CancellationToken)                                                                                                    0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.GetOrBuildReferenceAsync(class ICompilationTracker,class Microsoft.CodeAnalysis.SolutionState,value class Microsoft.CodeAnalysis.MetadataReferenceProperties,value class System.Threading.CancellationToken)  0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.CreateSkeletonSet(class Microsoft.CodeAnalysis.Host.SolutionServices,class Microsoft.CodeAnalysis.Compilation,value class System.Threading.CancellationToken)                                                  0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.TryCreateMetadataStorage(class Microsoft.CodeAnalysis.Host.SolutionServices,class Microsoft.CodeAnalysis.Compilation,value class System.Threading.CancellationToken)                                           0.0  0 13.6
Microsoft.CodeAnalysis!Compilation.Emit                                                                                                                                                                                                                                                                      0.0  0 13.6
Microsoft.CodeAnalysis!Compilation.Emit                                                                                                                                                                                                                                                                      0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.GetCompilationAsync(class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                                                                     0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+CompilationTracker.GetCompilationSlowAsync(class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                                                                 0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.CreateSkeletonReferenceSetAsync(class ICompilationTracker,class Microsoft.CodeAnalysis.SolutionState,value class System.Threading.CancellationToken)                                                           0.0  0 13.6
Microsoft.CodeAnalysis.Workspaces!Microsoft.CodeAnalysis.SolutionState+SkeletonReferenceCache.TryGetOrCreateReferenceSetAsync(class ICompilationTracker,class Microsoft.CodeAnalysis.SolutionState,value class Microsoft.CodeAnalysis.VersionStamp,value class System.Threading.CancellationToken)           0.0  0 13.6

@CyrusNajmabadi
Member

wrt skeletons. Those should only be built once (unless the project they are built for is changed itself). Are you seeing more than that?

@CyrusNajmabadi
Member

@mavasani Your traces are interesting. When i open the _OutOfProc one i still see a ton of diagnostics work in process:

image

That seems bad.

Second, looking at OOP, i can see a ton of time is being spent because FAR-engine is running (potentially because of codelens or inheritance margin). This is going and doing work that forces source-generators to run (since it needs compilations):

image

And, as you said, it's also building skeletons, which we know are much much slower now due to SGs as well:

image

This is the same issue i raised here #63891. But it looks like no traction has been made on it. @chsienki can the compiler do something here to make ref-emit fast with source generators? This is really killing IDE perf and has been doing so for many months now.

@mavasani
Contributor Author

> wrt skeletons. Those should only be built once (unless the project they are built for is changed itself). Are you seeing more than that?

I'll debug and try to get more data here.

@mavasani
Contributor Author

mavasani commented Feb 23, 2023

> When i open the _OutOfProc one i still see a ton of diagnostics work in process:

That is coming from PreviewWorkspace diagnostic computation always running InProc. See the first bullet point in #66968 (comment). Note that even if I completely turn off PreviewWorkspace diagnostic computation, it doesn't seem to improve the lightbulb population time (this is as expected, since the preview is shown only after the quick actions have been populated and the user selects one of the items). This still seems like something that should be fixed; I filed #67014 to track this work.

@mavasani
Contributor Author

> Second, looking at OOP, i can see a ton of time is being spent because FAR-engine is running (potentially because of codelens or inheritance margin). This is going and doing work that forces source-generators to run (since it needs compilations)

Wouldn't this need to run InProc when we turn off OOP? Why the slowness in OOP compared to InProc for building skeletons and compilations, running generators, etc.?

@mavasani
Contributor Author

Additionally, we discussed that currently the analyzer driver supports running only single file analysis or full compilation analysis, and we run analysis for these files sequentially. I am going to experiment with enhancing the driver to support multiple file analysis, execute these concurrently, and report back the comparisons.

@CyrusNajmabadi @arkalyanms I experimented a bit today to see if adding more concurrent execution for symbol start analyzers that need to analyze all the partial decl files helps us get the performance improvements that this current PR is giving, and it turns out not to be the case. Details below:

  1. Below is the core method that does all the work for SymbolStart/End analysis, which includes identifying the set of partial decl trees to analyze, generating compilation events for these trees (via SemanticModel.GetDiagnostics() calls) and subsequently populating the given builder so the caller makes analyzer callbacks for these symbolStart analyzers:

    static void processSymbolStartAnalyzers(
        SyntaxTree tree,
        ImmutableArray<CompilationEvent> compilationEventsForTree,
        ImmutableArray<DiagnosticAnalyzer> symbolStartAnalyzers,
        Compilation compilation,
        AnalysisResultBuilder analysisResultBuilder,
        ArrayBuilder<(AnalysisScope, ImmutableArray<CompilationEvent>)> builder,
        ImmutableArray<AdditionalText> additionalFiles,
        bool concurrentAnalysis,
        CancellationToken cancellationToken)
    {
        // This method processes all the compilation events generated for the tree in the
        // original requested analysis scope to identify symbol declared events whose symbol
        // declarations span across different trees. For the given symbolStartAnalyzers to
        // report the correct set of diagnostics for the original tree/span, we need to
        // execute them on all the partial declarations of the symbols across these different trees,
        // followed by the SymbolEnd action at the end.
        // This method computes these set of trees with partial declarations, and adds
        // analysis scopes to the 'builder' for each of these trees, along with the corresponding
        // compilation events generated for each tree.
        var partialTrees = PooledHashSet<SyntaxTree>.GetInstance();
        partialTrees.Add(tree);

        try
        {
            // Gather all trees with symbol declarations events in original analysis scope, except namespace symbols.
            foreach (var compilationEvent in compilationEventsForTree)
            {
                if (compilationEvent is SymbolDeclaredCompilationEvent symbolDeclaredEvent &&
                    symbolDeclaredEvent.Symbol.Kind != SymbolKind.Namespace)
                {
                    foreach (var location in symbolDeclaredEvent.Symbol.Locations)
                    {
                        if (location.SourceTree != null)
                        {
                            partialTrees.Add(location.SourceTree);
                        }
                    }
                }
            }

            // Next, generate compilation events for each of the partial trees
            // and add the (analysisScope, compilationEvents) tuple for each tree to the builder.
            // Process all trees sequentially: this is required to ensure the appropriate
            // compilation events are mapped to the tree for which they are generated.
            foreach (var partialTree in partialTrees)
            {
                if (tryProcessTree(partialTree, out var analysisScopeAndEvents))
                {
                    builder.Add((analysisScopeAndEvents.Value.scope, analysisScopeAndEvents.Value.events));
                }
            }
        }
        finally
        {
            partialTrees.Free();
        }

        bool tryProcessTree(SyntaxTree partialTree, [NotNullWhen(true)] out (AnalysisScope scope, ImmutableArray<CompilationEvent> events)? scopeAndEvents)
        {
            scopeAndEvents = null;

            var file = new SourceOrAdditionalFile(partialTree);
            var analysisScope = new AnalysisScope(symbolStartAnalyzers, file, filterSpan: null,
                isSyntacticSingleFileAnalysis: false, concurrentAnalysis, categorizeDiagnostics: true);
            analysisScope = GetPendingAnalysisScope(analysisScope, analysisResultBuilder);
            if (analysisScope == null)
                return false;

            var compilationEvents = GetCompilationEventsForSingleFileAnalysis(compilation, analysisScope, additionalFiles, hasAnyActionsRequiringCompilationEvents: true, cancellationToken);

            // Include the already generated compilations events for the primary tree.
            if (partialTree == tree)
            {
                compilationEvents = compilationEventsForTree.AddRange(compilationEvents);

                // We shouldn't have any duplicate events.
                Debug.Assert(compilationEvents.Distinct().Length == compilationEvents.Length);
            }

            scopeAndEvents = (analysisScope, compilationEvents);
            return true;
        }
    }

  2. I tweaked this code so that the method returns upfront and becomes a no-op. So basically, no generation of compilation events and no analyzer execution for symbolStart analyzers. As expected, this gives us the same lightbulb performance as this PR does, but without the Add readonly modifier code fix.

  3. Next, I reverted the above change and tweaked this method by changing the below foreach loop so that the body of the loop only contains compilation.GetSemanticModel(partialTree).GetDiagnostics(cancellationToken: cancellationToken); instead of tryProcessTree call followed by builder.Add. So basically, we are sequentially generating compilation events for all partial trees, but discarding these events so no analyzer callbacks are made for symbolStart analyzers. This pushes the lightbulb time back up to very close to the original code, confirming that the actual analyzer execution is not expensive here. The primary cost is coming from SemanticModel.GetDiagnostics() calls to generate compilation events.
    Original code

        // Next, generate compilation events for each of the partial trees
        // and add the (analysisScope, compilationEvents) tuple for each tree to the builder.
        // Process all trees sequentially: this is required to ensure the appropriate
        // compilation events are mapped to the tree for which they are generated.
        foreach (var partialTree in partialTrees)
        {
            if (tryProcessTree(partialTree, out var analysisScopeAndEvents))
            {
                builder.Add((analysisScopeAndEvents.Value.scope, analysisScopeAndEvents.Value.events));
            }
        }

    Changed code

        foreach (var partialTree in partialTrees)
        {
            compilation.GetSemanticModel(partialTree).GetDiagnostics(cancellationToken: cancellationToken);
        }
  4. Next, I replaced the above foreach loop with the below code to do this concurrently for all partial trees. This did not seem to give any significant performance boost over the prior sequential event generation.

    await Task.WhenAll(partialTrees.Select(partialTree =>
        Task.Run(() =>
        {
            compilation.GetSemanticModel(partialTree).GetDiagnostics(cancellationToken: cancellationToken);
        }, cancellationToken))).ConfigureAwait(false);

Summary

I don't believe there is much we can do here for improving the performance for SymbolStart analyzers, as by design they analyze a potentially bigger analysis scope and hence will be slower for types with partial declarations.

The only other possible approach to improve the performance here (which could benefit all analyzer execution, not just the symbol start ones) is to move away from invoking SemanticModel.GetDiagnostics() to generate the compilation events that drive analysis. Instead we can scan the file, look for nodes that declare symbols, synthesize SymbolDeclared events for these, populate a synthesized event queue, and hand it to the driver for analysis. I am going to play around with this approach tomorrow, though I am not sure how complex a change this would be. I'll share an update early next week on this experiment.
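
As a rough illustration of the "scan the file for declaring nodes" idea, here is a minimal sketch using only public Roslyn APIs (it assumes the Microsoft.CodeAnalysis.CSharp package is referenced). The `DeclaredSymbolScan` class and `GetDeclaredSymbols` helper are hypothetical names, not part of Roslyn or this PR. The real analyzer driver consumes internal compilation event types, so this sketch stops at collecting the declared symbols that such synthesized events would wrap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, for illustration only.
public static class DeclaredSymbolScan
{
    // Walks one tree and returns the symbols declared in it, skipping namespaces
    // (mirroring the namespace filter in the method quoted above).
    public static List<ISymbol> GetDeclaredSymbols(Compilation compilation, SyntaxTree tree)
    {
        var model = compilation.GetSemanticModel(tree);
        var symbols = new List<ISymbol>();

        foreach (var node in tree.GetRoot().DescendantNodes())
        {
            // Fields declare one symbol per variable declarator, so consider both member
            // declarations and the declarators nested inside field declarations.
            var isDeclaration = node is MemberDeclarationSyntax ||
                (node is VariableDeclaratorSyntax && node.Parent?.Parent is BaseFieldDeclarationSyntax);
            if (!isDeclaration)
                continue;

            // Note: this still binds the declaration, so the lookup is not free.
            var symbol = model.GetDeclaredSymbol(node);
            if (symbol != null && symbol.Kind != SymbolKind.Namespace)
                symbols.Add(symbol);
        }

        return symbols;
    }

    public static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "namespace N { partial class C { int _f; void M() { } } }");
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        foreach (var symbol in GetDeclaredSymbols(compilation, tree))
            Console.WriteLine($"{symbol.Kind}: {symbol.ToDisplayString()}");
    }
}
```

Whether driving analysis this way is actually cheaper than SemanticModel.GetDiagnostics() depends on how much binding the symbol lookups trigger, which is exactly what the experiment would need to measure.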

@CyrusNajmabadi
Member

CyrusNajmabadi commented Feb 23, 2023

Next, I replaced the above foreach loop with the below code to do this concurrently for all partial trees. This did not seem to give any significant performance boost over the prior sequential event generation

Note: this should be tested on .net core. I've seen a lot of cases where parallel is unhelpful on standard but great on core.

I would also want to understand why this doesn't help. Is something blocking the scaling improvement we would expect here?

For example (though doubtful), if GetDiagnostics immediately took a big fat lock, we would get no benefit being parallel. It would be good to understand this better.
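
To make the lock scenario concrete, here is a minimal, self-contained sketch (not Roslyn code). `DoWork` is a hypothetical stand-in for an expensive call, and `Thread.Sleep` simulates the work. It uses the same Task.WhenAll/Task.Run shape as the experiment above, and shows that fanning out gains nothing when every task serializes on one lock.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type, for illustration only.
public static class LockScalingDemo
{
    private static readonly object s_gate = new object();

    // Stand-in for an expensive operation; optionally holds a global lock while working.
    private static void DoWork(bool takeGlobalLock, int milliseconds)
    {
        if (takeGlobalLock)
        {
            lock (s_gate)
            {
                Thread.Sleep(milliseconds);
            }
        }
        else
        {
            Thread.Sleep(milliseconds);
        }
    }

    // Runs 'workers' tasks concurrently and returns the total elapsed wall-clock time.
    public static async Task<TimeSpan> RunAsync(bool takeGlobalLock, int workers, int milliseconds)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, workers).Select(_ =>
            Task.Run(() => DoWork(takeGlobalLock, milliseconds)))).ConfigureAwait(false);
        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        var locked = await RunAsync(takeGlobalLock: true, workers: 4, milliseconds: 100);
        var unlocked = await RunAsync(takeGlobalLock: false, workers: 4, milliseconds: 100);
        Console.WriteLine($"with lock:    {locked.TotalMilliseconds:F0} ms");
        Console.WriteLine($"without lock: {unlocked.TotalMilliseconds:F0} ms");
    }
}
```

With the lock held, the total time should be roughly the sum of all the work items regardless of how many tasks run. Without it, the time depends on how many thread-pool threads are available. Comparing the two on a trace is one way to check whether contention explains the missing speedup.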

@CyrusNajmabadi
Member

Instead we can scan the file, look for nodes that declare symbols, synthesize SymbolDeclared events for these, and populate a synthesized event queue and hand it to the driver for analysis.

This seems very intriguing. I like the idea that we are doing focused driving vs just calling into a general method that may do far too much work in an inefficient fashion.

@CyrusNajmabadi
Member

@genlu should be able to help you test .net core for oop.

It would be also good to test server gc for oop.

@mavasani
Contributor Author

mavasani commented Feb 24, 2023

Instead we can scan the file, look for nodes that declare symbols, synthesize SymbolDeclared events for these, and populate a synthesized event queue and hand it to the driver for analysis.

This seems very intriguing. I like the idea that we are doing focused driving vs just calling into a general method that may do far too much work in an inefficient fashion.

I tried this experiment with mavasani@afcf909 and it doesn't seem to help much; there is only a slight improvement. The collected trace still shows the majority of the work happening in CompilationTracker. I'll continue investigating why there is this difference between InProc and OutOfProc.

IMO, this PR is giving a noticeable improvement for populating the initial set of quick actions for both InProc and OutOfProc for types with partial declarations. I searched for CAxxxx and IDExxxx analyzers in the roslyn-analyzers repo and the roslyn repo, and only the following 3 SymbolStart analyzers have code fixes:

  1. IDE0044: Add readonly modifier
  2. IDE0059: Remove unnecessary value assignment (this is also a flow-analysis based analyzer, so is known to be relatively expensive).
  3. CA1822: Mark members as static.

I feel we should take this PR regardless of other performance investigations and improvements in this space. I'll let @arkalyanms make a call here.

@mavasani
Contributor Author

Next, I replaced the above foreach loop with the below code to do this concurrently for all partial trees. This did not seem to give any significant performance boost over the prior sequential event generation

Note: this should be tested on .net core. I've seen a lot of cases where parallel is unhelpful on standard but great on core.

I would also want to understand why this doesn't help. Is something blocking the scaling improvement we would expect here?

For example (though doubtful), if GetDiagnostics immediately took a big fat lock, we would get no benefit being parallel. It would be good to understand this better.

I can give this a try, but the traces are clearly indicating the majority of overhead is coming from CompilationTracker and building Skeleton assemblies in OOP. I'll first try and dig into this before trying other optimizations.

@mavasani
Contributor Author

mavasani commented Apr 3, 2023

@CyrusNajmabadi unit test added with 09544dd

@mavasani mavasani enabled auto-merge April 3, 2023 17:34
@mavasani mavasani merged commit 7be5509 into dotnet:main Apr 3, 2023
@ghost ghost added this to the Next milestone Apr 3, 2023
@mavasani mavasani deleted the SymbolStartAnalyzers branch April 3, 2023 21:41
@dibarbet dibarbet modified the milestones: Next, 17.7 P1 Apr 25, 2023
@mavasani mavasani added the Performance-Scenario-Diagnostics This issue affects diagnostics computation performance for lightbulb, background analysis, tagger. label May 1, 2023
Labels
Area-IDE Performance-Scenario-Diagnostics This issue affects diagnostics computation performance for lightbulb, background analysis, tagger.