[Lightbulb Perf] Async lightbulb performance improvement #66970
Conversation
Addresses part of dotnet#66968.

This PR aims to improve the async lightbulb performance by moving the relatively expensive SymbolStart/SymbolEnd action analyzers to the `Low` priority bucket. See the 2nd observation in the Summary section of dotnet#66968 (comment) for more details:

> The overhead coming from SymbolStart/End analyzers needing to run on all partial types seems to add a significant overhead. So I played around with moving all the SymbolStart/End analyzers and corresponding code fixes to a separate CodeActionPriorityRequest.Low bucket for the async lightbulb, and this gives significant performance improvements: we populate all the code fixes from non-SymbolStart/End analyzers and code refactorings in about half a progress bar cycle for OOP, and almost instantaneously for InProc. The async lightbulb does take the required additional cycle to then populate the fixes for SymbolStart/End analyzers and the Suppress/Configure actions, but that seems completely reasonable to me.

We only have a handful of SymbolStart/End analyzers and it seems reasonable to me to move them below to improve the overall lightbulb performance.
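To make the bucketing idea concrete, here is a small illustrative sketch (the type and method names below are hypothetical, not Roslyn's actual lightbulb API) of computing quick actions in descending priority passes, so that cheap providers populate the list first and the expensive SymbolStart/End-backed fixes arrive in a later pass:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names; this only illustrates the "priority bucket" idea described above.
public enum RequestPriority { High, Normal, Low }

public static class LightbulbSketch
{
    public static async Task PopulateAsync(
        Func<RequestPriority, CancellationToken, Task<IReadOnlyList<string>>> computeActionsAsync,
        Action<IReadOnlyList<string>> addToLightbulb,
        CancellationToken cancellationToken)
    {
        foreach (var priority in new[] { RequestPriority.High, RequestPriority.Normal, RequestPriority.Low })
        {
            // Surface each pass's results as soon as they are ready instead of blocking
            // the whole list on the most expensive (Low) pass.
            var actions = await computeActionsAsync(priority, cancellationToken).ConfigureAwait(false);
            addToLightbulb(actions);
        }
    }
}
```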
I'm also hesitant, mostly because I don't understand why symbol start/end (and partial types) are causing such a problem. This demonstrates that an issue does exist... but this change feels more like a workaround than a fix at the right level. Do we understand why this sort of analysis is so costly? Could it be because the compiler doesn't know which files it needs to analyze, so it analyzes them all?
This is by design and is in the nature of SymbolStart/End analyzers. For such an analyzer's diagnostics to be computed for any given line, the analyzer needs to be executed on all symbols/nodes/operations within the containing type symbol of that line, including all partial declarations. So the analyzer execution time basically increases linearly with the number of partial declarations. This is not true for the rest of the analyzers, as they can compute diagnostics by analyzing just the specific line or, at most, the entire file. These analyzers are expensive by default based on their desired analysis scope; there is nothing we can do to reduce that scope here. Just as we can do nothing to reduce the analysis scope of the most expensive analyzers, CompilationEnd analyzers, which is why we don't support code fixes for CompilationEnd analyzer diagnostics. IMO, moving these expensive lightbulb quick actions, which can only provide quick actions for a very small percentage of lightbulb invocations, down the list doesn't degrade the experience, especially given the big performance gain here for populating the rest of the quick actions.
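To make the shape of such an analyzer concrete, here is a minimal sketch using the public Roslyn analyzer APIs (the rule itself is a made-up demo, not one of the analyzers affected by this PR). The SymbolEnd callback can only fire after the registered callbacks have run over every declaration of the type, which is why cost grows with the number of partial parts:

```csharp
using System.Collections.Concurrent;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class SymbolStartEndSketchAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new(
        "DEMO0001",
        "SymbolStart/End sketch",
        "Type '{0}' references {1} field(s) across all of its declarations",
        "Demo",
        DiagnosticSeverity.Info,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        context.RegisterSymbolStartAction(symbolStart =>
        {
            // State accumulated across every partial declaration of the type.
            var referencedFields = new ConcurrentDictionary<IFieldSymbol, bool>(SymbolEqualityComparer.Default);

            // Runs for field references in *every* file containing a declaration of the type.
            symbolStart.RegisterOperationAction(
                ctx => referencedFields.TryAdd(((IFieldReferenceOperation)ctx.Operation).Field, true),
                OperationKind.FieldReference);

            // Fires only after all declarations have been analyzed, so producing a diagnostic
            // for one line still requires analyzing the whole (possibly partial) type.
            symbolStart.RegisterSymbolEndAction(symbolEnd =>
                symbolEnd.ReportDiagnostic(Diagnostic.Create(
                    Rule, symbolEnd.Symbol.Locations[0], symbolEnd.Symbol.Name, referencedFields.Count)));
        }, SymbolKind.NamedType);
    }
}
```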
Sorry, I'm not being clear. I'm not asking if we can avoid analyzing those other parts. What I'm asking is: is it possible that, even though we do have to analyze those other parts, the analysis is not being done smartly? For example, is the compiler analyzing unnecessary files? Or is it, for example, repeatedly creating and throwing away the semantic models for those other parts? I only bring that up because it's the type of issue that has caused bad performance in other features in the past when dealing with partials, and I want to distinguish the essential cost of doing this work from wasteful costs that aren't needed.
I have viewed the after gif about 20 times now and I can pretty much guarantee it will be the highlight of my week. This is great!
Are there any telemetry additions that would be needed to minimize the risk of taking this change and help with future root-causing?
Are we taking this change while continuing to investigate how to improve SymbolStart/End analysis in the compiler?
Discussed offline with Cyrus. We are not doing any unnecessary work here; it is just the design of SymbolStart/End analyzers that they need to execute on all partial definitions, as well as nested type definitions, before SymbolEnd is executed to report diagnostics. The repro case above actually has a containing type with 20 partial definitions, so that is leading to such a large overhead for SymbolStart/End analyzers. If there were no partial definitions, there would be no overhead, but on average we would expect at least some overhead on types with partials. I am going to validate the performance overhead for types with a single declaration and share an update. Additionally, we discussed that the analyzer driver currently supports running only single-file analysis or full-compilation analysis, and we run analysis for these files sequentially. I am going to experiment with enhancing the driver to support multi-file analysis, execute these analyses concurrently, and report back the comparisons. We will try to get more performance data here and try some more approaches to see if the perf benefit can be gained in a way that has no user-experience change.
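As a rough illustration of the multi-file concurrency experiment mentioned above (a sketch on top of public Roslyn APIs, not the actual analyzer-driver enhancement), one could fan out per-tree semantic analysis for every file that declares the type and merge the results:

```csharp
using System.Collections.Immutable;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

public static class PartialDeclarationAnalysisSketch
{
    public static async Task<ImmutableArray<Diagnostic>> AnalyzeAllDeclarationsAsync(
        Compilation compilation,
        INamedTypeSymbol type,
        ImmutableArray<DiagnosticAnalyzer> analyzers,
        CancellationToken cancellationToken)
    {
        var compilationWithAnalyzers = compilation.WithAnalyzers(analyzers);

        // One analysis task per file that contains a declaration of the (possibly partial) type.
        var tasks = type.DeclaringSyntaxReferences
            .Select(r => r.SyntaxTree)
            .Distinct()
            .Select(tree => compilationWithAnalyzers.GetAnalyzerSemanticDiagnosticsAsync(
                compilation.GetSemanticModel(tree), filterSpan: null, cancellationToken));

        // Run the per-tree analyses concurrently and merge the diagnostics.
        var results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.SelectMany(d => d).ToImmutableArray();
    }
}
```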
Update
- OOP trace: ~38% in CompilationTracker and <20% in AnalyzerDriver/CompilationWithAnalyzers
- InProc trace: ~40% in AnalyzerDriver/CompilationWithAnalyzers, ~20% in CompilationTracker
wrt skeletons: those should only be built once (unless the project they are built for is itself changed). Are you seeing more than that?
@mavasani Your traces are interesting. When I open the _OutOfProc one I still see a ton of diagnostics work in process. That seems bad. Second, looking at OOP, I can see a ton of time is being spent because the FAR engine is running (potentially because of CodeLens or the inheritance margin). This is going and doing work that forces source generators to run (since it needs compilations). And, as you said, it's also building skeletons, which we know are much, much slower now due to SGs as well. This is the same issue I raised here: #63891. But it looks like no traction has been made on it. @chsienki can the compiler do something here to make ref-emit fast with source generators? This is really killing IDE perf and has been doing so for many months now.
I'll debug and try to get more data here.
That is coming from the PreviewWorkspace diagnostic computation always running InProc. See the first bullet point in #66968 (comment). Note that even if I completely turn off the PreviewWorkspace diagnostic computation, it doesn't seem to improve the lightbulb population time (this is as expected, as the preview is shown only after quick actions have been populated and the user selects one of the items). This still seems like something that should be fixed; I filed #67014 to track this work.
Wouldn't this need to run InProc when we turn off OOP? Why the slowness in OOP compared to InProc for building skeletons and compilations, running generators, etc.?
@CyrusNajmabadi @arkalyanms I experimented a bit today to see if adding more concurrent execution for symbol start analyzers that need to analyze all the partial decl files helps us get the performance improvements that this current PR is giving, and it turns out not to be the case. Details below:
Summary

I don't believe there is much we can do here to improve the performance of SymbolStart analyzers: by design they analyze a potentially bigger analysis scope and hence will be slower for types with partial declarations. The only other possible approach to improve the performance here (which could benefit all analyzer execution, not just the SymbolStart ones) is to move away from invoking a general `GetDiagnostics`-style entry point that may do far more work than needed and instead drive the analysis in a more focused way.
Note: this should be tested on .NET Core. I've seen a lot of cases where parallelism is unhelpful on standard but great on core. I would also want to understand why this doesn't help. Is something blocking the scaling improvement we would expect here? For example (though doubtful), if GetDiagnostics immediately took a big fat lock, we would get no benefit from being parallel. It would be good to understand this better.
This seems very intriguing. I like the idea that we are doing focused driving vs. just calling into a general method that may do far too much work in an inefficient fashion.
@genlu should be able to help you test .NET Core for OOP. It would also be good to test server GC for OOP.
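For what it's worth, a quick way to confirm which GC mode a given process (e.g., the OOP server) actually ends up running with is shown below; enabling server GC itself is a host/runtime-configuration setting rather than a code change:

```csharp
using System;
using System.Runtime;

// Prints whether the current process is running the server GC and in which latency mode,
// which is handy when validating that a host configuration change actually took effect.
class GcModeCheck
{
    static void Main()
    {
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
    }
}
```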
I tried this experiment with mavasani@afcf909 and it doesn't seem to help much; there is only a slight improvement. The collected trace still shows the majority of the work happening in CompilationTracker. I'll continue investigating why there is this difference between InProc and OutOfProc. IMO, this PR gives a noticeable improvement in populating the initial set of quick actions for both InProc and OutOfProc for types with partial declarations. I searched for CAxxxx and IDExxxx analyzers in the roslyn-analyzers and roslyn repos, and only the following 3 SymbolStart analyzers have code fixes:
I feel we should take this PR regardless of other performance investigations and improvements in this space. I'll let @arkalyanms make a call here.
I can give this a try, but the traces clearly indicate that the majority of the overhead is coming from CompilationTracker and building skeleton assemblies in OOP. I'll first try to dig into this before trying other optimizations.
@CyrusNajmabadi unit test added with 09544dd
Final piece of planned work for #66968
Closes #66968
This PR aims to improve the async lightbulb performance by moving the relatively expensive SymbolStart/SymbolEnd action and SemanticModel analyzers to the `Low` priority bucket. We ensure this happens only in extremely rare cases by performing the following additional checks:

- If `LightbulbSkipExecutingDeprioritizedAnalyzers = false`, we de-prioritize the analyzer down to the low-priority bucket. If `LightbulbSkipExecutingDeprioritizedAnalyzers = true`, then we completely drop this analyzer. This case should be extremely rare now, so as to not have any user-noticeable impact.

The PR has the following core implementation pieces:

- Add support for the `Low` priority bucket for async lightbulb. Done with Simplify logic for lightbulb priority classes #67554
- A new `CodeActionRequestPriorityProvider` that exposes the current request priority and also tracks analyzers that are de-prioritized to the `CodeActionRequestPriority.Low` bucket
- Changes to the `GetDiagnosticsForSpanAsync` API to de-prioritize analyzers based on the above conditions.

Performance measurements on large file
| Branch | Quick action population time |
| --- | --- |
| Main branch | 1-2 progress bar cycles |
| PR branch | 0.5-1 progress bar cycle |
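As a hypothetical sketch of the de-prioritization decision described above (the names here are illustrative and do not correspond to Roslyn's actual internal types):

```csharp
// Hypothetical sketch only: expensive analyzers are either deferred to the Low-priority
// lightbulb pass or skipped entirely, depending on the option described above.
public enum AnalyzerDisposition
{
    RunInNormalBucket,   // cheap analyzer: run in the normal lightbulb pass
    MoveToLowBucket,     // expensive analyzer: defer to the Low-priority pass
    Skip                 // expensive analyzer: drop entirely
}

public static class LightbulbDeprioritization
{
    public static AnalyzerDisposition Classify(
        bool registersSymbolStartEndOrSemanticModelActions,
        bool lightbulbSkipExecutingDeprioritizedAnalyzers)
    {
        if (!registersSymbolStartEndOrSemanticModelActions)
            return AnalyzerDisposition.RunInNormalBucket;

        // LightbulbSkipExecutingDeprioritizedAnalyzers = false: demote to the Low bucket.
        // LightbulbSkipExecutingDeprioritizedAnalyzers = true: skip the analyzer entirely.
        return lightbulbSkipExecutingDeprioritizedAnalyzers
            ? AnalyzerDisposition.Skip
            : AnalyzerDisposition.MoveToLowBucket;
    }
}
```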