[Lightbulb Perf] Async lightbulb performance improvement #66970
Conversation
Addresses part of dotnet#66968.

This PR aims to improve the async lightbulb performance by moving the relatively expensive SymbolStart/SymbolEnd action analyzers to the `Low` priority bucket. See the 2nd observation in the Summary section of dotnet#66968 (comment) for more details:

> The overhead coming from SymbolStart/End analyzers needing to run on all partial types seems to add a significant overhead. So I played around with moving all the SymbolStart/End analyzers and corresponding code fixes to a separate CodeActionPriorityRequest.Low bucket for the async lightbulb, and this gives significant performance improvements: we populate all the code fixes from non-SymbolStart/End analyzers and code refactorings in about half a progress bar cycle for OOP, and almost instantaneously for InProc. The async lightbulb does take the required additional cycle to then populate the fixes for SymbolStart/End analyzers and the Suppress/Configure actions, but that seems completely reasonable to me.

We only have a handful of SymbolStart/End analyzers and it seems reasonable to me to move them below to improve the overall lightbulb performance.
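To make the bucketing idea concrete, here is a small illustrative sketch (the type and method names below are hypothetical, not Roslyn's actual lightbulb API) of computing quick actions in descending priority passes, so that cheap providers populate the list first and the expensive SymbolStart/End-backed fixes arrive in a later pass:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names; this only illustrates the "priority bucket" idea described above.
public enum RequestPriority { High, Normal, Low }

public static class LightbulbSketch
{
    public static async Task PopulateAsync(
        Func<RequestPriority, CancellationToken, Task<IReadOnlyList<string>>> computeActionsAsync,
        Action<IReadOnlyList<string>> addToLightbulb,
        CancellationToken cancellationToken)
    {
        foreach (var priority in new[] { RequestPriority.High, RequestPriority.Normal, RequestPriority.Low })
        {
            // Surface each pass's results as soon as they are ready instead of blocking
            // the whole list on the most expensive (Low) pass.
            var actions = await computeActionsAsync(priority, cancellationToken).ConfigureAwait(false);
            addToLightbulb(actions);
        }
    }
}
```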
I'm also hesitant, mostly because I don't understand why symbol start/end (and partial types) are causing such a problem. This demonstrates that an issue does exist... but this change feels more like a workaround than a fix at the right level. Do we understand why this sort of analysis is so costly? Could it be because the compiler doesn't know which files it needs to analyze, so it analyzes them all?
This is by design and is in the nature of SymbolStart/End analyzers. For such an analyzer's diagnostics to be computed for any given line, the analyzer needs to be executed on all symbols/nodes/operations within the containing type symbol of that line, including all partial declarations. So the analyzer execution time basically increases linearly with the number of partial declarations. This is not true for the rest of the analyzers, as they can compute diagnostics by analyzing just the specific line or, at most, the entire file. These analyzers are expensive by default based on their desired analysis scope; there is nothing we can do to reduce that scope here. Just as we can do nothing to reduce the analysis scope of the most expensive analyzers, CompilationEnd analyzers, which is why we don't support code fixes for CompilationEnd analyzer diagnostics. IMO, moving these expensive lightbulb quick actions, which can only provide quick actions for a very small percentage of lightbulb invocations, down the list doesn't degrade the experience, especially given the big performance gain here for populating the rest of the quick actions.
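To make the shape of such an analyzer concrete, here is a minimal sketch using the public Roslyn analyzer APIs (the rule itself is a made-up demo, not one of the analyzers affected by this PR). The SymbolEnd callback can only fire after the registered callbacks have run over every declaration of the type, which is why cost grows with the number of partial parts:

```csharp
using System.Collections.Concurrent;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class SymbolStartEndSketchAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new(
        "DEMO0001",
        "SymbolStart/End sketch",
        "Type '{0}' references {1} field(s) across all of its declarations",
        "Demo",
        DiagnosticSeverity.Info,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        context.RegisterSymbolStartAction(symbolStart =>
        {
            // State accumulated across every partial declaration of the type.
            var referencedFields = new ConcurrentDictionary<IFieldSymbol, bool>(SymbolEqualityComparer.Default);

            // Runs for field references in *every* file containing a declaration of the type.
            symbolStart.RegisterOperationAction(
                ctx => referencedFields.TryAdd(((IFieldReferenceOperation)ctx.Operation).Field, true),
                OperationKind.FieldReference);

            // Fires only after all declarations have been analyzed, so producing a diagnostic
            // for one line still requires analyzing the whole (possibly partial) type.
            symbolStart.RegisterSymbolEndAction(symbolEnd =>
                symbolEnd.ReportDiagnostic(Diagnostic.Create(
                    Rule, symbolEnd.Symbol.Locations[0], symbolEnd.Symbol.Name, referencedFields.Count)));
        }, SymbolKind.NamedType);
    }
}
```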
Sorry, I'm not being clear. I'm not asking if we can avoid analyzing those other parts. What I'm asking is: is it possible that, even though we do have to analyze those other parts, the analysis is not being done smartly? For example, is the compiler analyzing unnecessary files? Or is it, for example, repeatedly creating and throwing away the semantic models for those other parts? I only bring that up because it's the type of issue that has caused bad performance in other features in the past when dealing with partials, and I want to distinguish the essential cost of doing this work from wasteful costs that aren't needed.
I have viewed the after gif about 20 times now and I can pretty much guarantee it will be the highlight of my week. This is great!
Are there any telemetry additions that would be needed to minimize the risk of taking this change and help with future root-causing?
Are we taking this change while continuing to investigate how to improve SymbolStart/End analysis in the compiler?
Discussed offline with Cyrus. We are not doing any unnecessary work here; it is just the design of SymbolStart/End analyzers that they need to execute on all partial definitions, as well as nested type definitions, before SymbolEnd is executed to report diagnostics. The repro case above actually has a containing type with 20 partial definitions, so that is leading to such a large overhead for SymbolStart/End analyzers. If there were no partial definitions, there would be no overhead, but on average we would expect at least some overhead on types with partials. I am going to validate the performance overhead for types with a single declaration and share an update. Additionally, we discussed that the analyzer driver currently supports running only single-file analysis or full-compilation analysis, and we run analysis for these files sequentially. I am going to experiment with enhancing the driver to support multi-file analysis, execute these analyses concurrently, and report back the comparisons. We will try to get more performance data here and try some more approaches to see if the perf benefit can be gained in a way that has no user-experience change.
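As a rough illustration of the multi-file concurrency experiment mentioned above (a sketch on top of public Roslyn APIs, not the actual analyzer-driver enhancement), one could fan out per-tree semantic analysis for every file that declares the type and merge the results:

```csharp
using System.Collections.Immutable;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

public static class PartialDeclarationAnalysisSketch
{
    public static async Task<ImmutableArray<Diagnostic>> AnalyzeAllDeclarationsAsync(
        Compilation compilation,
        INamedTypeSymbol type,
        ImmutableArray<DiagnosticAnalyzer> analyzers,
        CancellationToken cancellationToken)
    {
        var compilationWithAnalyzers = compilation.WithAnalyzers(analyzers);

        // One analysis task per file that contains a declaration of the (possibly partial) type.
        var tasks = type.DeclaringSyntaxReferences
            .Select(r => r.SyntaxTree)
            .Distinct()
            .Select(tree => compilationWithAnalyzers.GetAnalyzerSemanticDiagnosticsAsync(
                compilation.GetSemanticModel(tree), filterSpan: null, cancellationToken));

        // Run the per-tree analyses concurrently and merge the diagnostics.
        var results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.SelectMany(d => d).ToImmutableArray();
    }
}
```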
Update
- OOP trace: ~38% in CompilationTracker and <20% in AnalyzerDriver/CompilationWithAnalyzers
- InProc trace: ~40% in AnalyzerDriver/CompilationWithAnalyzers, ~20% in CompilationTracker
wrt skeletons: those should only be built once (unless the project they are built for is itself changed). Are you seeing more than that?
@mavasani Your traces are interesting. When I open the _OutOfProc one I still see a ton of diagnostics work in process. That seems bad. Second, looking at OOP, I can see a ton of time is being spent because the FAR engine is running (potentially because of CodeLens or the inheritance margin). This is going and doing work that forces source generators to run (since it needs compilations). And, as you said, it's also building skeletons, which we know are much, much slower now due to SGs as well. This is the same issue I raised here: #63891. But it looks like no traction has been made on it. @chsienki can the compiler do something here to make ref-emit fast with source generators? This is really killing IDE perf and has been doing so for many months now.
I'll debug and try to get more data here.
That is coming from the PreviewWorkspace diagnostic computation always running InProc. See the first bullet point in #66968 (comment). Note that even if I completely turn off the PreviewWorkspace diagnostic computation, it doesn't seem to improve the lightbulb population time (this is as expected, as the preview is shown only after quick actions have been populated and the user selects one of the items). This still seems like something that should be fixed; I filed #67014 to track this work.
Wouldn't this need to run InProc when we turn off OOP? Why the slowness in OOP compared to InProc for building skeletons and compilations, running generators, etc.?
@CyrusNajmabadi @arkalyanms I experimented a bit today to see if adding more concurrent execution for symbol start analyzers that need to analyze all the partial decl files helps us get the performance improvements that this current PR is giving, and it turns out not to be the case. Details below:
Summary

I don't believe there is much we can do here to improve the performance of SymbolStart analyzers: by design they analyze a potentially bigger analysis scope and hence will be slower for types with partial declarations. The only other possible approach to improve the performance here (which could benefit all analyzer execution, not just the SymbolStart ones) is to move away from invoking a general `GetDiagnostics`-style entry point that may do far more work than needed and instead drive the analysis in a more focused way.
Note: this should be tested on .NET Core. I've seen a lot of cases where parallelism is unhelpful on standard but great on core. I would also want to understand why this doesn't help. Is something blocking the scaling improvement we would expect here? For example (though doubtful), if GetDiagnostics immediately took a big fat lock, we would get no benefit from being parallel. It would be good to understand this better.
This seems very intriguing. I like the idea that we are doing focused driving vs. just calling into a general method that may do far too much work in an inefficient fashion.
@genlu should be able to help you test .NET Core for OOP. It would also be good to test server GC for OOP.
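For what it's worth, a quick way to confirm which GC mode a given process (e.g., the OOP server) actually ends up running with is shown below; enabling server GC itself is a host/runtime-configuration setting rather than a code change:

```csharp
using System;
using System.Runtime;

// Prints whether the current process is running the server GC and in which latency mode,
// which is handy when validating that a host configuration change actually took effect.
class GcModeCheck
{
    static void Main()
    {
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
    }
}
```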
I tried this experiment with mavasani@afcf909 and it doesn't seem to help much; there is only a slight improvement. The collected trace still shows the majority of the work happening in CompilationTracker. I'll continue investigating why there is this difference between InProc and OutOfProc. IMO, this PR gives a noticeable improvement in populating the initial set of quick actions for both InProc and OutOfProc for types with partial declarations. I searched for CAxxxx and IDExxxx analyzers in the roslyn-analyzers and roslyn repos, and only the following 3 SymbolStart analyzers have code fixes:
I feel we should take this PR regardless of other performance investigations and improvements in this space. I'll let @arkalyanms make a call here.
I can give this a try, but the traces clearly indicate that the majority of the overhead is coming from CompilationTracker and building skeleton assemblies in OOP. I'll first try to dig into this before trying other optimizations.
@CyrusNajmabadi unit test added with 09544dd
Final piece of planned work for #66968
Closes #66968
This PR aims to improve the async lightbulb performance by moving the relatively expensive SymbolStart/SymbolEnd action and SemanticModel analyzers to the `Low` priority bucket. We ensure this happens only in extremely rare cases by performing the following additional checks:

- If `LightbulbSkipExecutingDeprioritizedAnalyzers = false`, we de-prioritize the analyzer down to the low-priority bucket. If `LightbulbSkipExecutingDeprioritizedAnalyzers = true`, then we completely drop this analyzer. This case should be extremely rare now, so as to not have any user-noticeable impact.

The PR has the following core implementation pieces:

- Add support for the `Low` priority bucket for async lightbulb. Done with Simplify logic for lightbulb priority classes #67554
- A new `CodeActionRequestPriorityProvider` that exposes the current request priority and also tracks analyzers that are de-prioritized to the `CodeActionRequestPriority.Low` bucket
- Changes to the `GetDiagnosticsForSpanAsync` API to de-prioritize analyzers based on the above conditions.

Performance measurements on large file
| Branch | Quick action population time |
| --- | --- |
| Main branch | 1-2 progress bar cycles |
| PR branch | 0.5-1 progress bar cycle |
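As a hypothetical sketch of the de-prioritization decision described above (the names here are illustrative and do not correspond to Roslyn's actual internal types):

```csharp
// Hypothetical sketch only: expensive analyzers are either deferred to the Low-priority
// lightbulb pass or skipped entirely, depending on the option described above.
public enum AnalyzerDisposition
{
    RunInNormalBucket,   // cheap analyzer: run in the normal lightbulb pass
    MoveToLowBucket,     // expensive analyzer: defer to the Low-priority pass
    Skip                 // expensive analyzer: drop entirely
}

public static class LightbulbDeprioritization
{
    public static AnalyzerDisposition Classify(
        bool registersSymbolStartEndOrSemanticModelActions,
        bool lightbulbSkipExecutingDeprioritizedAnalyzers)
    {
        if (!registersSymbolStartEndOrSemanticModelActions)
            return AnalyzerDisposition.RunInNormalBucket;

        // LightbulbSkipExecutingDeprioritizedAnalyzers = false: demote to the Low bucket.
        // LightbulbSkipExecutingDeprioritizedAnalyzers = true: skip the analyzer entirely.
        return lightbulbSkipExecutingDeprioritizedAnalyzers
            ? AnalyzerDisposition.Skip
            : AnalyzerDisposition.MoveToLowBucket;
    }
}
```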