[release/9.0] Reduce funceval abort #108256
Merged
Backport of #108220 to release/9.0
/cc @noahfalk
Customer Impact
When debugging with Visual Studio (or another debugger that uses func-evals), the debugger may fail to complete some evals because the debuggee code deadlocks. Unless the user chooses to slip all threads, they will be unable to see the eval results in the debugger. The issue is timing-dependent, so some fraction of eval attempts fail, and there is no easy way to predict which ones.
This chance of failure has always been part of the func-eval design, but we try to keep the success rate reasonably high by investigating and resolving any deadlocks that are encountered frequently in testing.
Regression
The deadlock being fixed here has always been possible, but the frequency of hitting it has increased in .NET 9. We don't know what caused the increase in frequency; it could be many different factors, since the underlying issue requires holding a specific lock during a race window.
Testing
I verified in a debugger that the locks are now taken in the desired order, which prevents a deadlock waiting on this particular lock. Tom re-ran the Visual Studio tests and confirmed this issue no longer occurred. He did encounter another issue, which we believe is pre-existing, and he is investigating further to confirm that.
Risk
Low. The change touches very little code and swaps the order in which we take two locks. The ReadyToRun lock whose ordering is being changed is not acquired in any other code path, so there is no opportunity for a second path to use a different ordering and create a deadlock.
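For readers unfamiliar with the lock-ordering argument, here is a minimal, generic sketch in Python. It is not the runtime's code: `outer_lock`, `inner_lock`, `worker`, and `run_workers` are hypothetical names chosen for illustration. The point it shows is that when every path that holds both locks acquires them in the same order, no thread can end up holding one lock while waiting on a thread that holds the other.

```python
import threading

# Hypothetical stand-ins for two locks; these are NOT the runtime's lock names.
outer_lock = threading.Lock()
inner_lock = threading.Lock()


def run_workers(num_threads: int = 4, rounds: int = 1000) -> list[str]:
    """Run several threads that all take the two locks in the same order."""
    finished: list[str] = []

    def worker(name: str) -> None:
        for _ in range(rounds):
            # Every thread acquires outer_lock first, then inner_lock.
            # Because no path takes them in the opposite order, a cycle of
            # threads waiting on each other cannot form.
            with outer_lock:
                with inner_lock:
                    pass  # work that needs both locks would go here
        finished.append(name)

    threads = [
        threading.Thread(target=worker, args=(f"t{i}",))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)  # bounded wait so a hang would be noticed
    return sorted(finished)
```

Calling `run_workers()` should return the names of all four workers. If one path instead took `inner_lock` before `outer_lock`, two threads could each hold one lock and wait forever for the other, which is the kind of hazard a single consistent ordering rules out.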