[release/9.0] Backport "JIT: Run single EH region repair pass after layout" #108715
@AndyAyersMS PTAL, thanks!
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
lgtm. we will take for consideration in 9 GA
@amanasifkhalid can you take a look at the pr failures?
can we include the test from #108608?
@jeffschwMSFT sure, the only failure is in
Sure, I can open a follow-up PR for that. Do we want it in release/9.0's CI, too?
@jeffschwMSFT looks like the failure cleared up
Backport of #108634 to release/9.0
Customer Impact
CoreCLR's exception model requires that EH regions (try blocks, catch blocks, etc.) are contiguous in memory. Thus, the JIT keeps track of each EH region at the basic block level by maintaining a table of pointers to each region's first and last block; this information is eventually reported to the VM. When the JIT reorders blocks to optimize code layout, EH regions remain contiguous, but the JIT needs to ensure each region's last block pointer is updated. This is usually trivial, though nested EH regions can complicate the bookkeeping.
Previously, we would walk the block list looking for new EH region ends, and then propagate the updated information to the EH table starting with the most-nested regions, "bubbling up" the end of a nested region if it's at the end of its parent region. This works well unless a nested region at the end of a parent region is immediately preceded by another sibling region; we recently discovered that in such cases the JIT fails to determine that the current nested region is at the end of its parent. This means the JIT incorrectly reports the nesting relationship between these EH regions, which could break stack walking if the method throws.
The new implementation walks the block list in reverse, which vastly simplifies identifying and propagating EH region ends. When iterating backwards, the first block we encounter in a region is that region's last block, so there's no need to determine or guess whether a nested EH region is at the end of its parent region: reaching the nested region before any other block of its parent guarantees it concludes the parent region.
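To illustrate the reverse walk, here is a minimal sketch over a simplified model. It is not the JIT's actual code: `Region`, `Block`, and `repairRegionEnds` are hypothetical names, blocks are identified by their index in the layout rather than by pointers, and try and handler regions are collapsed into a single kind of region.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; these are not the JIT's real data structures.
struct Region
{
    int parent;    // index of the enclosing region, or -1 if none
    int lastBlock; // layout index of the region's last block, or -1 if unknown
};

struct Block
{
    int region; // index of the innermost enclosing region, or -1 if none
};

// Recompute every region's last block with a single backwards walk of the layout.
void repairRegionEnds(const std::vector<Block>& layout, std::vector<Region>& regions)
{
    for (Region& r : regions)
    {
        r.lastBlock = -1;
    }

    for (int i = static_cast<int>(layout.size()) - 1; i >= 0; --i)
    {
        // Walk from the innermost region outwards. Walking backwards, the first
        // block seen in a region is its last block, and it also ends every
        // enclosing region that has not been seen yet. Once we hit a region whose
        // end is already known, all of its ancestors are known too, so we stop.
        for (int r = layout[i].region; r != -1 && regions[r].lastBlock == -1; r = regions[r].parent)
        {
            assert(r < static_cast<int>(regions.size()));
            regions[r].lastBlock = i;
        }
    }
}
```

In this model, the problematic shape described above (a nested region at the end of its parent, immediately preceded by a sibling region) needs no special handling: the trailing nested region is reached first and sets the parent's end, and the sibling, reached later, only sets its own end.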
Regression
The flawed EH fixup logic was introduced in .NET 9.
Testing
This flaw was revealed by a test generated by one of our fuzzing tools. The fixed JIT logic now reports EH regions correctly for this case, and it has zero diffs with the previous implementation in EH information reported across our SuperPMI collections, which suggests we were getting most cases correct already. Outerloop tests and fuzzing pipelines did not reveal any flaws with the new implementation.
Risk
The total LOC changed suggests some risk, though most of this churn is from deleting the previous implementation, which was quite a bit more verbose. Attempts to surgically fix this issue with the old implementation incurred codegen diffs, since some block reordering logic uses temporary EH region ends as insertion points for other blocks. Churning codegen at this point isn't ideal, and attempts to tweak the old EH fixup logic revealed it to be quite fragile. The new implementation has no codegen diffs, only seems to have diffs in EH region info reported to the VM when the old strategy was getting it wrong, and is overall much easier (in my opinion) to understand. Thus, I consider this fix low-risk relative to any solution that keeps the old implementation around.