Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/9.0] Backport "JIT: Run single EH region repair pass after layout" #108715

Merged
merged 5 commits into from
Oct 10, 2024

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Oct 9, 2024

Backport of #108634 to release/9.0

Customer Impact

  • Customer reported
  • Found internally

CoreCLR's exception model requires that EH regions (try blocks, catch blocks, etc.) are contiguous in memory. Thus, the JIT keeps track of each EH region at the basic block level by maintaining a table of pointers to each region's first and last block -- this information is eventually reported to the VM. When reordering blocks to optimize code layout, EH regions will remain contiguous, but the JIT needs to ensure each region's last block pointer is updated. This is usually trivial, though nested EH regions can complicate this bookkeeping. Previously, we would walk the block list looking for new EH region ends, and then propagate the updated information to the EH table starting with most-nested regions, and "bubbling up" the end of the nested region if it's at the end of its parent region. This works well, unless a nested region at the end of a parent region is immediately preceded by another sibling region; we recently discovered that the JIT fails to determine the current nested region is at the end of its parent in such cases. This means the JIT incorrectly reports the nested relationship between these EH regions, which could break stack walking if the method throws.

The new implementation walks the block list in reverse, which vastly simplifies identifying and propagating EH region ends: By iterating backwards, there's no need to determine/guess if the nested EH region is at the end of its parent region, because reaching it before its parent guarantees it concludes the parent region.

Regression

  • Yes
  • No

The flawed EH fixup logic was introduced in .NET 9.

Testing

This flaw was revealed by a test generated by one of our fuzzing tools. The fixed JIT logic now reports EH regions correctly for this case, and it has zero diffs with the previous implementation in EH information reported across our SuperPMI collections, which suggests we were getting most cases correct already. Outerloop tests and fuzzing pipelines did not reveal any flaws with the new implementation.

Risk

The total LOC changed suggests some risk, though most of this churn is from deleting the previous implementation, which was quite a bit more verbose. Attempts to surgically fix this issue with the old implementation incurred codegen diffs, since some block reordering logic uses temporary EH region ends as insertion points for other blocks. Churning codegen at this point isn't ideal, and attempts to tweak the old EH fixup logic revealed it to be quite fragile. The new implementation has no codegen diffs, only seems to have diffs in EH region info reported to the VM when the old strategy was getting it wrong, and is overall much easier (in my opinion) to understand. Thus, I consider this fix low-risk relative to any solution that keeps the old implementation around.

@amanasifkhalid
Copy link
Member Author

@AndyAyersMS PTAL, thanks!

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 9, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. we will take for consideration in 9 GA

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Oct 9, 2024
@jeffschwMSFT jeffschwMSFT added this to the 9.0.0 milestone Oct 9, 2024
@jeffschwMSFT jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 10, 2024
@jeffschwMSFT
Copy link
Member

@amanasifkhalid can you take a look at the pr failures?

@kunalspathak
Copy link
Member

can we include the test from #108608?

@amanasifkhalid
Copy link
Member Author

@jeffschwMSFT sure, the only failure is in System.Security.Cryptography.Tests, which has historically been flaky. I'll rerun this leg and see if it reproes.

@amanasifkhalid
Copy link
Member Author

can we include the test from #108608?

Sure, I can open a follow-up PR for that. Do we want it in release/9.0's CI, too?

@amanasifkhalid
Copy link
Member Author

@jeffschwMSFT looks like the failure cleared up

@jeffschwMSFT jeffschwMSFT merged commit 8ea49ab into dotnet:release/9.0 Oct 10, 2024
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants