[release/9.0] Backport "JIT: Run single EH region repair pass after layout" #108715

amanasifkhalid · 2024-10-09T15:26:02Z

Backport of #108634 to release/9.0

Customer Impact

Customer reported
Found internally

CoreCLR's exception model requires that EH regions (try blocks, catch blocks, etc.) are contiguous in memory. Thus, the JIT keeps track of each EH region at the basic block level by maintaining a table of pointers to each region's first and last block; this information is eventually reported to the VM. When the JIT reorders blocks to optimize code layout, EH regions remain contiguous, but the JIT needs to ensure each region's last block pointer is updated. This is usually trivial, though nested EH regions can complicate the bookkeeping. Previously, we would walk the block list looking for new EH region ends, and then propagate the updated information to the EH table starting with the most-nested regions, "bubbling up" the end of a nested region if it is at the end of its parent region. This works well unless a nested region at the end of a parent region is immediately preceded by a sibling region; we recently discovered that in such cases the JIT fails to determine that the nested region is at the end of its parent. This means the JIT incorrectly reports the nesting relationship between these EH regions, which could break stack walking if the method throws.

The new implementation walks the block list in reverse, which vastly simplifies identifying and propagating EH region ends: by iterating backwards, there's no need to determine or guess whether a nested EH region is at the end of its parent region, because reaching the nested region before any other block of its parent guarantees that it concludes the parent region.
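To make the reverse walk concrete, here is a minimal C++ sketch of the idea. It is not the JIT's actual code: `Block`, `EHRegion`, `kNoRegion`, and `repairRegionEnds` are hypothetical names, and the model is simplified to a single kind of region per table entry (the real EH table distinguishes try and handler regions). A null `last` pointer marks a region whose end has not been set yet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the JIT's data structures.
constexpr int kNoRegion = -1;

struct Block
{
    int id;     // block number, for illustration only
    int region; // innermost enclosing EH region, or kNoRegion
};

struct EHRegion
{
    int          enclosing; // index of the parent region, or kNoRegion
    const Block* last;      // last block of the region; nullptr means "not set yet"
};

// Recompute every region's last block by walking the already-reordered
// block list backwards.
void repairRegionEnds(const std::vector<Block>& layout, std::vector<EHRegion>& table)
{
    for (EHRegion& region : table)
    {
        region.last = nullptr;
    }

    for (auto it = layout.rbegin(); it != layout.rend(); ++it)
    {
        // The first block seen (from the back) inside a region is that region's
        // last block. It also ends every enclosing region whose end is still
        // unset, because no later block belonged to those regions.
        for (int r = it->region; r != kNoRegion; r = table[r].enclosing)
        {
            assert(r >= 0 && static_cast<std::size_t>(r) < table.size());
            if (table[r].last != nullptr)
            {
                break; // this region and all of its ancestors are already set
            }
            table[r].last = &*it;
        }
    }
}
```

In the problematic shape described above (a parent region whose final nested region is immediately preceded by a sibling region), the walk meets the final nested region first, so that region's last block is recorded as the parent's end without any check for whether the nested region sits at the end of its parent.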

Regression

Yes
No

The flawed EH fixup logic was introduced in .NET 9.

Testing

This flaw was revealed by a test generated by one of our fuzzing tools. The fixed JIT logic now reports EH regions correctly for this case, and it has zero diffs with the previous implementation in EH information reported across our SuperPMI collections, which suggests we were getting most cases correct already. Outerloop tests and fuzzing pipelines did not reveal any flaws with the new implementation.

Risk

The total LOC changed suggests some risk, though most of this churn is from deleting the previous implementation, which was quite a bit more verbose. Attempts to surgically fix this issue with the old implementation incurred codegen diffs, since some block reordering logic uses temporary EH region ends as insertion points for other blocks. Churning codegen at this point isn't ideal, and attempts to tweak the old EH fixup logic revealed it to be quite fragile. The new implementation has no codegen diffs, only seems to have diffs in EH region info reported to the VM when the old strategy was getting it wrong, and is overall much easier (in my opinion) to understand. Thus, I consider this fix low-risk relative to any solution that keeps the old implementation around.

amanasifkhalid · 2024-10-09T15:26:10Z

@AndyAyersMS PTAL, thanks!

dotnet-policy-service · 2024-10-09T15:26:54Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jeffschwMSFT

lgtm. we will take for consideration in 9 GA

jeffschwMSFT · 2024-10-10T17:19:37Z

@amanasifkhalid can you take a look at the pr failures?

kunalspathak · 2024-10-10T17:54:58Z

can we include the test from #108608?

amanasifkhalid · 2024-10-10T17:58:22Z

@jeffschwMSFT sure, the only failure is in System.Security.Cryptography.Tests, which has historically been flaky. I'll rerun this leg and see if it reproes.

amanasifkhalid · 2024-10-10T18:55:15Z

> can we include the test from #108608?

Sure, I can open a follow-up PR for that. Do we want it in release/9.0's CI, too?

amanasifkhalid · 2024-10-10T19:08:08Z

@jeffschwMSFT looks like the failure cleared up

amanasifkhalid added 4 commits October 9, 2024 11:01

Add EH region ends pass

27d8ca1

Remove old EH fixup logic

85646f5

Style

1bb52ec

Use null EH clause pointers as set indicators

d254188

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 9, 2024

dotnet-policy-service bot assigned amanasifkhalid Oct 9, 2024

AndyAyersMS approved these changes Oct 9, 2024

jeffschwMSFT approved these changes Oct 9, 2024

jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Oct 9, 2024

jeffschwMSFT added this to the 9.0.0 milestone Oct 9, 2024

build-analysis bot mentioned this pull request Oct 9, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 10, 2024

Merge branch 'release/9.0' into backport-108634

2ef148c

jeffschwMSFT merged commit 8ea49ab into dotnet:release/9.0 Oct 10, 2024
10 checks passed

This was referenced Oct 10, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

Unable to checkout dotnet/dnceng#4115

Open

github-actions bot locked and limited conversation to collaborators Nov 10, 2024
