Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FP state restore on macOS exception forwarding #109458

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Nov 1, 2024

The change that enabled AVX512 support in the past has introduced a subtle issue in restoring context for forwarding hardware exceptions that occur in 3rd party non-managed code. In that case, the restored floating point state is garbled.

The problem is due to the fact that we pass a x86_avx512_state context to the thread_set_state. That context contains a header field describing the format of the context (it can be AVX, AVX512, 32 or 64 bit, ...). That is then followed by the actual context structure. This style of context is identified e.g. by x86_AVX_STATE flavor. The header field contains the specific flavor, which would be x86_AVX_STATE64 or x86_AVX512_STATE64. The thread_set_state uses the flavor to detect whether the context passed to it is this combined one or just x86_AVX_STATE64 or x86_AVX512_STATE64 which doesn't have the header field.
The issue was that while we were passing in the combined context, we were passing in the flavor extracted from its header. So the thread_set_state used the header as part of the context. That resulted e.g. in xmm registers contents being shifted by 8 bytes, thus garbling the state.

The change that enabled AVX512 support in the past has introduced a subtle
issue in restoring context for forwarding hardware exceptions that occur
in 3rd party non-managed code. In that case, the restored floating point
state is garbled.

The problem is due to the fact that we pass a x86_avx512_state context to
the thread_set_state. That context contains a header field describing the
format of the context (it can be AVX, AVX512, 32 or 64 bit, ...). That is
then followed by the actual context structure. This style of context is
identified e.g. by x86_AVX_STATE flavor. The header field contains the
specific flavor, which would be x86_AVX_STATE64 or x86_AVX512_STATE64.
The thread_set_state uses the flavor to detect whether the context passed
to it is this combined one or just x86_AVX_STATE64 or x86_AVX512_STATE64
which doesn't have the header field.
The issue was that while we were passing in the combined context, we were
passing in the flavor extracted from its header. So the thread_set_state
used the header as part of the context. That resulted e.g. in xmm register
contents being shifted by 8 bytes, thus garbling the state.
@janvorli janvorli added this to the 8.0.x milestone Nov 1, 2024
@janvorli janvorli requested a review from VSadov November 1, 2024 17:54
@janvorli janvorli self-assigned this Nov 1, 2024
@VSadov
Copy link
Member

VSadov commented Nov 1, 2024

Is there a way to write a test for this? - something that would certainly crash or produce a wrong result without the fix.

@janvorli
Copy link
Member Author

janvorli commented Nov 1, 2024

I can think of a handwritten assembly that would set xmm registers to certain values and then have a loop that would induce a sigsegv and then check the register values and keep looping several times. It would install a sigsegv handler that would move the context past the instruction that triggered the sigsegv or update a register in the context to be able to repeat the instruction this time without sigsegv and return. That would be able to detect this problem.

@janvorli
Copy link
Member Author

janvorli commented Nov 1, 2024

FYI: @jonpryor - this is the extra fix needed on top of the backport to completely fix the problem you were hitting.

@VSadov
Copy link
Member

VSadov commented Nov 1, 2024

I can think of a handwritten assembly that would set xmm registers to certain values and then have a loop that would induce a sigsegv and then check the register values and keep looping several times. It would install a sigsegv handler that would move the context past the instruction that triggered the sigsegv or update a register in the context to be able to repeat the instruction this time without sigsegv and return. That would be able to detect this problem.

Sounds very complicated for a macOS+x64 specific bug. Perhaps not worth it.

(I kind of hoped it could be as simple as throwing an exception in the presence of some vectorized code and see then observing in the catch that some locals have wrong values..)

@VSadov
Copy link
Member

VSadov commented Nov 2, 2024

Documentation on the OS APIs used seem fairly scarce, but the change makes sense, just from looking at the shape of FloatState

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@janvorli
Copy link
Member Author

janvorli commented Nov 2, 2024

I kind of hoped it could be as simple as throwing an exception in the presence of some vectorized code and see then observing in the catch that some locals have wrong values

That might work, but it would really depend on how the jit allocated the locals, if it kept them in registers etc, so it would not be a reliable test I guess.
Nevertheless, I actually went the way of that specialized asm test, I have an OSX / Linux x64 version working and will add arm64 ones too. Might be useful to catch possible future EH related issues around float state restoration on other targets too. I'll probably get the test in separately from this PR to get it in faster, this change needs to be backported to .NET 8 as there are internal customers hitting this.

@janvorli janvorli closed this Nov 4, 2024
@janvorli janvorli reopened this Nov 4, 2024
@janvorli janvorli closed this Nov 5, 2024
@janvorli janvorli reopened this Nov 5, 2024
@janvorli janvorli merged commit 44115f2 into dotnet:main Nov 5, 2024
90 checks passed
janvorli added a commit to janvorli/runtime that referenced this pull request Nov 6, 2024
@janvorli
Copy link
Member Author

/backport to release/9.0

Copy link
Contributor

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/12018281726

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants