Re-add Stack Overflow handling in NativeAOT with larger alternate stack #95808
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Issue Details: Want to test this in CI with a larger alternate stack
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
@MichalStrehovsky @jkotas @janvorli @VSadov Could you take a look when you have a chance?
I don't think we can simply run the hardware exception handler on the alternate stack as we do now. In coreclr, we switch off the alternate stack as soon as we determine that the fault is not a stack overflow. Here we don't, so if the catch handler ends up in a deep call chain, we would get a stack overflow.
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
It looks like the issue with the previous handling of stack overflow was that the alternate stack was too small. The guard page was counted as part of the stack size when it should have been allocated in addition to it. This change allocates an additional page for the alternate stack, which seems to fix the issue for me.
I wasn't able to reproduce the issues in local builds, but I downloaded the exes from the failing CI tests and they segfaulted roughly once every 10 runs. After editing the binary to allocate more space for the signal stack, I didn't see any more segfaults.
It looked like the issue was that we would hit a segfault and enter the SIGSEGV handler right as a GC pause was sending SIG34. Control would go to the GC signal handler first and would mess with the stack (I couldn't figure out exactly what it did though), and when control returned to the SIGSEGV handler, the first `pushq` in the method prelude caused another segfault and the program crashed.
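The sizing fix described above can be illustrated with a minimal C sketch. This is not the runtime's actual code: the helper name `install_alt_stack` and its parameters are hypothetical, and the sketch assumes a downward-growing stack, so the guard page sits at the lowest address of the mapping. The point it shows is that the guard page is added on top of the requested size, so the handler keeps the full usable size.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from the runtime): maps usable_size bytes of
 * alternate signal stack plus ONE EXTRA page used as a guard page, then
 * registers the usable part with sigaltstack. Returns 0 on success. */
static int install_alt_stack(size_t usable_size, stack_t *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t page_size = (size_t)page;

    /* Round the usable size up to a whole number of pages. */
    usable_size = (usable_size + page_size - 1) / page_size * page_size;

    /* The guard page is added to the mapping, not carved out of it. */
    size_t total = usable_size + page_size;

    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Lowest page becomes inaccessible: running off the end of the
     * alternate stack faults instead of corrupting adjacent memory. */
    if (mprotect(base, page_size, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }

    stack_t ss;
    ss.ss_sp = (char *)base + page_size; /* usable region starts above the guard */
    ss.ss_size = usable_size;            /* guard page is not counted here */
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) != 0) {
        munmap(base, total);
        return -1;
    }

    *out = ss;
    return 0;
}
```

A handler only runs on this stack if it is installed with `sigaction` and the `SA_ONSTACK` flag. The earlier bug corresponds to mapping only `usable_size` bytes and then protecting one page inside that range, which leaves the handler one page short.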