[NativeAOT] Stackoverflow reporting on Linux #82334

Open
jkotas opened this issue Feb 18, 2023 · 4 comments · Fixed by #94485

Comments

@jkotas
Member

jkotas commented Feb 18, 2023

Repro

Recursion(1);

void Recursion(int x)
{
    Recursion(x+1);
    Recursion(x+1);
}

Actual result

Segmentation fault

Expected result

Process is terminating due to StackOverflowException

(Reported by partner team.)

@jkotas jkotas added this to the 8.0.0 milestone Feb 18, 2023
@ghost

ghost commented Feb 18, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: jkotas
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: 8.0.0

@agocke agocke added this to AppModel Mar 6, 2023
@agocke agocke modified the milestones: 8.0.0, 9.0.0 Aug 9, 2023
@jtschuster jtschuster self-assigned this Oct 25, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 25, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 13, 2023
@jkotas
Member Author

jkotas commented Nov 29, 2023

The change was reverted by #95415

@jkotas jkotas reopened this Nov 29, 2023
@jtschuster
Member

It looks like all the crashes occurred when the SIGSEGV was hit while the GC was trying to suspend all threads. I'm not sure why that was causing crashes with the alternate stack but isn't causing crashes when it uses the regular stack.
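For context on the alternate-stack approach mentioned above, here is a minimal sketch of how a SIGSEGV handler can be made to run on its own stack so that it can still report after the regular stack is exhausted. This is not the code from #94485 or the NativeAOT runtime; `InstallStackOverflowHandler`, `OnSegv`, and the 64 KiB size are hypothetical names and values chosen for illustration. It also reports every SIGSEGV as a stack overflow, whereas a real runtime would compare the fault address against the thread's stack guard page first.

```cpp
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical message buffer; the text matches the expected result in the issue.
static const char kMessage[] = "Process is terminating due to StackOverflowException\n";

// Runs on the alternate stack, so it has room to execute even when the
// faulting thread's regular stack is exhausted.
// Simplification: treats every SIGSEGV as a stack overflow.
static void OnSegv(int, siginfo_t*, void*)
{
    // Only async-signal-safe calls are used here: write(2) and abort(3).
    ssize_t ignored = write(STDERR_FILENO, kMessage, sizeof(kMessage) - 1);
    (void)ignored;
    abort();
}

// Registers an alternate signal stack for the calling thread and installs a
// SIGSEGV handler that runs on it. Returns true on success.
// Assumption: one static buffer, so this covers a single thread only; each
// additional thread would need its own sigaltstack() call and buffer.
bool InstallStackOverflowHandler()
{
    static char altStack[64 * 1024];

    stack_t ss;
    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = altStack;
    ss.ss_size = sizeof(altStack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0)
        return false;

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = OnSegv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; // SA_ONSTACK selects the alternate stack
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling `InstallStackOverflowHandler()` at startup and then running unbounded recursion like the repro should print the message and abort instead of dying with a bare segmentation fault. Without `SA_ONSTACK`, the kernel would try to push the signal frame onto the already exhausted stack and the handler would never run.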

@janvorli
Member

janvorli commented Dec 7, 2023

My (wild) guess is that it might be due to libunwind not being able to walk over the SIGSEGV frame when the handler is running on a different stack from the code where the SIGSEGV occurred. In coreclr, we actually don't rely on libunwind over that boundary; we explicitly skip it using a context that we store in the SIGSEGV handler. See

// Check if the PC is the return address from the SEHProcessException.
// If that's the case, extract its local variable containing a pointer to the windows style context of the hardware
// exception and return that. This skips the hardware signal handler trampoline that the libunwind
// cannot cross on some systems. On macOS, it skips a similar trampoline we create in HijackFaultingThread.
if ((void*)curPc == g_SEHProcessExceptionReturnAddress)
{
    CONTEXT* exceptionContext = *(CONTEXT**)(CONTEXTGetFP(context) + g_hardware_exception_context_locvar_offset);
    memcpy_s(context, sizeof(CONTEXT), exceptionContext, sizeof(CONTEXT));
    return TRUE;
}
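
The snippet above depends on the signal handler having saved the faulting context somewhere the unwinder can find it. As a rough illustration of that first half only, the sketch below copies the `ucontext_t` the kernel passes to an `SA_SIGINFO` handler into a global. It is not the coreclr PAL code: `InstallContextCapturingHandler`, `g_faultContext`, and `g_haveFaultContext` are hypothetical names, and a single global is an assumption that only holds for one faulting thread at a time.

```cpp
#include <signal.h>
#include <string.h>
#include <ucontext.h>

// Hypothetical storage for the context of the interrupted code.
static ucontext_t g_faultContext;
static volatile sig_atomic_t g_haveFaultContext = 0;

// The third handler argument points at the register state of the code that
// was interrupted. Copying it lets later code start a stack walk from that
// state instead of unwinding through the signal trampoline frame.
// Note: this is a shallow copy; any pointers inside it (such as floating
// point state on some platforms) still refer to the signal frame.
static void CaptureContext(int, siginfo_t*, void* rawContext)
{
    g_faultContext = *static_cast<ucontext_t*>(rawContext);
    g_haveFaultContext = 1;
}

// Installs the capturing handler for the given signal. Returns true on success.
bool InstallContextCapturingHandler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = CaptureContext;
    sa.sa_flags = SA_SIGINFO; // required to receive the context argument
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}
```

An unwinder that knows it has reached the handler boundary could then resume from `g_faultContext` rather than asking libunwind to cross the trampoline, which is the idea the quoted check implements by reading the context out of the `SEHProcessException` frame.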
