-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make SIGSEGV handler emit nicer backtraces #113565
Make SIGSEGV handler emit nicer backtraces #113565
Conversation
emitted output now looks like:
...this goes on for a bit...
I don't know if this error message can be localized compatibly with our Fluent infrastructure, because of the exceedingly particular requirements around stack/heap handling inside a signal handler. Will have to look into that. |
Ah. impl<'args> TranslateError<'args> {
// some code
// then...
pub fn fluent(
id: &'args Cow<'args, str>,
args: &'args FluentArgs<'args>,
errs: Vec<FluentError>,
) -> Self { I looked into it. The answer is it should almost certainly not be without changes to the Fluent API surface, which uses Vecs and the like. We do not want to touch those while inside a signal handler. Allocating or deallocating anything is forbidden. We probably don't want to even think about Boxes. ( Sorry Boxy. ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, that already looks better
366afb0
to
ebf1bfc
Compare
The tail of the error now looks like:
|
New output now looks like... no trimming, this was previously a large backtrace that maxed out the 256 frames via stack overflow:
|
This comment has been minimized.
This comment has been minimized.
sa.sa_sigaction = print_stack_trace as libc::sighandler_t; | ||
sa.sa_flags = libc::SA_NODEFER | libc::SA_RESETHAND | libc::SA_ONSTACK; | ||
libc::sigemptyset(&mut sa.sa_mask); | ||
libc::sigaction(libc::SIGSEGV, &sa, ptr::null_mut()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably backtrace on SIGBUS too, since wild pointers can trigger sigbus just as easily (okay, maybe not just as easily, but, easily).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to look into what sort of signals we wanted to extend to after, since what I really started digging into the signal handler for is to set up a SIGILL handler for when the compiler needs a hot pad and some soup LLVM, running in our process space, runs into one of its llvm_unreachable
s that on x86-64, thanks to the CMake options we build LLVM with, are compiled into ud2
instead of UB (because they're @llvm.trap()
).
/// When SIGSEGV is delivered to the process, print a stack trace and then exit. | ||
pub(super) fn install() { | ||
unsafe { | ||
let alt_stack_size: usize = min_sigstack_size() + 64 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 64k still definitely enough stack space for this more complicated handler to run in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far on my machine yes, but then again my machine is probably answering getauxval(AT_MINSIGSTKSZ)
with a nonzero value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a good question to ask because I found a spot where a small change removes an entire panic branch (and simplifies the code in general).
ce21d91
to
e756ccd
Compare
We should also consider having an env var1 or something you can set to symbolicate that output with (Note that we should just leak the result of Footnotes
|
I was thinking about making that the "oh you want the ENTIRE backtrace, do you? well..." |
e756ccd
to
7f3bc3b
Compare
This comment has been minimized.
This comment has been minimized.
I'm sorry? |
Oh, HEAD is fucked again. |
...to both compiler contributors and other rustc invokers. The previous error messaging was extremely unfriendly, and potentially both confusing and alarming. The entire modules is extracted into a new file to help ease of reading, and plenty of comments get layered in. The design of the messaging is focused on preventing the overall purpose of the output from being too opaque in common cases, so users understand why it is there. There's not an immediate need for it being actionable, but some suggestions are offered anyways.
This cleans up the formatting of the backtrace considerably, dodging the fact that a backtrace with recursion normally emits hundreds of lines. Instead, simply write repeated symbols, and add how many times the recursion occurred. Also, it makes it look more like the "normal" backtrace. Some fine details in the code are important for reducing the amount of possible panic branches.
First, we reuse the `MAX_FRAMES` constant. Co-authored-by: erikdesjardins <[email protected]> Then we choose an arbitrary recursion depth for the other case. In this case, I used 2d20. Honest.
86bbb9d
to
3dee977
Compare
Nils is /afk |
orz |
r? @pnkfelix |
// Begin elaborating return addrs into symbols and writing them directly to stderr | ||
// Most backtraces are stack overflow, most stack overflows are from recursion | ||
// Check for cycles before writing 250 lines of the same ~5 symbols | ||
let cycled = |(runner, walker)| runner == walker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting... I'm trying to decide whether there's any world where the "normal" stack backtrace mechanism could benefit from an attempt to compress its presentation in the same manner that you are doing here. (e.g. if code is tracking a depth counter or some other exhaustion-prevention mechanism, and then panicking explicitly rather than allowing the stack to actually overflow...)
But I suspect that this code belongs here and only here because in vast majority of cases, that kind of unrestricted recursion is going to end with a SIGSEGV and handled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could lift cycle-detection algorithms up into other parts of rustc, since in many cases we run afoul of essentially unbounded recursion, but it's probably not the case that this one is sufficiently general yet to be used thusly.
// impossible. | ||
return; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code inside here might benefit from an ascii-art diagram or something.
My main concern is that even though I have passing familiarity with Floyd's tortoise and hare algorithm, I nonetheless cannot immediately tell whether you are e.g. reporting the minimal period λ, or if you are only guaranteeing that you are reporting some period ν that is a multiple of λ. (See the linked wikipedia page for where those greek letter cames from.)
... or better still: Would it make any sense to factor this out into its own generic routine that can have concrete unit tests, and then instantiate it here?
... anyway, on the other hand, this code has waited long enough to land, and it certainly looks like an improvement. Further improvements like what I am suggesting here can be follow-on PR's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I want rustc to have a SIGILL handler as well, so I'm definitely going to be doing some followups here and imposing better factorization. It just seemed premature to do anything like that, or at least I wasn't entirely sure what would be best to factor.
@@ -87,14 +87,15 @@ extern "C" fn print_stack_trace(_: libc::c_int) { | |||
raw_errln!(""); | |||
written += rem.len() + 1; | |||
|
|||
if cyclic || stack.len() == 256 { | |||
let random_depth = || 8 * 16; // chosen by random diceroll (2d20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(... I don't understand why this is a closure versus just a let-bound usize.)
@bors r+ rollup |
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#113565 (Make SIGSEGV handler emit nicer backtraces) - rust-lang#114704 (parser: not insert dummy field in struct) - rust-lang#115272 (miri/diagnostics: don't forget to print_backtrace when ICEing on unexpected errors) - rust-lang#115313 (Make `get_return_block()` return `Some` only for HIR nodes in body) - rust-lang#115347 (suggest removing `impl` in generic trait bound position) - rust-lang#115355 (new solver: handle edge case of a recursion limit of 0) - rust-lang#115363 (Don't suggest adding parentheses to call an inaccessible method.) r? `@ghost` `@rustbot` modify labels: rollup
This annotates the code heavily with comments to explain what is going on, for the benefit of other compiler contributors. The backtrace also emits appropriate comments to clarify, to a programmer who may not know why a bunch of file paths and hexadecimal blather was just dumped into stderr, what is going on. Finally, it detects cycles and uses their regularity to avoid repeating a bunch of text. The previous backtraces we were emitting was extremely unfriendly, potentially confusing, and often alarming, and this makes things almost "nice".
We can't necessarily make them much nicer than this, because a signal handler must use "signal-safe" functions. This precludes conveniences like dynamic allocations. Fortunately, Rust's stdlib has allocation-free formatting, but it may hinder integrating this error with our localization middleware, as I wasn't able to clearly ascertain, at a glance, whether there was a zero-alloc path through it.
r? @Nilstrieb