Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SIGSEGV handler emit nicer backtraces #113565

Merged
merged 3 commits into from
Aug 30, 2023

Conversation

workingjubilee
Copy link
Member

@workingjubilee workingjubilee commented Jul 11, 2023

This annotates the code heavily with comments to explain what is going on, for the benefit of other compiler contributors. The backtrace also emits appropriate comments to clarify, to a programmer who may not know why a bunch of file paths and hexadecimal blather was just dumped into stderr, what is going on. Finally, it detects cycles and uses their regularity to avoid repeating a bunch of text. The previous backtraces we were emitting was extremely unfriendly, potentially confusing, and often alarming, and this makes things almost "nice".

We can't necessarily make them much nicer than this, because a signal handler must use "signal-safe" functions. This precludes conveniences like dynamic allocations. Fortunately, Rust's stdlib has allocation-free formatting, but it may hinder integrating this error with our localization middleware, as I wasn't able to clearly ascertain, at a glance, whether there was a zero-alloc path through it.

r? @Nilstrieb

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 11, 2023
@workingjubilee
Copy link
Member Author

workingjubilee commented Jul 11, 2023

emitted output now looks like:

error: rustc interrupted by SIGSEGV, printing stack trace:

./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x106c9f6)[0x7f6e63b4d9f6]
/usr/lib/libc.so.6(+0x39ab0)[0x7f6e6274bab0]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x650251a)[0x7f6e68fe351a]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x64fb1f3)[0x7f6e68fdc1f3]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x654343e)[0x7f6e6902443e]

...this goes on for a bit...

./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x6543868)[0x7f6e69024868]

please report a bug to https://github.com/rust-lang/rust

I don't know if this error message can be localized compatibly with our Fluent infrastructure, because of the exceedingly particular requirements around stack/heap handling inside a signal handler. Will have to look into that.

@workingjubilee
Copy link
Member Author

Ah.

impl<'args> TranslateError<'args> {
    // some code
    // then...
    pub fn fluent(
        id: &'args Cow<'args, str>,
        args: &'args FluentArgs<'args>,
        errs: Vec<FluentError>,
    ) -> Self {

I looked into it. The answer is it should almost certainly not be without changes to the Fluent API surface, which uses Vecs and the like. We do not want to touch those while inside a signal handler. Allocating or deallocating anything is forbidden. We probably don't want to even think about Boxes. ( Sorry Boxy. )

Copy link
Member

@Noratrieb Noratrieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, that already looks better

compiler/rustc_driver_impl/src/lib.rs Outdated Show resolved Hide resolved
compiler/rustc_driver_impl/src/lib.rs Outdated Show resolved Hide resolved
@workingjubilee workingjubilee force-pushed the better-signal-handler-message branch 2 times, most recently from 366afb0 to ebf1bfc Compare July 11, 2023 15:21
@workingjubilee
Copy link
Member Author

The tail of the error now looks like:


error: stack trace dumped due to SIGSEGV, possible stack overflow
error: please report a bug to https://github.com/rust-lang/rust
Segmentation fault (core dumped)

@workingjubilee workingjubilee linked an issue Jul 11, 2023 that may be closed by this pull request
@workingjubilee workingjubilee changed the title Explain SIGSEGV backtrace handler Make SIGSEGV handler emit nicer backtraces Jul 13, 2023
@workingjubilee
Copy link
Member Author

New output now looks like... no trimming, this was previously a large backtrace that maxed out the 256 frames via stack overflow:

$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc overflowy.rs
error: rustc interrupted by SIGSEGV, printing backtrace

./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x8ca553)[0x7f394028a553]
/usr/lib/libc.so.6(+0x39ab0)[0x7f393f628ab0]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x24db98a)[0x7f3941e9b98a]

### cycle encountered after 3 frames with period 5
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x24d4663)[0x7f3941e94663]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x251c8ae)[0x7f3941edc8ae]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x251ccd8)[0x7f3941edccd8]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x251cda3)[0x7f3941edcda3]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x24db9a4)[0x7f3941e9b9a4]
### recursed 50 times

./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x24d4663)[0x7f3941e94663]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x251c8ae)[0x7f3941edc8ae]
./build/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_driver-edec4982b354a535.so(+0x251ccd8)[0x7f3941edccd8]

note: rustc unexpectedly overflowed its stack! this is a bug
note: maximum backtrace depth reached, frames may have been lost
note: we would appreciate a report at https://github.com/rust-lang/rust
Segmentation fault (core dumped)

@rust-log-analyzer

This comment has been minimized.

sa.sa_sigaction = print_stack_trace as libc::sighandler_t;
sa.sa_flags = libc::SA_NODEFER | libc::SA_RESETHAND | libc::SA_ONSTACK;
libc::sigemptyset(&mut sa.sa_mask);
libc::sigaction(libc::SIGSEGV, &sa, ptr::null_mut());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably backtrace on SIGBUS too, since wild pointers can trigger sigbus just as easily (okay, maybe not just as easily, but, easily).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to look into what sort of signals we wanted to extend to after, since what I really started digging into the signal handler for is to set up a SIGILL handler for when the compiler needs a hot pad and some soup LLVM, running in our process space, runs into one of its llvm_unreachables that on x86-64, thanks to the CMake options we build LLVM with, are compiled into ud2 instead of UB (because they're @llvm.trap()).

/// When SIGSEGV is delivered to the process, print a stack trace and then exit.
pub(super) fn install() {
unsafe {
let alt_stack_size: usize = min_sigstack_size() + 64 * 1024;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 64k still definitely enough stack space for this more complicated handler to run in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far on my machine yes, but then again my machine is probably answering getauxval(AT_MINSIGSTKSZ) with a nonzero value.

Copy link
Member Author

@workingjubilee workingjubilee Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a good question to ask because I found a spot where a small change removes an entire panic branch (and simplifies the code in general).

@workingjubilee workingjubilee force-pushed the better-signal-handler-message branch from ce21d91 to e756ccd Compare July 14, 2023 01:17
@thomcc
Copy link
Member

thomcc commented Jul 14, 2023

We should also consider having an env var1 or something you can set to symbolicate that output with backtrace_symbols, since it's often enough fine.

(Note that we should just leak the result of backtrace_symbols though -- technically you can free it with free but this crashes a shocking amount of the time, expecially if you're using an overridden malloc impl)

Footnotes

  1. Ofc, we should read it when installing the handler, not inside the handler.

@workingjubilee
Copy link
Member Author

workingjubilee commented Jul 14, 2023

I was thinking about making that the RUST_BACKTRACE=full behavior.

"oh you want the ENTIRE backtrace, do you? well..."

@workingjubilee workingjubilee force-pushed the better-signal-handler-message branch from e756ccd to 7f3bc3b Compare July 14, 2023 01:37
@rust-log-analyzer

This comment has been minimized.

@workingjubilee
Copy link
Member Author

I'm sorry?

@workingjubilee
Copy link
Member Author

Oh, HEAD is fucked again.

workingjubilee and others added 3 commits July 19, 2023 00:15
...to both compiler contributors and other rustc invokers.
The previous error messaging was extremely unfriendly,
and potentially both confusing and alarming.

The entire modules is extracted into a new file to help
ease of reading, and plenty of comments get layered in.
The design of the messaging is focused on preventing
the overall purpose of the output from being too opaque
in common cases, so users understand why it is there.
There's not an immediate need for it being actionable,
but some suggestions are offered anyways.
This cleans up the formatting of the backtrace considerably,
dodging the fact that a backtrace with recursion normally
emits hundreds of lines. Instead, simply write repeated symbols,
and add how many times the recursion occurred.
Also, it makes it look more like the "normal" backtrace.

Some fine details in the code are important for reducing
the amount of possible panic branches.
First, we reuse the `MAX_FRAMES` constant.

Co-authored-by: erikdesjardins <[email protected]>

Then we choose an arbitrary recursion depth for
the other case. In this case, I used 2d20. Honest.
@workingjubilee workingjubilee force-pushed the better-signal-handler-message branch from 86bbb9d to 3dee977 Compare July 19, 2023 07:22
@workingjubilee
Copy link
Member Author

Nils is /afk
r? compiler

@workingjubilee
Copy link
Member Author

orz
r? compiler

@rustbot rustbot assigned cjgillot and unassigned eholk Aug 3, 2023
@pnkfelix
Copy link
Member

r? @pnkfelix

@rustbot rustbot assigned pnkfelix and unassigned cjgillot Aug 17, 2023
// Begin elaborating return addrs into symbols and writing them directly to stderr
// Most backtraces are stack overflow, most stack overflows are from recursion
// Check for cycles before writing 250 lines of the same ~5 symbols
let cycled = |(runner, walker)| runner == walker;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting... I'm trying to decide whether there's any world where the "normal" stack backtrace mechanism could benefit from an attempt to compress its presentation in the same manner that you are doing here. (e.g. if code is tracking a depth counter or some other exhaustion-prevention mechanism, and then panicking explicitly rather than allowing the stack to actually overflow...)

But I suspect that this code belongs here and only here because in vast majority of cases, that kind of unrestricted recursion is going to end with a SIGSEGV and handled here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could lift cycle-detection algorithms up into other parts of rustc, since in many cases we run afoul of essentially unbounded recursion, but it's probably not the case that this one is sufficiently general yet to be used thusly.

// impossible.
return;
};

Copy link
Member

@pnkfelix pnkfelix Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code inside here might benefit from an ascii-art diagram or something.

My main concern is that even though I have passing familiarity with Floyd's tortoise and hare algorithm, I nonetheless cannot immediately tell whether you are e.g. reporting the minimal period λ, or if you are only guaranteeing that you are reporting some period ν that is a multiple of λ. (See the linked wikipedia page for where those greek letter cames from.)

... or better still: Would it make any sense to factor this out into its own generic routine that can have concrete unit tests, and then instantiate it here?

... anyway, on the other hand, this code has waited long enough to land, and it certainly looks like an improvement. Further improvements like what I am suggesting here can be follow-on PR's.

Copy link
Member Author

@workingjubilee workingjubilee Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I want rustc to have a SIGILL handler as well, so I'm definitely going to be doing some followups here and imposing better factorization. It just seemed premature to do anything like that, or at least I wasn't entirely sure what would be best to factor.

@@ -87,14 +87,15 @@ extern "C" fn print_stack_trace(_: libc::c_int) {
raw_errln!("");
written += rem.len() + 1;

if cyclic || stack.len() == 256 {
let random_depth = || 8 * 16; // chosen by random diceroll (2d20)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(... I don't understand why this is a closure versus just a let-bound usize.)

@pnkfelix
Copy link
Member

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Aug 30, 2023

📌 Commit 3dee977 has been approved by pnkfelix

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 30, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 30, 2023
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#113565 (Make SIGSEGV handler emit nicer backtraces)
 - rust-lang#114704 (parser: not insert dummy field in struct)
 - rust-lang#115272 (miri/diagnostics: don't forget to print_backtrace when ICEing on unexpected errors)
 - rust-lang#115313 (Make `get_return_block()` return `Some` only for HIR nodes in body)
 - rust-lang#115347 (suggest removing `impl` in generic trait bound position)
 - rust-lang#115355 (new solver: handle edge case of a recursion limit of 0)
 - rust-lang#115363 (Don't suggest adding parentheses to call an inaccessible method.)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit dafea5f into rust-lang:master Aug 30, 2023
@rustbot rustbot added this to the 1.74.0 milestone Aug 30, 2023
@workingjubilee workingjubilee deleted the better-signal-handler-message branch August 30, 2023 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Excessively chaining operators causes a stack overflow in rustc rustc segfault (on ridiculous unlikely code)
10 participants