Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework handling of recursive panics #110975

Merged
merged 2 commits into from
May 27, 2023
Merged

Rework handling of recursive panics #110975

merged 2 commits into from
May 27, 2023

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Apr 29, 2023

This PR makes 2 changes to how recursive panics works (a panic while handling a panic).

  1. The panic count is no longer used to determine whether to force an immediate abort. This allows code like the following to work without aborting the process immediately:
struct Double;

impl Drop for Double {
    fn drop(&mut self) {
        // 2 panics are active at once, but this is fine since it is caught.
        std::panic::catch_unwind(|| panic!("twice"));
    }
}

let _d = Double;

panic!("once");

Rustc already generates appropriate code so that any exceptions escaping out of a Drop called in the unwind path will immediately abort the process.

  1. Any panics while the panic hook is executing will force an immediate abort. This is necessary to avoid potential deadlocks like rustc hangs after ICEing due to memory limit #110771 where a panic happens while holding the backtrace lock. We don't even try to print the panic message in this case since the panic may have been caused by Display impls.

Fixes #110771

@rustbot
Copy link
Collaborator

rustbot commented Apr 29, 2023

r? @joshtriplett

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 29, 2023
@rustbot
Copy link
Collaborator

rustbot commented Apr 29, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Apr 29, 2023

The Miri subtree was changed

cc @rust-lang/miri

@rust-log-analyzer

This comment has been minimized.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 29, 2023

I don't know why the test is failing. It claims that panic in a function that cannot unwind is not found in stderr despite it clearly being there!

@nbdd0121
Copy link
Contributor

The change in panic runtime looks correct to me.

@bors
Copy link
Contributor

bors commented May 1, 2023

☔ The latest upstream changes (presumably #111036) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on the Miri changes

@Amanieu Amanieu force-pushed the panic_count branch 2 times, most recently from 8244990 to 02a0faa Compare May 6, 2023 08:27
@bors
Copy link
Contributor

bors commented May 11, 2023

☔ The latest upstream changes (presumably #111454) made this pull request unmergeable. Please resolve the merge conflicts.

@joshtriplett
Copy link
Member

@bors r+ rollup

@bors
Copy link
Contributor

bors commented May 15, 2023

📌 Commit 9c58f91 has been approved by joshtriplett

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 15, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 15, 2023
Rework handling of recursive panics

This PR makes 2 changes to how recursive panics works (a panic while handling a panic).

1. The panic count is no longer used to determine whether to force an immediate abort. This allows code like the following to work without aborting the process immediately:

```rust
struct Double;

impl Drop for Double {
    fn drop(&mut self) {
        // 2 panics are active at once, but this is fine since it is caught.
        std::panic::catch_unwind(|| panic!("twice"));
    }
}

let _d = Double;

panic!("once");
```

Rustc already generates appropriate code so that any exceptions escaping out of a `Drop` called in the unwind path will immediately abort the process.

2. Any panics while the panic hook is executing will force an immediate abort. This is necessary to avoid potential deadlocks like rust-lang#110771 where a panic happens while holding the backtrace lock. We don't even try to print the panic message in this case since the panic may have been caused by `Display` impls.

Fixes rust-lang#110771
@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 16, 2023
@Noratrieb
Copy link
Member

@bors rollup=iffy

@Amanieu
Copy link
Member Author

Amanieu commented May 17, 2023

Should be fixed now.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 17, 2023
@Amanieu
Copy link
Member Author

Amanieu commented May 27, 2023

@bors r=joshtriplett

@bors
Copy link
Contributor

bors commented May 27, 2023

📌 Commit de607f1 has been approved by joshtriplett

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 27, 2023
@bors
Copy link
Contributor

bors commented May 27, 2023

⌛ Testing commit de607f1 with merge f91b634...

@bors
Copy link
Contributor

bors commented May 27, 2023

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing f91b634 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 27, 2023
@bors bors merged commit f91b634 into rust-lang:master May 27, 2023
@rustbot rustbot added this to the 1.71.0 milestone May 27, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f91b634): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.5%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.5% [0.5%, 0.5%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [2.1%, 2.1%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.1% [2.1%, 2.1%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 645.972s -> 646.05s (0.01%)

@correabuscar
Copy link

correabuscar commented Mar 21, 2024

This PR seems to have introduced a regression, namely this playground example mentioned here no longer shows any stacktraces as it used to do when the comment was made back in 2022. EDIT2: ok, the reason it doesn't show is because now rust_panic_with_hook is only allowed to be entered once instead of twice, then(ie. any more enters and) it will abort. But if we make the example infinitely panic, it won't show stacktrace even with that version of rust because it hits abort just like current rust does.

As this seem to be by design(Any panics while the panic hook is executing will force an immediate abort.), I guess there's no point in opening a new issue...(EDIT1: or perhaps since the user custom panic hook used to run before eg. with rustc 1.71.0-nightly (1a5f8bce7 2023-05-26) and now it doesn't, this should be an issue)

Either way, the double panic is caused by the payload.get() there:

Hook::Default => {
info.set_payload(payload.get());
default_hook(&info);
}
Hook::Custom(ref hook) => {
info.set_payload(payload.get());
hook(&info);
}

and thus doesn't even get to execute a potential user panic hook, or the default panic hook.

...though it's by design: We don't even try to print the panic message in this case since the panic may have been caused by Display impls.
EDIT3: this makes sense ^, however what about printing just the location ?

@Amanieu
Copy link
Member Author

Amanieu commented Mar 22, 2024

Printing just the location should be fine since this doesn't execute any user-controlled code. Feel free to submit a PR for this.

@correabuscar
Copy link

correabuscar commented Mar 22, 2024

Printing just the location should be fine since this doesn't execute any user-controlled code. Feel free to submit a PR for this.

EDIT4: I'm wrong here because we'd be using rtprintpanic! instead of println! and thus wouldn't alloc.
Apparently, that wouldn't work either, because:

"In Rust, println! or any other printing function typically involves memory allocation, especially if you are printing dynamic content or formatting strings."

some not relevant info

and I've stumbled upon that as I've tried to do something else:

"Therefore, directly calling println! within the alloc and dealloc methods of the allocator could cause recursion or other unexpected behavior due to additional allocations."

(those are from chatgpt btw)

EDIT: lookslike, in my particular example with alloc, I can use a static AtomicBool (not thread local tho) to guard the println! so it doesn't infinitely recurse the alloc.(EDIT2: actually AtomicBools isn't doing it right for me, i don't have guaranteed atomicity during .compare_exchange() EDIT3: it appears chatgpt misguided me here, compare_exchange is atomic) But this isn't relevant here.

Note that I'm rather noobish at this and I'm just trying to learn things as I go along, which is surprisingly way more fun than just reading the rust book(which I've failed to do many times over the years)

@RalfJung
Copy link
Member

RalfJung commented Mar 22, 2024 via email

@correabuscar
Copy link

correabuscar commented Mar 22, 2024

tl;dr: rtprintpanic! doesn't alloc, like println! would(the stdout buffer the first time).

Do you have a reliable source for that information? ChatGPT is just as likely to hallucinate some nonsense as it is to provide useful information. I also fail to see how it is relevant; we are talking about panics during panic handling, not about allocation failures. The vast majority of panics is unrelated to allocation failure, including the examples shared here.

You may be right, because an allocation(1024 bytes EDIT1: that's apparently happening due to a let o=std::io::stdout(); somewhere inside println!) happens only the first time ever println! is used, but not any subsequent times. (presumably it gets reused any subsequent times?)
I used a global allocator override to tell me about allocations.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=231ed1c9c63b5e325c985b9143044d33
If you comment out all println! in main() it won't alloc that 1024 bytes.
EDIT1: so this is why 1024 bytes get allocated

pub fn stdout() -> Stdout {
    Stdout {
        inner: STDOUT
            .get_or_init(|| ReentrantMutex::new(RefCell::new(LineWriter::new(stdout_raw())))),
    }
}

I guess then println! doesn't otherwise alloc, and chatgpt hallucinated as you said.(happens way too often)
EDIT2: Looks like rtprintpanic! here:

macro_rules! rtprintpanic {
($($t:tt)*) => {
if let Some(mut out) = crate::sys::stdio::panic_output() {
let _ = crate::io::Write::write_fmt(&mut out, format_args!($($t)*));
}
}
}

Calls that panic_output() which does NOT alloc:
pub fn panic_output() -> Option<impl io::Write> {
Some(Stderr::new())
}

because StdErr::new() doesn't alloc:
pub const fn new() -> Stderr {
Stderr(())
}

or maybe it does alloc on Windows?
impl Stderr {
pub const fn new() -> Stderr {
Stderr { incomplete_utf8: IncompleteUtf8::new() }
it doesn't:
pub const fn new() -> IncompleteUtf8 {
IncompleteUtf8 { bytes: [0; 4], len: 0 }

Side note: eprintln!() doesn't alloc that stderr() 1024 byte buffer, like println! does with the stdout() one. ie. let e=std::io::stderr(); doesn't alloc any bytes!

As to why it is relevant, it's because we can't do new allocations there in panic handling code because:

// Memory allocations performed in a child created with libc::fork are undefined behavior in most operating systems.

from: https://github.com/rust-lang/rust/pull/110975/files#diff-88e2a536317b831c2e958b9205fde12f5edaabefba963bdd3a7503bbdedf8da9R325-R326

@RalfJung
Copy link
Member

RalfJung commented Mar 23, 2024

Code that forks has to set always_abort. Otherwise it is not safe to panic after fork. This then executes this code:

// Unfortunately, this does not print a backtrace, because creating
// a `Backtrace` will allocate, which we must to avoid here.
let panicinfo = PanicInfo::internal_constructor(
message,
location,
can_unwind,
force_no_backtrace,
);
rtprintpanic!("{panicinfo}\npanicked after panic::always_abort(), aborting.\n");

So it's already a separate codepath and in the other codepaths, we are in principle free to use allocation (and existing code relies on that).

But anyway, closed PRs are a bad place to discuss anything. If you think something needs to be improved or something is wrong with when we do or do not allocate, please file an issue. The problem of not having enough panic location information in some cases is tracked at #97181. If you want to learn why the current code makes sense, I suggest asking on Zulip.

@correabuscar
Copy link

correabuscar commented Mar 23, 2024

Thanks.

I just want to add, that at line 755 there, message being user controlled (ie. Display implementation) they could even alloc there, or panic. In this latter case, if they do panic, control will be back toNOPE

panic_count::MustAbort::PanicInHook => {
// Don't try to print the message in this case
// - perhaps that is causing the recursive panics.
rtprintpanic!("thread panicked while processing panic. aborting.\n");
}

this is why I was worried about allocating when printing the location there, but since rtprintpanic! there doesn't allocate(I later found out), it's not a problem.

Thanks for making the PR, I couldn't have done it in time, I don't have the environment set up for a git version of rust that I could test that all its test cases wouldn't break because of introducing the change.

(and I will look into Zulip, thx)

EDIT: Actually, looks like I'm wrong because line 394 there says it always returns the same MustAbort::AlwaysAbort

pub fn increase(run_panic_hook: bool) -> Option<MustAbort> {
let global_count = GLOBAL_PANIC_COUNT.fetch_add(1, Ordering::Relaxed);
if global_count & ALWAYS_ABORT_FLAG != 0 {
return Some(MustAbort::AlwaysAbort);
}
LOCAL_PANIC_COUNT.with(|c| {
let (count, in_panic_hook) = c.get();
if in_panic_hook {
return Some(MustAbort::PanicInHook);
}
c.set((count + 1, run_panic_hook));
None
})
}

this means it's a bug(in the context that i've framed it), right?
EDIT2: kind of, if infinite panic nesting is encountered, then it's a bug, like this:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=ee739d9534ba3924e6113bfa79f60fbf
but comment out the std::panic::always_abort(); line and it's not infinite recursion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustc hangs after ICEing due to memory limit