-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report I/O errors from rmeta encoding with emit_fatal #119510
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally try to avoid fatal errors when possible, but here it looks very reasonable.
I'm not waffle and you asked for it but r=me if that works for you too
I've been sending the out-of-disk PRs to Waffle because I keep getting good reviews back, so in this case thank you for taking a look, but I will wait for Waffle until the tree opens (I have no idea when that would be, but any kind of rush in these circumstances seems silly). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too, r=WaffleLapkin,Nilstrieb
once the tree feels better.
@@ -56,16 +56,14 @@ where | |||
} | |||
Err(err) if err.kind() == io::ErrorKind::NotFound => (), | |||
Err(err) => { | |||
sess.dcx().emit_err(errors::DeleteOld { name, path: path_buf, err }); | |||
return; | |||
sess.dcx().emit_fatal(errors::DeleteOld { name, path: path_buf, err }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop the ;
here, but it does not matter. (I prefer using expressions instead of blocks where possible, but yeah, not important...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking about this as I made the change
d844f17
to
94c43cc
Compare
Marking as
S-blocked
|
Hm, it's
S-waiting-on-review
|
@bors r=WaffleLapkin,Nilstrieb rollup |
🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened. |
…Lapkin,Nilstrieb Report I/O errors from rmeta encoding with emit_fatal rust-lang#119456 reminded me that I never did systematic testing to provoke the out-of-disk ICEs so I grepped through a recent crater run (rust-lang#119440 (comment)) for more out-of-disk ICEs on current master and yep there's 2 in there. So I finally cooked up a way to provoke for these crashes. I wrote a little `cdylib` crate that has a `#[no_mangle] pub extern "C" fn write` which occasionally reports `ENOSPC`, and prints a backtrace when it does. <details><summary><strong>code for the dylib</strong></summary> <p> ```rust // cargo add libc rand backtrace use rand::Rng; #[no_mangle] pub extern "C" fn write( fd: libc::c_int, buf: *const libc::c_void, count: libc::size_t, ) -> libc::ssize_t { if fd > 2 && rand::thread_rng().gen::<u8>() == 0 { let mut count = 0; backtrace::trace(|frame| { backtrace::resolve_frame(frame, |symbol| { if let Some(name) = symbol.name() { if count > 3 { eprintln!("{}", name); } } count += 1; }); true }); unsafe { *libc::__errno_location() = libc::ENOSPC; } return -1; } else { unsafe { let res = libc::syscall(libc::SYS_write, fd as usize, buf as usize, count as usize) as isize; if res < 0 { *libc::__errno_location() = -res as i32; -1 } else { res } } } } ``` </p> </details> Then `LD_PRELOAD` that dylib and repeatedly build a big project until it ICEs, such as with this: ```bash while true; do cargo clean LD_PRELOAD=/home/ben/evil/target/release/libevil.so cargo +stage1 check 2> errors if grep "thread 'rustc' panicked" errors; then break fi done ``` My "big project" for testing was an otherwise-empty project with `cargo add axum`. Before this PR, the above procedure finds a crash in between 1 and 15 minutes. With this PR, I have not found a crash in 30 minutes, and I'll be leaving this to run overnight (starting now). (A night has now passed, no crashes were found) I believe the problem is that even though since rust-lang#117301 we correctly check `FileEncoder` for errors on all paths, we use `emit_err`, so there is a window of time between the call to `emit_err` and the full error reporting where rustc believes it has emitted a valid rmeta file and will permit Cargo to launch a build for a dependent crate. Changing these calls to `emit_fatal` closes that window. I think there are a number of other cases where `emit_err` has been used instead of the more-correct `emit_fatal` such as https://github.com/rust-lang/rust/blob/e51e98dde6a60637b6a71b8105245b629ac3fe77/compiler/rustc_codegen_ssa/src/back/write.rs#L542 but unlike rmeta encoding I am not aware of those cases of those causing problems. r? `@WaffleLapkin`
Rollup of 21 pull requests Successful merges: - rust-lang#119086 (Query panic!() to useful diagnostic) - rust-lang#119239 (Remove unnecessary arm in `check_expr_yield`) - rust-lang#119298 (suppress change-tracker warnings in CI containers) - rust-lang#119319 (Document that File does not buffer reads/writes) - rust-lang#119434 (rc: Take *const T in is_dangling) - rust-lang#119444 (Rename `TyCtxt::is_closure` to `TyCtxt::is_closure_or_coroutine`) - rust-lang#119474 (Update tracking issue of naked_functions) - rust-lang#119476 (Pretty-print always-const trait predicates correctly) - rust-lang#119477 (rustdoc ui: adjust tooltip z-index to be above sidebar) - rust-lang#119479 (Remove two unused feature gates from rustc_query_impl) - rust-lang#119487 (Minor improvements in comment on `freshen.rs`) - rust-lang#119492 (Update books) - rust-lang#119494 (Deny defaults for higher-ranked generic parameters) - rust-lang#119498 (Update deadlinks of `strict_provenance` lints) - rust-lang#119505 (Don't synthesize host effect params for trait associated functions marked const) - rust-lang#119510 (Report I/O errors from rmeta encoding with emit_fatal) - rust-lang#119512 (Mark myself as back from leave) - rust-lang#119514 (coverage: Avoid a query stability hazard in `function_coverage_map`) - rust-lang#119523 (llvm: Allow `noundef` in codegen tests) - rust-lang#119534 (Update `thread_local` examples to use `local_key_cell_methods`) - rust-lang#119544 (Fix: Properly set vendor in i686-win7-windows-msvc target) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#119510 - saethlin:fatal-io-errors, r=WaffleLapkin,Nilstrieb Report I/O errors from rmeta encoding with emit_fatal rust-lang#119456 reminded me that I never did systematic testing to provoke the out-of-disk ICEs so I grepped through a recent crater run (rust-lang#119440 (comment)) for more out-of-disk ICEs on current master and yep there's 2 in there. So I finally cooked up a way to provoke for these crashes. I wrote a little `cdylib` crate that has a `#[no_mangle] pub extern "C" fn write` which occasionally reports `ENOSPC`, and prints a backtrace when it does. <details><summary><strong>code for the dylib</strong></summary> <p> ```rust // cargo add libc rand backtrace use rand::Rng; #[no_mangle] pub extern "C" fn write( fd: libc::c_int, buf: *const libc::c_void, count: libc::size_t, ) -> libc::ssize_t { if fd > 2 && rand::thread_rng().gen::<u8>() == 0 { let mut count = 0; backtrace::trace(|frame| { backtrace::resolve_frame(frame, |symbol| { if let Some(name) = symbol.name() { if count > 3 { eprintln!("{}", name); } } count += 1; }); true }); unsafe { *libc::__errno_location() = libc::ENOSPC; } return -1; } else { unsafe { let res = libc::syscall(libc::SYS_write, fd as usize, buf as usize, count as usize) as isize; if res < 0 { *libc::__errno_location() = -res as i32; -1 } else { res } } } } ``` </p> </details> Then `LD_PRELOAD` that dylib and repeatedly build a big project until it ICEs, such as with this: ```bash while true; do cargo clean LD_PRELOAD=/home/ben/evil/target/release/libevil.so cargo +stage1 check 2> errors if grep "thread 'rustc' panicked" errors; then break fi done ``` My "big project" for testing was an otherwise-empty project with `cargo add axum`. Before this PR, the above procedure finds a crash in between 1 and 15 minutes. With this PR, I have not found a crash in 30 minutes, and I'll be leaving this to run overnight (starting now). (A night has now passed, no crashes were found) I believe the problem is that even though since rust-lang#117301 we correctly check `FileEncoder` for errors on all paths, we use `emit_err`, so there is a window of time between the call to `emit_err` and the full error reporting where rustc believes it has emitted a valid rmeta file and will permit Cargo to launch a build for a dependent crate. Changing these calls to `emit_fatal` closes that window. I think there are a number of other cases where `emit_err` has been used instead of the more-correct `emit_fatal` such as https://github.com/rust-lang/rust/blob/e51e98dde6a60637b6a71b8105245b629ac3fe77/compiler/rustc_codegen_ssa/src/back/write.rs#L542 but unlike rmeta encoding I am not aware of those cases of those causing problems. r? ``@WaffleLapkin``
…etrochenkov Detect truncated DepGraph files I suspect that the following issues are caused by truncated incr comp files: * rust-lang#120582 * rust-lang#121499 * rust-lang#122210 We fail with an allocation failure or capacity overflow in this case because we assume that the ending bytes of an DepGraph file are the lengths of arrays. If the file has somehow been truncated then the ending bytes are probably some of our varint encoding, which tries to eliminate zero bytes, so interpreting a random 8 bytes as an array length has a very high chance of producing a byte capacity over `isize::MAX`. Now theoretically since rust-lang#119510 merged I have fixed the out-of-disk issues and yet in rust-lang#120894 (comment) I still see some decoding failures that look like out-of-disk ICEs, for example https://crater-reports.s3.amazonaws.com/beta-1.77-1/beta-2024-02-10/gh/scottfones.aoc_2022/log.txt So this PR should ensure that we get an ICE that clearly identifies if the file in question is truncated.
Rollup merge of rust-lang#122245 - saethlin:check-dep-graph-size, r=petrochenkov Detect truncated DepGraph files I suspect that the following issues are caused by truncated incr comp files: * rust-lang#120582 * rust-lang#121499 * rust-lang#122210 We fail with an allocation failure or capacity overflow in this case because we assume that the ending bytes of an DepGraph file are the lengths of arrays. If the file has somehow been truncated then the ending bytes are probably some of our varint encoding, which tries to eliminate zero bytes, so interpreting a random 8 bytes as an array length has a very high chance of producing a byte capacity over `isize::MAX`. Now theoretically since rust-lang#119510 merged I have fixed the out-of-disk issues and yet in rust-lang#120894 (comment) I still see some decoding failures that look like out-of-disk ICEs, for example https://crater-reports.s3.amazonaws.com/beta-1.77-1/beta-2024-02-10/gh/scottfones.aoc_2022/log.txt So this PR should ensure that we get an ICE that clearly identifies if the file in question is truncated.
#119456 reminded me that I never did systematic testing to provoke the out-of-disk ICEs so I grepped through a recent crater run (#119440 (comment)) for more out-of-disk ICEs on current master and yep there's 2 in there.
So I finally cooked up a way to provoke for these crashes. I wrote a little
cdylib
crate that has a#[no_mangle] pub extern "C" fn write
which occasionally reportsENOSPC
, and prints a backtrace when it does.code for the dylib
Then
LD_PRELOAD
that dylib and repeatedly build a big project until it ICEs, such as with this:My "big project" for testing was an otherwise-empty project with
cargo add axum
.Before this PR, the above procedure finds a crash in between 1 and 15 minutes. With this PR, I have not found a crash in 30 minutes, and I'll be leaving this to run overnight (starting now). (A night has now passed, no crashes were found)
I believe the problem is that even though since #117301 we correctly check
FileEncoder
for errors on all paths, we useemit_err
, so there is a window of time between the call toemit_err
and the full error reporting where rustc believes it has emitted a valid rmeta file and will permit Cargo to launch a build for a dependent crate. Changing these calls toemit_fatal
closes that window.I think there are a number of other cases where
emit_err
has been used instead of the more-correctemit_fatal
such asrust/compiler/rustc_codegen_ssa/src/back/write.rs
Line 542 in e51e98d
r? @WaffleLapkin