Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::io doesn't work when stdout/stderr has O_NONBLOCK set #100673

Closed
uarif1 opened this issue Aug 17, 2022 · 14 comments
Closed

std::io doesn't work when stdout/stderr has O_NONBLOCK set #100673

uarif1 opened this issue Aug 17, 2022 · 14 comments
Labels
C-bug Category: This is a bug. O-unix Operating system: Unix-like T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@uarif1
Copy link
Contributor

uarif1 commented Aug 17, 2022

When stdout/stderr uses tty and has O_NONBLOCK set, the kernel driver returns -EAGAIN (https://elixir.bootlin.com/linux/v5.19/source/drivers/tty/tty_io.c#L952). This causes rust to panic, rather than try to print the same buffer again.

An example of when O_NONBLOCK is set on stdio is when using qemu serial.

I made an attempt to handle EAGAIN for std::io(#100594), but it was on generic code so not the right approach. Maybe there is a better approach to handle EAGAIN specifically for std::io and not panic?

To test this just change the stdout to O_NONBLOCK and then create 10 threads to print to stdout, C++ is able to print and finish to completion, but rust panics and crashes, so I think glibc also probably has similar support for O_NONBLOCK for stdout.

Thanks

@the8472
Copy link
Member

the8472 commented Aug 17, 2022

C++ is able to print and finish to completion, but rust panics and crashes, so I think glibc also probably has similar support for O_NONBLOCK for stdout.

C++ stdio streams don't support nonblocking IO. I suspect they silently drop the writes because fail() wasn't checked?


re @famz (#100594 (comment)): I think it's better to continue the discussion here since changing behavior as in that PR is not necessarily the right outcome

For example if the main project is designed to set and use stdout in NON_BLOCKING for reasons, any linked library crate that calls println! could crash, which IMO is not ideal situation, and is quite hard to fix.

If something manipulates the properties of the stdio file descriptors then it must have exclusive access to those descriptors because it's obviously not compatible with libraries that assume blocking IO. Which in turn means other things should not be printing to that descriptor because it's in exclusive use.

For example if something is printing JSON output then stray stdout writes by another thread will break the json output. Patching the non-blocking issue does not fix the ownership assumptions the code is making.

as it's out of control; or it could be that O_NONBLOCK is intended by the developer.

Which means he's assuming that's ok to do, which usually is only the case when being the sole user of an FD.

@the8472 the8472 added T-libs Relevant to the library team, which will review and decide on the PR/issue. C-bug Category: This is a bug. O-unix Operating system: Unix-like I-libs-nominated Nominated for discussion during a libs team meeting. labels Aug 17, 2022
@the8472
Copy link
Member

the8472 commented Aug 17, 2022

Nominating for discussion. Do we want a) document that nonblocking stdio is unsupported b) add polling fallbacks. In the latter case the question is whether it should be added only for the stdio wrappers or for all FD wrappers.

Recent changes to windows (#98950) point in the direction of adding a fallback?

@uarif1
Copy link
Contributor Author

uarif1 commented Aug 17, 2022

C++ is able to print and finish to completion, but rust panics and crashes, so I think glibc also probably has similar support for O_NONBLOCK for stdout.

C++ stdio streams don't support nonblocking IO. I suspect they silently drop the writes because fail() wasn't checked?

I just tried the example in stack overflow and cout.fail is 0, but as mentioned in the article, it seems to only affect macOS and not linux, and I am testing on linux. The output for me looks like:

stdout O_NONBLOCK is: 2048
stdin O_NONBLOCK is: 2048
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er
fgvbhln;j;jblhcbwnoi wiejnocje ocweo cewoj coj weoj cowejojewohewouspuweoche njwpihbeh wjo erouhep hv reough er

cerr.fail():0 cout.fail():0

@uarif1
Copy link
Contributor Author

uarif1 commented Aug 17, 2022

Nominating for discussion. Do we want a) document that nonblocking stdio is unsupported b) add polling fallbacks. In the latter case the question is whether it should be added only for the stdio wrappers or for all FD wrappers.

A couple of points for considering between stdio fd only vs all fd:

  • Currently, we make an exception of retrying for ErrorKind::Interrupted for all fds.
  • ErrorKind::WouldBlock is just a suggestion from kernel to try again later, so maybe it should be for all FD wrappers, so that it doesnt panic when the kernel returns -EAGAIN?

@famz
Copy link

famz commented Aug 17, 2022

Whether or not sharing the same stdio is usually a sysadmin decision rather than program developers', e.g.:

(
/usr/bin/yes &
/usr/local/bin/yes &
some_other_program
) | grep "some interesting pattern"

In this case 3 subprocesses in the subshell share the same stdout.

If /usr/local/bin/yes was developed in Rust and used println, and some_other_program happens to set stdout to non-block, we will see a panic.

So, who is here to blame? The sysadmin who wrote this script? The developer who used Rust println!, or the developer of some_other_program who thought it's okay to change stdout to non-blocking? I don't think that's easy to judge, but if only println! didn't have that panic!, everybody is more or less happy.

@joshtriplett
Copy link
Member

I think we could make println do polling if it gets an EAGAIN, but I'm not sure we should. We shouldn't try to make write and similar Just Work (they should return an error that the user will have to handle). I think having stdout or stderr in non-blocking mode is an unusual configuration that we shouldn't be expected to transparently handle.

@famz
Copy link

famz commented Aug 17, 2022

Hi Josh! From your reply I assume it's not a bad idea if we make println! return errors that's ready for the user to handle, instead of panic? Then the user can define another macro that "retries" upon EAGAIN, println_all.

Or, such a wrapper can be a good addition to std too?

By then, why is it so important that we don't just call it println?

I understand changing behaviour of what's widely used is sometimes not desired, Hyrum's Law and all that. But in this specific case panic! upon something that is out of control is probably not a "relied on" feature.

Or am I wrong and we cannot do anything to stop println!() from panic in this "unsupported" case?

@uarif1
Copy link
Contributor Author

uarif1 commented Aug 17, 2022

But in this specific case panic! upon something that is out of control is probably not a "relied on" feature.

I agree with this, and I would say not only for stdio but maybe for all fds, If a write returns -EAGAIN, it seems like an extreme reaction to panic!, when kernel is requesting to try again.

@joshtriplett
Copy link
Member

From your reply I assume it's not a bad idea if we make println! return errors that's ready for the user to handle, instead of panic

That would be a backwards-incompatible change. If you want to handle errors, call writeln! instead.

Then the user can define another macro that "retries" upon EAGAIN, println_all.

I would suggest that having stdout/stderr in non-blocking mode is an unusual configuration that the user shouldn't do without expecting to handle it.

@uarif1 write/writeln don't panic; they return the error for the user to handle. println is defined to panic if write returns an error.

@joshtriplett
Copy link
Member

That said, a PR adding documentation to both the std::io module and to println/print/eprintln/eprint` (the latter being one sentence and a link to the former which could have a longer explanation) would be welcome.

@famz
Copy link

famz commented Aug 17, 2022

Well, this is not that hard to hit. Just start this in the background (cargo run &)

fn main() {
loop {
println!("Hello, world!");
}
}

Then start a tmux session in foreground.

Then the hello world program crashes.

One would think a relatively quiet app may be fine to run in the background where tmux runs in foreground, expecting a line or two outputs to slightly clutter the tmux display (easy to fix with Ctrl-L), but that is not the case here, the background rust program would just panic due to O_NONBLOCK.

@the8472
Copy link
Member

the8472 commented Aug 17, 2022

If tmux wants to use O_NONBLOCK and at the same time support other processes accessing the underlying TTY then it should reopen the stdio file descriptors so it has separate file descriptions which don't share the fcntl state.

Concidentally someone submitted a patch to qemu that would have protected the caller process from such confusion, but it looks like it wasn't applied.

GNU coreutils don't seem to support O_NONBLOCK either.

And we definitely can't modify read implementations because that would be a breaking change (callers might rely on WouldBlock being returned). Perhaps write_all or println could be changed, but that doesn't mean it's a good idea to do so. As explained earlier, something that sets O_NONBLOCK on a file descriptor should only do so in a situation where it can assume exclusive access to that descriptor. The panics you're seeing surfaces bugs where that assumption is violated.

expecting a line or two outputs to slightly clutter the tmux display (easy to fix with Ctrl-L)

Easy to fix is still slightly broken. So something is wrong and we shouldn't be silently papering over that issue. The panic you're seeing surfaces that error.
You can use writeln! instead and handle the error deliberately.

@uarif1
Copy link
Contributor Author

uarif1 commented Sep 4, 2022

I have created a PR with the documentation https://github.com/rust-lang/rust/pull/101416/commits

@piegamesde
Copy link
Contributor

I got bitten by this too, by running ssh and having .stderr(std::process::Stdio::inherit()) (it sets O_NONBLOCK on all its file descriptors, and they are shared because Unix).

Would it be a good idea to add a warning to Stdio::inherit() too, that the child process may modify the file descriptor?

@dtolnay dtolnay removed the I-libs-nominated Nominated for discussion during a libs team meeting. label Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-unix Operating system: Unix-like T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants