Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement epoll_wait #2764

Merged
merged 1 commit into from
Feb 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 57 additions & 1 deletion src/shims/unix/linux/fd.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ use socketpair::SocketPair;

use shims::unix::fs::EvalContextExt as _;

use std::cell::Cell;

pub mod epoll;
pub mod event;
pub mod socketpair;
Expand Down Expand Up @@ -101,6 +103,60 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> {
}
}

/// The `epoll_wait()` system call waits for events on the `Epoll`
/// instance referred to by the file descriptor `epfd`. The buffer
/// pointed to by `events` is used to return information from the ready
/// list about file descriptors in the interest list that have some
/// events available. Up to `maxevents` are returned by `epoll_wait()`.
/// The `maxevents` argument must be greater than zero.

/// The `timeout` argument specifies the number of milliseconds that
/// `epoll_wait()` will block. Time is measured against the
/// CLOCK_MONOTONIC clock.

/// A call to `epoll_wait()` will block until either:
/// • a file descriptor delivers an event;
/// • the call is interrupted by a signal handler; or
/// • the timeout expires.

/// Note that the timeout interval will be rounded up to the system
/// clock granularity, and kernel scheduling delays mean that the
/// blocking interval may overrun by a small amount. Specifying a
/// timeout of -1 causes `epoll_wait()` to block indefinitely, while
/// specifying a timeout equal to zero cause `epoll_wait()` to return
/// immediately, even if no events are available.
///
/// On success, `epoll_wait()` returns the number of file descriptors
/// ready for the requested I/O, or zero if no file descriptor became
/// ready during the requested timeout milliseconds. On failure,
/// `epoll_wait()` returns -1 and errno is set to indicate the error.
///
/// <https://man7.org/linux/man-pages/man2/epoll_wait.2.html>
Copy link
Member

@RalfJung RalfJung Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our implementation is very, very far from this documentation, right? How's that not a huge problem? If a random program calls epoll_wait (keep in mind that tokio is not the only thing that can call this function), then one of two things must happen:

  • either it behaves in some reasonable way
  • or it stops execution (ideally via throw_unsupported, or an ICE if need be)

But here we just silently ignore the timeout and a bunch of other things, so execution will just silently go wrong, right? That's not something we want Miri to ever do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all an ongoing implementation where each new test fleshes out the implementation details. I don't think it would be good to require testing and implementing the entire thing in one go, as that PR would become unreviewable.

We could hide all of it behind a -Zmiri-experimental-epoll to make it clear this is unfinished

Copy link
Member

@RalfJung RalfJung Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all an ongoing implementation where each new test fleshes out the implementation details. I don't think it would be good to require testing and implementing the entire thing in one go, as that PR would become unreviewable.

That's not what I asked for.

I think ideally, the supported part of the function is explicitly carved out, by aggressively doing throw_unsup for any parameter combination that is potentially not implemented correctly yet. That doesn't mean implementing the entire function, not at all! It could mean doing throw_unsup on literally every input, for instance. That is how we have worked so far for other shims and I think it has worked well. Wouldn't that work here as well?

I find this helps quite a bit with reviewing, since then one can immediately see which things the function intends to support, and can check if those make sense. Right now that seems to be a secret between you and @DebugSteven, and it's not even documented anywhere what the current status is. That is not great. It should be clear from the code where the gaps are, and which things are expected to work correctly. Right now I look at this and I can't see how this works correctly for any input (given that it doesn't fill the events buffer), but I can't even tell if that is a bug since I don't know for which inputs it is supposed to work.

We could hide all of it behind a -Zmiri-experimental-epoll to make it clear this is unfinished

That would be a backup plan, but from my perspective it would mean that when reviewing the PR which removes this flag, I do have to review the entire functionality of all gated shims at once, since incremental review of the previous PRs is not possible. I would prefer to avoid that.

fn epoll_wait(
&mut self,
epfd: &OpTy<'tcx, Provenance>,
events: &OpTy<'tcx, Provenance>,
maxevents: &OpTy<'tcx, Provenance>,
timeout: &OpTy<'tcx, Provenance>,
) -> InterpResult<'tcx, Scalar<Provenance>> {
let this = self.eval_context_mut();

let epfd = this.read_scalar(epfd)?.to_i32()?;
let _events = this.read_scalar(events)?.to_pointer(this)?;
let _maxevents = this.read_scalar(maxevents)?.to_i32()?;
let _timeout = this.read_scalar(timeout)?.to_i32()?;

let numevents = 0;
if let Some(epfd) = this.machine.file_handler.handles.get_mut(&epfd) {
let _epfd = epfd.as_epoll_handle()?;

// FIXME return number of events ready when scheme for marking events ready exists
Ok(Scalar::from_i32(numevents))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this tell the caller that numevents events are ready without actually initializing the buffer? That sounds pretty bad, the user will then see UB when trying to access the info in the buffer.

} else {
Ok(Scalar::from_i32(this.handle_not_found()?))
}
}

/// This function creates an `Event` that is used as an event wait/notify mechanism by
/// user-space applications, and by the kernel to notify user-space applications of events.
/// The `Event` contains an `u64` counter maintained by the kernel. The counter is initialized
Expand Down Expand Up @@ -142,7 +198,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> {
}

let fh = &mut this.machine.file_handler;
let fd = fh.insert_fd(Box::new(Event { val }));
let fd = fh.insert_fd(Box::new(Event { val: Cell::new(val.into()) }));
Ok(Scalar::from_i32(fd))
}

Expand Down
35 changes: 33 additions & 2 deletions src/shims/unix/linux/fd/event.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ use crate::shims::unix::fs::FileDescriptor;

use rustc_const_eval::interpret::InterpResult;

use std::cell::Cell;
use std::io;

/// A kind of file descriptor created by `eventfd`.
Expand All @@ -13,7 +14,9 @@ use std::io;
/// <https://man.netbsd.org/eventfd.2>
#[derive(Debug)]
pub struct Event {
pub val: u32,
/// The object contains an unsigned 64-bit integer (uint64_t) counter that is maintained by the
/// kernel. This counter is initialized with the value specified in the argument initval.
pub val: Cell<u64>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what does the value mean? This documentation is rather mysterious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original POSIX documentation is not very clear either ^^ the value can mean all kinds of things depending on the usage pattern. All that the storage should know is that it's a 64 bit int

Copy link
Member

@RalfJung RalfJung Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to have an example then for how the Miri-implemented shims use this. POSIX docs being bad is no excuse for our own docs being bad. ;)

}

impl FileDescriptor for Event {
Expand All @@ -22,7 +25,7 @@ impl FileDescriptor for Event {
}

fn dup(&mut self) -> io::Result<Box<dyn FileDescriptor>> {
Ok(Box::new(Event { val: self.val }))
Ok(Box::new(Event { val: self.val.clone() }))
}

fn is_tty(&self) -> bool {
Expand All @@ -35,4 +38,32 @@ impl FileDescriptor for Event {
) -> InterpResult<'tcx, io::Result<i32>> {
Ok(Ok(0))
}

/// A write call adds the 8-byte integer value supplied in
/// its buffer to the counter. The maximum value that may be
/// stored in the counter is the largest unsigned 64-bit value
/// minus 1 (i.e., 0xfffffffffffffffe). If the addition would
/// cause the counter's value to exceed the maximum, then the
/// write either blocks until a read is performed on the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mention of reads is confusing. Reads don't seem to do anything with the counter?

/// file descriptor, or fails with the error EAGAIN if the
/// file descriptor has been made nonblocking.

/// A write fails with the error EINVAL if the size of the
/// supplied buffer is less than 8 bytes, or if an attempt is
/// made to write the value 0xffffffffffffffff.
///
/// FIXME: use endianness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't land code that just silently does the wrong thing on some targets. At the very least it must ICE. Nothing is worse than if Miri pretends to be able to execute something but then executes it incorrectly. That must never happen.

fn write<'tcx>(
&self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to change write to take &mut self and then avoid the Cell?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, we looked into this, and the issue is that preexisting code needs to write access memory while using the file descriptors. It would be possible, but it would make all code using writes very roundabout. Also I think other write impls already use interior mutability.

I can open a PR if you'd like to see the workarounds needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think other write impls already use interior mutability.

I don't recall that.

But if you tried the alternative and considered it worse than this, that's enough of an argument. :) Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... one write impl that "accidentally" uses interior mutability is our actual File handles. Write is implemented for &File.

The reason we can't easily switch to &mut is this piece of code:

https://github.com/rust-lang/miri/blob/master/src/shims/unix/fs.rs#L814-L819

We could handle all of this correctly, but it would require cloning somewhere, as we can't hold the mutable reference to the file descriptor while also reading from memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we're doing something rather clever there, directly copying from the machine memory to the syscall.

_communicate_allowed: bool,
bytes: &[u8],
) -> InterpResult<'tcx, io::Result<usize>> {
let v1 = self.val.get();
// FIXME handle blocking when addition results in exceeding the max u64 value
// or fail with EAGAIN if the file descriptor is nonblocking.
let v2 = v1.checked_add(u64::from_be_bytes(bytes.try_into().unwrap())).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave a FIXME to handle these error cases to behave as described in the doc comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this case is hit by the test suites of a handful of crates. So far I know of aqueue 1.2.10, async-io-typed 3.0.0, terminus-store 0.20.0, quic-rpc 0.3.2, and snocat 0.8.0-alpha.1.

self.val.set(v2);
assert_eq!(8, bytes.len());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be checked before converting the value to an integer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unwrap two lines above already does this check, we can remove the assert

Ok(Ok(8))
}
}
6 changes: 6 additions & 0 deletions src/shims/unix/linux/foreign_items.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,12 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> {
let result = this.epoll_ctl(epfd, op, fd, event)?;
this.write_scalar(result, dest)?;
}
"epoll_wait" => {
let [epfd, events, maxevents, timeout] =
this.check_shim(abi, Abi::C { unwind: false }, link_name, args)?;
let result = this.epoll_wait(epfd, events, maxevents, timeout)?;
this.write_scalar(result, dest)?;
}
"eventfd" => {
let [val, flag] =
this.check_shim(abi, Abi::C { unwind: false }, link_name, args)?;
Expand Down
14 changes: 14 additions & 0 deletions tests/pass-dep/tokio/sleep.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance -Zmiri-backtrace=full
//@only-target-x86_64-unknown-linux: support for tokio only on linux and x86

use tokio::time::{sleep, Duration, Instant};

#[tokio::main]
async fn main() {
let start = Instant::now();
sleep(Duration::from_secs(1)).await;
// It takes 96 millisecond to sleep for 1 millisecond
// It takes 1025 millisecond to sleep for 1 second
let time_elapsed = &start.elapsed().as_millis();
assert!(time_elapsed > &1000, "{}", time_elapsed);
}
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// Need to disable preemption to stay on the supported MVP codepath in mio.
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance -Zmiri-preemption-rate=0
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the -Zmiri-preemption-rate=0 here was premature, this is causing issues again. I'll add the flag back.

//@only-target-x86_64-unknown-linux: support for tokio exists only on linux and x86

#[tokio::main]
Expand Down