-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement epoll_wait #2764
Implement epoll_wait #2764
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,6 +7,8 @@ use socketpair::SocketPair; | |
|
||
use shims::unix::fs::EvalContextExt as _; | ||
|
||
use std::cell::Cell; | ||
|
||
pub mod epoll; | ||
pub mod event; | ||
pub mod socketpair; | ||
|
@@ -101,6 +103,60 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> { | |
} | ||
} | ||
|
||
/// The `epoll_wait()` system call waits for events on the `Epoll` | ||
/// instance referred to by the file descriptor `epfd`. The buffer | ||
/// pointed to by `events` is used to return information from the ready | ||
/// list about file descriptors in the interest list that have some | ||
/// events available. Up to `maxevents` are returned by `epoll_wait()`. | ||
/// The `maxevents` argument must be greater than zero. | ||
|
||
/// The `timeout` argument specifies the number of milliseconds that | ||
/// `epoll_wait()` will block. Time is measured against the | ||
/// CLOCK_MONOTONIC clock. | ||
|
||
/// A call to `epoll_wait()` will block until either: | ||
/// • a file descriptor delivers an event; | ||
/// • the call is interrupted by a signal handler; or | ||
/// • the timeout expires. | ||
|
||
/// Note that the timeout interval will be rounded up to the system | ||
/// clock granularity, and kernel scheduling delays mean that the | ||
/// blocking interval may overrun by a small amount. Specifying a | ||
/// timeout of -1 causes `epoll_wait()` to block indefinitely, while | ||
/// specifying a timeout equal to zero cause `epoll_wait()` to return | ||
/// immediately, even if no events are available. | ||
/// | ||
/// On success, `epoll_wait()` returns the number of file descriptors | ||
/// ready for the requested I/O, or zero if no file descriptor became | ||
/// ready during the requested timeout milliseconds. On failure, | ||
/// `epoll_wait()` returns -1 and errno is set to indicate the error. | ||
/// | ||
/// <https://man7.org/linux/man-pages/man2/epoll_wait.2.html> | ||
fn epoll_wait( | ||
&mut self, | ||
epfd: &OpTy<'tcx, Provenance>, | ||
events: &OpTy<'tcx, Provenance>, | ||
maxevents: &OpTy<'tcx, Provenance>, | ||
timeout: &OpTy<'tcx, Provenance>, | ||
) -> InterpResult<'tcx, Scalar<Provenance>> { | ||
let this = self.eval_context_mut(); | ||
|
||
let epfd = this.read_scalar(epfd)?.to_i32()?; | ||
let _events = this.read_scalar(events)?.to_pointer(this)?; | ||
let _maxevents = this.read_scalar(maxevents)?.to_i32()?; | ||
let _timeout = this.read_scalar(timeout)?.to_i32()?; | ||
|
||
let numevents = 0; | ||
if let Some(epfd) = this.machine.file_handler.handles.get_mut(&epfd) { | ||
let _epfd = epfd.as_epoll_handle()?; | ||
|
||
// FIXME return number of events ready when scheme for marking events ready exists | ||
Ok(Scalar::from_i32(numevents)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't this tell the caller that |
||
} else { | ||
Ok(Scalar::from_i32(this.handle_not_found()?)) | ||
} | ||
} | ||
|
||
/// This function creates an `Event` that is used as an event wait/notify mechanism by | ||
/// user-space applications, and by the kernel to notify user-space applications of events. | ||
/// The `Event` contains an `u64` counter maintained by the kernel. The counter is initialized | ||
|
@@ -142,7 +198,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> { | |
} | ||
|
||
let fh = &mut this.machine.file_handler; | ||
let fd = fh.insert_fd(Box::new(Event { val })); | ||
let fd = fh.insert_fd(Box::new(Event { val: Cell::new(val.into()) })); | ||
Ok(Scalar::from_i32(fd)) | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ use crate::shims::unix::fs::FileDescriptor; | |
|
||
use rustc_const_eval::interpret::InterpResult; | ||
|
||
use std::cell::Cell; | ||
use std::io; | ||
|
||
/// A kind of file descriptor created by `eventfd`. | ||
|
@@ -13,7 +14,9 @@ use std::io; | |
/// <https://man.netbsd.org/eventfd.2> | ||
#[derive(Debug)] | ||
pub struct Event { | ||
pub val: u32, | ||
/// The object contains an unsigned 64-bit integer (uint64_t) counter that is maintained by the | ||
/// kernel. This counter is initialized with the value specified in the argument initval. | ||
pub val: Cell<u64>, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But what does the value mean? This documentation is rather mysterious. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The original POSIX documentation is not very clear either ^^ the value can mean all kinds of things depending on the usage pattern. All that the storage should know is that it's a 64 bit int There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be good to have an example then for how the Miri-implemented shims use this. POSIX docs being bad is no excuse for our own docs being bad. ;) |
||
} | ||
|
||
impl FileDescriptor for Event { | ||
|
@@ -22,7 +25,7 @@ impl FileDescriptor for Event { | |
} | ||
|
||
fn dup(&mut self) -> io::Result<Box<dyn FileDescriptor>> { | ||
Ok(Box::new(Event { val: self.val })) | ||
Ok(Box::new(Event { val: self.val.clone() })) | ||
} | ||
|
||
fn is_tty(&self) -> bool { | ||
|
@@ -35,4 +38,32 @@ impl FileDescriptor for Event { | |
) -> InterpResult<'tcx, io::Result<i32>> { | ||
Ok(Ok(0)) | ||
} | ||
|
||
/// A write call adds the 8-byte integer value supplied in | ||
/// its buffer to the counter. The maximum value that may be | ||
/// stored in the counter is the largest unsigned 64-bit value | ||
/// minus 1 (i.e., 0xfffffffffffffffe). If the addition would | ||
/// cause the counter's value to exceed the maximum, then the | ||
/// write either blocks until a read is performed on the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This mention of reads is confusing. Reads don't seem to do anything with the counter? |
||
/// file descriptor, or fails with the error EAGAIN if the | ||
/// file descriptor has been made nonblocking. | ||
|
||
/// A write fails with the error EINVAL if the size of the | ||
/// supplied buffer is less than 8 bytes, or if an attempt is | ||
/// made to write the value 0xffffffffffffffff. | ||
/// | ||
/// FIXME: use endianness | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please don't land code that just silently does the wrong thing on some targets. At the very least it must ICE. Nothing is worse than if Miri pretends to be able to execute something but then executes it incorrectly. That must never happen. |
||
fn write<'tcx>( | ||
&self, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nope, we looked into this, and the issue is that preexisting code needs to write access memory while using the file descriptors. It would be possible, but it would make all code using writes very roundabout. Also I think other write impls already use interior mutability. I can open a PR if you'd like to see the workarounds needed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't recall that. But if you tried the alternative and considered it worse than this, that's enough of an argument. :) Thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So... one write impl that "accidentally" uses interior mutability is our actual The reason we can't easily switch to https://github.com/rust-lang/miri/blob/master/src/shims/unix/fs.rs#L814-L819 We could handle all of this correctly, but it would require cloning somewhere, as we can't hold the mutable reference to the file descriptor while also reading from memory. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah we're doing something rather clever there, directly copying from the machine memory to the syscall. |
||
_communicate_allowed: bool, | ||
bytes: &[u8], | ||
) -> InterpResult<'tcx, io::Result<usize>> { | ||
let v1 = self.val.get(); | ||
// FIXME handle blocking when addition results in exceeding the max u64 value | ||
// or fail with EAGAIN if the file descriptor is nonblocking. | ||
let v2 = v1.checked_add(u64::from_be_bytes(bytes.try_into().unwrap())).unwrap(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Leave a FIXME to handle these error cases to behave as described in the doc comment There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FYI this case is hit by the test suites of a handful of crates. So far I know of aqueue 1.2.10, async-io-typed 3.0.0, terminus-store 0.20.0, quic-rpc 0.3.2, and snocat 0.8.0-alpha.1. |
||
self.val.set(v2); | ||
assert_eq!(8, bytes.len()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be checked before converting the value to an integer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The unwrap two lines above already does this check, we can remove the assert |
||
Ok(Ok(8)) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance -Zmiri-backtrace=full | ||
//@only-target-x86_64-unknown-linux: support for tokio only on linux and x86 | ||
|
||
use tokio::time::{sleep, Duration, Instant}; | ||
|
||
#[tokio::main] | ||
async fn main() { | ||
let start = Instant::now(); | ||
sleep(Duration::from_secs(1)).await; | ||
// It takes 96 millisecond to sleep for 1 millisecond | ||
// It takes 1025 millisecond to sleep for 1 second | ||
let time_elapsed = &start.elapsed().as_millis(); | ||
assert!(time_elapsed > &1000, "{}", time_elapsed); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
// Need to disable preemption to stay on the supported MVP codepath in mio. | ||
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance -Zmiri-preemption-rate=0 | ||
//@compile-flags: -Zmiri-disable-isolation -Zmiri-permissive-provenance | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removing the |
||
//@only-target-x86_64-unknown-linux: support for tokio exists only on linux and x86 | ||
|
||
#[tokio::main] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our implementation is very, very far from this documentation, right? How's that not a huge problem? If a random program calls epoll_wait (keep in mind that tokio is not the only thing that can call this function), then one of two things must happen:
But here we just silently ignore the timeout and a bunch of other things, so execution will just silently go wrong, right? That's not something we want Miri to ever do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all an ongoing implementation where each new test fleshes out the implementation details. I don't think it would be good to require testing and implementing the entire thing in one go, as that PR would become unreviewable.
We could hide all of it behind a
-Zmiri-experimental-epoll
to make it clear this is unfinishedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I asked for.
I think ideally, the supported part of the function is explicitly carved out, by aggressively doing throw_unsup for any parameter combination that is potentially not implemented correctly yet. That doesn't mean implementing the entire function, not at all! It could mean doing throw_unsup on literally every input, for instance. That is how we have worked so far for other shims and I think it has worked well. Wouldn't that work here as well?
I find this helps quite a bit with reviewing, since then one can immediately see which things the function intends to support, and can check if those make sense. Right now that seems to be a secret between you and @DebugSteven, and it's not even documented anywhere what the current status is. That is not great. It should be clear from the code where the gaps are, and which things are expected to work correctly. Right now I look at this and I can't see how this works correctly for any input (given that it doesn't fill the events buffer), but I can't even tell if that is a bug since I don't know for which inputs it is supposed to work.
That would be a backup plan, but from my perspective it would mean that when reviewing the PR which removes this flag, I do have to review the entire functionality of all gated shims at once, since incremental review of the previous PRs is not possible. I would prefer to avoid that.