Add asynchronous File IO #128
We've talked about
I do not think libuv is the right approach for us. I think something along the lines of Erlang's "dirty schedulers" would be the correct approach. I reserve the right to change my opinion later.
@SeanTAllen or @slfritchie, could you elaborate on the concept of dirty schedulers? Would it basically mean that we flag behaviours based on whether they do blocking IO and, depending on that, schedule them on a special scheduler pool? The advantage would be that we could keep the blocking APIs synchronous and thus simple (like the current files API). Would that actually be the case?
The Erlang BEAM VM scheduler differs from Pony's in a couple of significant ways: BEAM's is preemptive, and it avoids using wall-clock time (or any other traditional notion of time) when making preemption decisions. Preemption can be triggered by (a) reduction count (roughly equivalent to function call count), (b) a VM-internal trap, or (c) a blocked message receive (the mailbox is empty, or a selective receive pattern match fails on all queued messages). The addition of NIFs (native implemented functions), which are written in C but appear to the Erlang programmer to be Erlang, can cause a big problem for the reduction-count method. Steve Vinoski was a primary author of the NIF scheme. His write-up at https://github.com/vinoski/bitwise/blob/master/vinoski-schedulers.pdf notes a problem with a NIF that implements an XOR function:
That causes all kinds of havoc with the schedulers. It's even more "hilarious"(*) when schedulers go to sleep because reductions were mis-counted and then never bother waking up, despite huge demand to schedule runnable processes. Note also that performing I/O isn't necessary: anything that blocks the return of control to the scheduler is fair game, including XOR calculations on GBytes of data or simply calling …

Nowadays, a NIF can have metadata associated with it that marks it as "dirty". Execution of dirty NIFs is transferred to a dedicated set of Pthreads, the dirty thread pool. There's a non-zero overhead for switching threads, naturally, but it's far better than upsetting the usual schedulers' way of doing things.

With the Pony runtime's cooperative scheduling approach, I'm not aware of many choices. One would be to always run an actor that might block its Pthread on a separate Pthread pool. Another is a message-passing approach: send a message to a dedicated thread or thread pool that executes the desired operation and then sends the result back. The latter is how Erlang's original file I/O subsystem operated, but I see no easy way to fit that scheme into Pony's runtime today without lots of other side effects and consequences.

As for @mfelsche's idea of using the separate pool only for behaviors that are "known" to do blocking stuff: I hadn't thought of that, silly me. It's a nifty idea and probably deserves a lot more pondering.

BEAM references for the curious:
(*) Where "hilarious" means "terrible things happen at weird times or the worst possible high-demand times".
Leaving aside the question of "how do we know something will block", I think what we would want is...
I'm sure some of you have heard about the new asynchronous I/O interface in Linux 5.1. Here's a document that goes into detail about the new interface: http://kernel.dk/io_uring.pdf Under section
It sounds like in the future, the interface may support asynchronous network I/O as well. On Windows, IOCP exists for asynchronous I/O.
This issue tries to spark a discussion around (1) the need for asynchronous file IO, (2) the possible implementations thereof, and (3) the look and feel of such an asynchronous file API for Pony. The new asynchronous file IO could be added alongside the existing blocking file IO APIs.
Current file operations in Pony use standard POSIX calls like write/writev and read, all of which may block. This means that while such an operation is performed on a file, one scheduler thread is blocked for its whole duration and cannot run any other actors in the meantime. This can be a serious performance problem, which is why I am bringing this up.
This is the tricky part. AFAIK, ASIO, which is used for all other networking, pipe and stdstream IO, will not work on regular files. Windows has some kind of asynchronous file IO which I know nothing about; if anyone could shed some light on this, that would be great. POSIX offers the aio_* APIs, which basically offload file IO to a separate thread pool in userland. This API, I think, is a good candidate due to its cross-platform compatibility. Another one would be libuv, which is completely cross-platform and offers async name resolution as well. It does file IO in a conceptually similar manner to the aio API, in that it uses the blocking file APIs but executes them on a separate thread pool. It seems a bit overkill for the problem at hand, and it possibly makes the most sense to move all IO operations to libuv completely instead of adding it alongside ASIO.