Support for an MVP of mmap #304

Open
alexcrichton opened this issue Jul 29, 2020 · 8 comments
Labels
feature-request Requests for new WASI APIs

Comments

@alexcrichton
Contributor

I didn't see a previously filed issue about this, so I wanted to make sure one was filed here. In the applications I've seen, mmap (or its Windows equivalent) is a pretty common operation, and it would be nice to support it in WASI. I believe, though, that mmap covers a very large range of functionality, which makes this a very large feature request as stated, so I'd like to winnow it down a bit and see whether others think a subset of it could be added to WASI.

I think a pretty useful (but still minimal) implementation of mmap might be to allow mapping files into memory as read-write, without allowing changes to be persisted back to disk. This covers a very common use of mmap, reading files (e.g. text searching, parsing, etc.), where the file contents only need to be read, not written back.

One important thing to consider with mmap, I think, is whether it can be implemented on the web. The web (and wasm in general) doesn't have a way of making a region of memory read-only, for example, so I don't think it's possible to expose the full power of mmap.

A possible proposal for this would be:

```
(module $wasi_ephemeral_fd
  ;;; Map the contents of an open file descriptor into linear memory
  (@interface func (export "mmap")
    (param $fd $fd)
    (param $map_base (@witx pointer char8))
    (param $map_len $size)
    (result $error $errno)
  )
)
```

This would map the file referred to by $fd into the process address space at the region given by the (required) $map_base and $map_len parameters. This differs from the Unix mmap syscall in a number of ways:

  • This requires a file descriptor rather than allowing an optional -1 file descriptor. This is because memory allocation in wasm happens through memory.grow, not through this API.
  • A base and length must be provided, requiring the calling application to manage where (in the address space) to put the mapped file. This allows memory regions to be recycled (since wasm has no way to shrink memory) and places the burden of choosing a location for the allocation on the caller.
  • Protection features like read-only, read/write, read/execute, etc. are not supported. It may be useful to add a flags parameter at some point, but the intention is that this only pulls a file into the address space and nothing else. Writes to the memory region are not persisted back to disk (which would be quite difficult to do on the web).

The goal here is that engines (at least out-of-browser ones) could implement this with OS-level mmap calls. With proper alignment requirements on $map_base and $map_len this could translate to a host-level mmap to actually map the contents of a file into a wasm address space natively. This should bring all the expected performance wins of mmap to a wasm guest as well.
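To make the "translate to a host-level mmap" idea concrete, here is a minimal host-side sketch for a POSIX host. It assumes the engine reserved the guest's linear memory itself with mmap (so `mem_base` is host-page aligned), and `host_fd_mmap` is a hypothetical name, not any engine's API. It maps at file offset 0 because the strawman has no offset parameter, and it checks alignment against the host page size (4 KiB or 16 KiB typically, not the 64 KiB wasm page).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical host implementation of the strawman fd "mmap" call.
 * mem_base/mem_size: the guest's linear memory as reserved by the engine.
 * map_base/map_len: guest-supplied offsets into that linear memory.
 * Returns 0 or an errno value, mirroring the witx $errno result. */
static int host_fd_mmap(uint8_t *mem_base, uint64_t mem_size, int fd,
                        uint32_t map_base, uint32_t map_len) {
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    /* Host-page alignment lets us replace whole pages of linear memory. */
    if (map_len == 0 || map_base % page != 0 || map_len % page != 0)
        return EINVAL;
    /* Bounds check in 64-bit arithmetic so base + len cannot wrap. */
    if ((uint64_t)map_base + map_len > mem_size)
        return EINVAL;
    /* MAP_PRIVATE: guest writes are copy-on-write and never reach the file.
     * MAP_FIXED: map directly over the existing linear-memory pages. */
    void *p = mmap(mem_base + map_base, map_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    return 0;
}
```

MAP_PRIVATE is what gives the "writes are not persisted" semantics for free; the guest sees a normal read-write region, and the bounds check is the same one the engine already applies to every linear-memory access.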

I'm curious what others think of this strawman, and whether anyone knows of other places this has been discussed that could be linked here.

@programmerjake
Contributor

It might be a good idea to add an argument for the offset in the file where the mapping should start; that's supported on both POSIX and Win32. The offset will probably need to be page-aligned. This would also allow reading from files larger than 4 GiB.
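A rough sketch of how an engine might validate the arguments of such an extended call, assuming a hypothetical signature `mmap(fd, map_base, map_len, offset)` with a 64-bit `offset`. POSIX mmap requires the offset to be a multiple of the page size; Win32 MapViewOfFile requires a multiple of the allocation granularity (typically 64 KiB), so `granule` here stands for whichever applies. Because `offset` is 64-bit while `map_base`/`map_len` are wasm32 values, a guest can map a window of a file that lies beyond 4 GiB.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical argument check for mmap(fd, map_base, map_len, offset).
 * granule: host page size (POSIX) or allocation granularity (Win32);
 * assumed to be a power of two. Returns 0 or an errno value. */
static int check_map_args(uint32_t map_base, uint32_t map_len, uint64_t offset,
                          uint64_t mem_size, uint64_t granule) {
    uint64_t mask = granule - 1;
    if (map_len == 0)
        return EINVAL;
    /* The file offset and the guest region must all be aligned. */
    if ((offset & mask) != 0 || (map_base & mask) != 0 || (map_len & mask) != 0)
        return EINVAL;
    /* The region must fit inside linear memory. */
    if ((uint64_t)map_base + map_len > mem_size)
        return EINVAL;
    /* offset + map_len must not wrap around 2^64. */
    if (offset > UINT64_MAX - map_len)
        return EOVERFLOW;
    return 0;
}
```

For example, with a 4 KiB granule, an 8 KiB window starting 5 GiB into a file passes, while an unaligned offset of 100 is rejected with EINVAL.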

@sunfishcode
Member

As an aside, we do have a simple emulated mmap in wasi-libc, which uses malloc + read to copy the file contents into memory, so the main advantage here would be letting implementations that can optimize this do so.
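The copy-based approach can be sketched as follows. This is not wasi-libc's actual code, and `copy_map` is a hypothetical name; it just shows the malloc + read idea, using pread so the file position is untouched. Zeroing everything past EOF is a simplification: real mmap only zero-fills up to the end of the last page and faults beyond it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical copy-based "mmap": allocate len bytes and fill them from
 * fd starting at offset. Returns NULL on allocation or read failure. */
static void *copy_map(int fd, size_t len, off_t offset) {
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted: retry the read */
            free(buf);
            return NULL;
        }
        if (n == 0)
            break; /* reached end of file */
        done += (size_t)n;
    }
    /* Simplification: zero the remainder past EOF. */
    memset(buf + done, 0, len - done);
    return buf;
}
```

Since the data is a private copy, writes never reach the file and truncating the file later has no effect on it, which is exactly the behavior difference from a real mapping discussed below.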

A potentially tricky issue is what happens to the mapping if a file is truncated or deleted while mapped. In POSIX, accessing the memory then raises SIGBUS. Implementations that lack mmap and emulate it won't always have a way to be notified when this happens, which suggests the behavior should be that the data simply persists in memory; however, implementations that use a real mmap only get the SIGBUS once it's too late to copy the data into memory so that it can persist.

@alexcrichton
Contributor Author

Ah that's a good point that this'd need an offset as well, and seems reasonable to provide! Also thanks for pointing out the emulation, I had actually missed that but see the way to get it now (-D_WASI_EMULATED_MMAN is required as a compilation flag).
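For reference, ordinary read-only POSIX code like the sketch below (a hypothetical `count_lines` helper) uses the real mmap natively. As far as I know, wasi-libc's emulation requires compiling with `-D_WASI_EMULATED_MMAN` and linking with `-lwasi-emulated-mman`, after which the same source should build for WASI, with the mapping backed by a copy.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count '\n' bytes in a file by mapping it read-only. Returns -1 on error. */
static long count_lines(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap with length 0 fails with EINVAL */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;
    long lines = 0;
    const char *p = data, *end = data + len;
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++; /* continue searching after this newline */
    }
    munmap((void *)data, len);
    return lines;
}
```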

It may be reasonable to say that in a situation where the file is concurrently deleted, what happens is somewhat unspecified, even if that isn't the most portable behavior. Engines could presumably catch the SIGBUS and fill in at least zeros themselves so wasm doesn't trap, but beyond that there's not a whole lot we can do if this API is added?
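One way an engine might "catch the SIGBUS and fill in zeros" on Linux is sketched below: a SA_SIGINFO handler maps an anonymous zero page over the faulting page, so the faulting instruction re-executes and reads zeros. The names are hypothetical. Note that mmap is not on POSIX's async-signal-safe list; this works in practice on Linux but a production engine would need more care (and would likely also record the fault somewhere).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uintptr_t host_page_size;                 /* cached before install */
static volatile sig_atomic_t zero_filled_pages = 0;

/* Replace the faulting page with a fresh anonymous (zeroed) page. */
static void sigbus_zero_fill(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    void *fault_page =
        (void *)((uintptr_t)info->si_addr & ~(host_page_size - 1));
    void *p = mmap(fault_page, host_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        _exit(1); /* cannot recover; avoid faulting forever */
    zero_filled_pages++;
}

/* Install the handler process-wide. Returns 0 on success, -1 on error. */
static int install_sigbus_zero_fill(void) {
    host_page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_zero_fill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The handler only hides the fault from the guest; it does not tell it anything went wrong.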

@sunfishcode
Member

Yeah -D_WASI_EMULATED_MMAN is an attempt to balance between users who want to know they're doing something that doesn't exactly follow POSIX and that WASI doesn't really support, and users that just want as much as possible to work.

Silently filling in zeros is a little risky, because then applications could be fooled into thinking they read the actual file contents when they didn't. For example, a serialized NUL-terminated string could be truncated if the file is deleted while the application is in the middle of reading it, and a naive application wouldn't suspect anything was wrong. I don't yet have an opinion about whether this is a show-stopper; I'm just bringing it up to be considered.
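The NUL-terminated string case can be illustrated directly. The helper below is hypothetical and only simulates the outcome: the tail of a "mapped" record reads as zeros, and strlen happily returns a shorter, still well-formed string with no error anywhere.

```c
#include <assert.h>
#include <string.h>

/* Simulate the last bytes_lost bytes of a mapped record reading as zeros
 * (e.g. the file was truncated and the engine zero-filled), then measure
 * the string the application would see. */
static size_t simulate_truncated_read(char *mapped, size_t mapped_len,
                                      size_t bytes_lost) {
    memset(mapped + (mapped_len - bytes_lost), 0, bytes_lost);
    return strlen(mapped);
}
```

For example, a record `"user=alice;role=admin"` losing its last 12 bytes (including the terminator) reads back as `"user=alice"`, which a naive parser would accept as a complete record.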

@jaykrell

jaykrell commented Mar 4, 2021

The I/O can also fail, e.g. if the network goes down.
mmap is great and horrible at the same time.
I am curious: if you do this via host mmap (instead of malloc + read), would you map it into the wasm memory space, and therefore gain bounds checking?
(I really cannot wait for wasm64.)
Is there any provision now or forthcoming to surface the SIGBUS into the wasm guest?
(On Windows host it is a structured exception, which would also be nice, if we could program against that in WebAssembly, perhaps elevating it to portable!)

@sunfishcode
Member

In theory, yes, an mmap interface like the above could be mapped directly into the wasm linear memory. I'm not aware of anyone working on a design for exposing SIGBUS to wasm yet. It could be done as a kind of I/O that wasm code could poll for, however a complication with that approach is that it wouldn't be directly compatible with existing POSIX applications that either expect to get a SIGBUS or that don't anticipate failure and are saved by the SIGBUS in practice.

@ncruces

ncruces commented Apr 16, 2024

Just because people might find it interesting, I had a requirement to map files read-write to guest linear memory, for my Go bindings to SQLite using wazero.

This was achieved, and is working on Linux/macOS by using a custom linear memory allocator, then doing an aligned_alloc with wasi-libc (to hide some memory from malloc) and mmapping the file into that memory.
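The guest-side half of that approach might look roughly like the sketch below; it is not the actual code from those bindings, and `reserve_map_region` is a hypothetical name. The guest carves a page-aligned block out of the heap with aligned_alloc so malloc won't hand those bytes out again, then passes the address and rounded length to the host, which maps the file over it. This only lines up with host pages if the host allocated linear memory itself at a host-page-aligned base, which is what the custom allocator provides. C11 aligned_alloc expects the size to be a multiple of the alignment, hence the rounding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reserve a region of at least len bytes aligned to page (a power of two,
 * assumed to be the host page size). On success, *out_len receives the
 * rounded length to report to the host. */
static void *reserve_map_region(size_t len, size_t page, size_t *out_len) {
    size_t rounded = (len + page - 1) & ~(page - 1);
    void *region = aligned_alloc(page, rounded);
    if (region != NULL)
        *out_len = rounded;
    return region;
}
```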

@affinage-digital
Copy link

Is there a solution for Windows and C++?

6 participants