Allocator function considerations #25
Comments
This is an important consideration, and I agree with your observation.

As a general rule, no access to a module should ever be defined in terms of indices. Indices are internal names inside a module. They are an implementation detail with no meaning outside. Furthermore, they cannot be expected to be stable, e.g., when a module is recompiled by a producer or extended. In a similar vein, no assumptions should be made about the availability of internal entities that are not exported. The Wasm module system provides proper encapsulation, and an engine is free to take advantage of this property as it sees fit, including completely optimising away internal entities from a module instance. Thus, all module access must only happen through export names.

These principles should be followed by a bindings specification as well. Otherwise, as you observe, it would require magic abilities that can neither be defined nor implemented by reduction to regular client code -- but I would expect that webidl-bindings are merely a convenience mechanism to generate code that you could equivalently write by hand.

The easiest fix would be replacing the alloc-func-idx in the current explainer with an export name. |
Good points; "can I write a polyfill that doesn't need to rewrite the To wit, beyond just functions, with GC types and type imports, binding operators (e.g., The original problem I saw with requiring the allocator function to be exported is that it forces the allocator function into the "public" (post-binding) interface of the module, which also seems to break encapsulation. As a new idea: what do you think about allowing bindings to "hide" an export (simply removing it from the actual exports)? |
I think the route around this conundrum is clear if you accept one condition: that managed memory has already been implemented. In that scenario one would simply not expose any kind of allocator. (Personally, I do not much like the idea of hidden exports) |
The thing about forcing the allocator to be public post-binding is, 1) morally how different is it to be visible to just the embedder, vs visible to everyone? and 2) even if you don't export your allocator you do need to share your memory (as in import or export, not necessarily threaded shared memory) to actually write the incoming value to the allocated buffer. The other suggestion I'd had in mind was to say "we could punt on this and not offer these for now", but a module author / toolchain can always implement that policy by not using any of these bindings, if they care more about hiding their memory/allocator. As a single data point, Emscripten today imports its memory buffers from JS, and exports malloc to be called from JS library code. I'm ambivalent about the idea of hidden exports. It's polyfillable (delete the exported property after grabbing a handle to it in the wrapping code), and it limits the scope of the allocator in a non-magic sort of way. But it doesn't let us encapsulate memory, which is probably even more important. Are there benefits to having hidden allocator but public memory that I'm missing? But yeah, export-by-name sounds like the way to go here. |
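The "hidden export" polyfill mentioned above can be sketched in wrapping code. This is only an illustration: `hideExports` and its parameter names are hypothetical and appear nowhere in the explainer. Because the exports object of a real `WebAssembly.Instance` is frozen, the sketch copies the visible entries into a new object instead of deleting a property in place.

```typescript
// Hypothetical helper; none of these names come from the explainer.
type Exports = Record<string, unknown>;

// Split raw instance exports into the part handed to users and the
// part the binding layer keeps for itself (e.g. the allocator).
function hideExports(
  raw: Exports,
  hidden: readonly string[],
): { visible: Exports; internal: Exports } {
  const hiddenNames = new Set(hidden);
  const visible: Exports = {};
  const internal: Exports = {};
  for (const name of Object.keys(raw)) {
    if (hiddenNames.has(name)) {
      internal[name] = raw[name]; // handle retained by the wrapper only
    } else {
      visible[name] = raw[name]; // same function object, no extra call layer
    }
  }
  return { visible: Object.freeze(visible), internal };
}
```

The wrapper would call something like `hideExports(instance.exports, ["malloc"])` and hand out only `visible`. As noted in the comment above, this limits who can reach the allocator but does nothing to encapsulate the memory itself.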
Exposing an allocator can lead to all kinds of trouble. In a GC'd language
especially, there are often constraints on when allocation can take place.
What would happen if a particular API call required multiple allocations?
…On Wed, Apr 17, 2019 at 10:48 AM Jacob Gravelle ***@***.***> wrote:
The thing about forcing the allocator to be public post-binding is, 1)
morally how different is it to be visible to just the embedder, vs visible
to everyone? and 2) even if you don't export your allocator you do need to
share your memory (as in import or export, not necessarily threaded shared
memory) to actually write the incoming value to the allocated buffer.
The other suggestion I'd had in mind was to say "we could punt on this and
not offer these for now", but a module author / toolchain can always
implement that policy by not using any of these bindings, if they care more
about hiding their memory/allocator. As a single data point, Emscripten
today imports its memory buffers from JS, and exports malloc to be called
from JS library code.
I believe managed memory, specifically memory slices, allows for not
sharing memory or allocators. I think we'd want to allow for both types of
bindings.
I'm ambivalent about the idea of hidden exports. It's polyfillable (delete
the exported property after grabbing a handle to it in the wrapping code),
and it limits the scope of the allocator in a non-magic sort of way. But it
doesn't let us encapsulate memory, which is probably even more important.
Are there benefits to having hidden allocator but public memory that I'm
missing?
But yeah, export-by-name sounds like the way to go here.
--
Francis McCabe
SWE
|
Missed this bit Do they though? My understanding was that the Though looking more closely, the type section today is only for functypes, meaning either A) we'll need to rethink what a wasm-type immediate looks like in the presence of user-defined types, or B) only allow inline, structural type definitions, preventing us from using nominal typing. |
FWIW, Are we concerned about the code size footprint export string names have over indices? |
@fgmccabe but surely exposing the entire contents of one's memory can lead to a strict superset of those kinds of trouble? Regardless of how it allocates the buffer, even if we invert the control or even preallocate space, the binding needs to write the resulting bytes into the buffer. This means the buffer needs to be writable externally, which means the outside world can break any of those constraints arbitrarily at any time. @fitzgen In principle we can dedupe these locally, by having a string table subsection in the bindings section. We should have only a single allocator function, so having to repeat |
@jgravelle-google Actually yeah, I suppose if one is trying to present a minimal/idiomatic interface to JS from wasm (in which one would want to hide the allocator), one could always wrap the wasm module with a JS module that simply re-exported everything except the allocator function. Because of the magic of how ESM exports work, the re-exports will be the actual (Web IDL-bound) wasm exported function and thus the wrapper would have no call overhead. So maybe that's good enough and I'll retract the "export hiding" idea. |
Personally, I do happen to view exposing the memory as pretty unfortunate
too. However, the situation I am referring to is that a language has found
a way of managing its linear memory with a gc-based model. Such languages
would not/should not expose their allocator. Even if the 'outside party'
has access to all of memory,
it still must decide which memory to 'infect'; and to communicate that to
the wasm module. You simply cannot do that reliably for all memory
management schemes. (The embedder should not have to know how the wasm
module manages its memory.)
…On Wed, Apr 17, 2019 at 11:11 AM Jacob Gravelle ***@***.***> wrote:
@fgmccabe <https://github.com/fgmccabe> but surely exposing the entire
contents of one's memory can lead to a strict superset of those kinds of
trouble? Regardless of how it allocates the buffer, even if we invert the
control or even preallocate space, the binding needs to write the resulting
bytes into the buffer. This means the buffer needs to be writable
externally, which means the outside world can break any of those
constraints arbitrarily at any time.
Now full managed memory allows the external world to return a result as a
memory slice, which makes the allocation and writing orthogonal to what's
happening internal to the module, but absent that I don't see how having
allocator+memory exported is worse than just memory exported.
@fitzgen <https://github.com/fitzgen> In principle we can dedupe these
locally, by having a string table subsection in the bindings section. We
should have only a single allocator function, so having to repeat
"my_awesome_allocator" N times is pretty obviously wasteful.
--
Francis McCabe
SWE
|
I too feel uncomfortable with exposing an allocator, regardless of how it's done. I think we end up with an unfortunate dependency on the memory for doing basic things. For managed languages implemented on top of Wasm, almost certainly the first thing they would do is allocate their own (perhaps internal) string object and copy the bytes into that. It seems cleaner and more future compatible to have such APIs continue to return typed, managed, opaque strings (either anyref or an imported This also has the advantage that in a hypothetical "all managed" world, memory and memory allocation would not be necessary. |
So there is already (in the explainer, using the The only reason to even consider the |
After some offline discussion, the thing that got me to think we should avoid specifying allocators at all in the MVP version of WebIDL bindings is the problem of failure. What do we do in the event of a failed allocation? How does the wasm allocator function indicate to the host that it has failed? What should the host do with that? What does the host return to the wasm call? A way to avoid that is to trap on failure to allocate. Does that mean we should depend on the exception handling proposal? Should those traps be catchable? The part where it really really falls over is in the presence of compound types. Say we want to return an array of 10 strings, and we fail to allocate the 4th string. How can we free the allocated strings? In what order? Etc. I don't think most of those have obvious answers, so I think deferring until after we launch an initial version of webidl bindings is really reasonable here. |
I'm very sympathetic to carving out small kernels of functionality we can ship sooner rather than later. But I'd argue that avoiding JS string allocations is a fundamental goal of the webidl bindings, and that it is a must-have and not a nice-to-have to be added later. That is, an "MVP" implementation of webidl bindings that doesn't include support for passing and receiving strings would not satisfy the definition of MVP since it would not prove that it can achieve its main goal. The very first sentence of the explainer lays out this goal:
I've been doing quite a bit of Wasm-and-DOM-interaction profiling recently, and even with interning and caching common strings sent from Wasm to JS, I'm seeing overheads on the order of 10% of total time dedicated just to creating JS strings from Wasm that are then passed directly into DOM methods. I know many others have seen similar overheads in their applications as well.
We could use But I think trapping is simpler, and is good enough to satisfy an MVP.
Not in the initial MVP. Once exception handling ships and a webidl bindings MVP ships, then let's add catchability.
FWIW, freeing them in reverse order of allocation would work out nicely for LIFO allocators. But I think maybe that strings in compound types could be deferred as a post-MVP feature so long as sending and receiving bare strings was supported in the MVP. |
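To make the compound-type failure case concrete, here is a minimal sketch of the reverse-order cleanup suggested above. It assumes things the thread has not settled: that the allocator signals failure by returning 0, and that a matching `free` is available to the host. The `Allocator` interface and `allocAll` are hypothetical names.

```typescript
// Hypothetical allocator shape; the thread does not define these signatures.
interface Allocator {
  alloc(size: number): number; // assumed to return 0 on failure
  free(ptr: number): void;
}

// Allocate one buffer per requested size. If any allocation fails,
// free everything allocated so far in reverse (LIFO) order and
// report the failure to the caller as null.
function allocAll(a: Allocator, sizes: readonly number[]): number[] | null {
  const ptrs: number[] = [];
  for (const size of sizes) {
    const ptr = a.alloc(size);
    if (ptr === 0) {
      for (const allocated of ptrs.reverse()) {
        a.free(allocated);
      }
      return null;
    }
    ptrs.push(ptr);
  }
  return ptrs;
}
```

Whether the host should then trap, throw, or return an error value to the wasm caller is exactly the open question raised above; the sketch only shows that the cleanup itself is mechanical once a failure signal and a `free` are agreed on.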
Another realization: to implement a polyfill, the memory also needs to be exported, not just the allocator function. (I think we've not noticed this since lld apparently always exports the memory with a canonical name.) This runs against an independent goal I've had which is that, with Web IDL Bindings, it should be possible to have 2 modules, each with their own encapsulated memory, using just Web IDL Bindings (and no GC allocation) to copy a sequence of bytes from one module to the other. ("Encapsulated" here means the memory is neither imported nor exported.) Given the directly-conflicting (and also attractive) goal that it should be possible to implement Web IDL Bindings in a purely layered manner (semantically and also literally, as a polyfill), this makes me go back to the earlier idea that, just as Web IDL Bindings would be able to wrap the core instance exports to produce new, bound, exports, Web IDL Bindings should also be able to hide exports entirely. If we view bindings as just a layer around a core wasm module, this doesn't seem unnatural. |
@lukewagner Would it make sense to think of a "polyfill" (in a broad sense) including generated JS glue code and some simple manipulations of the Wasm module? |
@littledan, if you mean the incoming Wasm module then that would make me very uneasy, because it would create a slippery slope towards arbitrary manipulation. I think what @lukewagner has in mind is merely "wrapping" the resulting JS export object, which could mean producing an object with fewer properties. |
@rossberg Well, if the change from here is to require the memory that the allocator references be exported, that sounds fine to me. |
Right: I'm imagining a pure "wrapping" JS polyfill (of both imports and exports) which does no |
I've been assuming that for the purposes of polyfilling, though I think it should be modeled in the bindings section explicitly. Two other ways to skin this cat:
Orthogonally, we can simplify the handling of |
Closing as out-of-date: the proposal has undergone significant revision in the direction that @lukewagner outlined, and most of the discussion here is no longer relevant. |
Some of the incoming binding expressions reference an allocator function (namely, alloc‑utf8‑str and alloc‑copy), and there are a few considerations for the design.
First is: how should this be polyfilled? By referencing an allocator function by index, we can't access it directly via JS.
Given that a polyfill can be assumed to be part of the instantiation code, in theory we could say that it could modify the incoming wasm bytes to export the given function. However that would make the polyfill modify the underlying wasm module, exporting the allocator for the world to see and use.
Second: in the non-polyfill case, how odd is it that the embedder can call a non-exported function? Even though it's opt-in (if you don't want the function to be called, don't specify it as an allocator function in the webidl-binding section), it's still unusual to have a non-exported function being called from the outside world.
To simplify those, I think a reasonable thing to do is to have the allocator functions specify exports, either by name or by index.
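As a rough illustration of what referencing the allocator by export name buys a polyfill, here is a sketch of passing a string into a module using nothing but its exports. It only approximates the alloc‑utf8‑str operator; `allocUtf8Str`, `MemoryLike`, and the parameter names are hypothetical. It also assumes the memory is exported, which a JS-side polyfill needs in order to write the bytes.

```typescript
// Hypothetical shapes and names; this only approximates alloc-utf8-str.
interface MemoryLike {
  buffer: ArrayBuffer; // structurally matches WebAssembly.Memory
}
type AllocFn = (size: number) => number;

// Encode `text` as UTF-8, ask the module's exported allocator for
// space, and copy the bytes into the module's exported memory.
function allocUtf8Str(
  exports: Record<string, unknown>,
  allocName: string,
  memoryName: string,
  text: string,
): { ptr: number; len: number } {
  const alloc = exports[allocName];
  const memory = exports[memoryName] as MemoryLike | undefined;
  if (typeof alloc !== "function" || memory === undefined) {
    throw new TypeError("allocator and memory must both be exported");
  }
  const bytes = new TextEncoder().encode(text);
  const ptr = (alloc as AllocFn)(bytes.length);
  // Read memory.buffer only after allocating: an allocator that grows
  // the memory would detach a previously obtained ArrayBuffer.
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return { ptr, len: bytes.length };
}
```

With an index instead of a name, the lookup `exports[allocName]` has no equivalent in plain JS, which is the polyfill problem described above.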
Thoughts? Other considerations I'm missing?