Low-Level Memory Layout in OCaml's Cmm vs. Higher-Level Memory Model of the WASM GC Extension #3

Open
sabine opened this issue Oct 24, 2019 · 2 comments

sabine commented Oct 24, 2019

I'm keeping track of my current understanding and thoughts here. Please comment if you can clarify or correct something.

In the current WASM GC proposal draft, we have a rather high-level memory model. The main reason for this high-level memory model appears to be:

  • having one representation for (a) references to the embedder side of the world (in browsers, JavaScript) and (b) heap blocks allocated by the "WASM GC" (which, in browsers, appears to be the same GC that is used for JavaScript, or a part of it).

In the current Cmm intermediate representation of the OCaml compiler, we have a rather low-level memory model. Low-level in the sense that integers are tagged, that the memory layout of the module being compiled is visible, and that the content of a heap block to be allocated is described by a list of expressions.
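
To make "integers are tagged" concrete, here is a minimal sketch of the usual OCaml encoding, modelled in Python on a 64-bit word: an integer n is stored as 2n + 1, so the low bit separates immediates from word-aligned heap pointers. The function names are made up for illustration and are not taken from the compiler.

```python
# Sketch of OCaml-style integer tagging on a 64-bit machine word.
# All names here are illustrative, not compiler identifiers.
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def tag_int(n: int) -> int:
    """Encode n as the word 2n + 1 (low bit set marks an immediate)."""
    return ((n << 1) | 1) & MASK


def untag_int(w: int) -> int:
    """Decode a tagged word back to a signed integer."""
    if w >= 1 << (WORD_BITS - 1):
        w -= 1 << WORD_BITS  # reinterpret the word as two's complement
    return w >> 1            # arithmetic shift drops the tag bit


def is_immediate(w: int) -> bool:
    """Heap pointers are word-aligned, so their low bit is 0."""
    return (w & 1) == 1


def tagged_add(a: int, b: int) -> int:
    """Add without untagging: (2x + 1) + (2y + 1) - 1 = 2(x + y) + 1."""
    return (a + b - 1) & MASK
```

The `tagged_add` case shows why the tagging is visible in a low-level IR: arithmetic on tagged words needs small corrections that a backend targeting untagged integers would not want.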

When references to the heap are represented by values of a non-numeric type (as opposed to being actual addresses of heap blocks, i.e. numbers), this leads either to a model where every heap block has a type (as opposed to heap blocks being linear memory from which we read numbers that represent either ints/floats or heap pointers), or to a model where a heap block consists of a byte array and an array of references to other heap blocks. Why? Because we need to be able to store references to other heap blocks inside a heap block, and we cannot store heap references directly in linear memory.
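
The two alternatives can be sketched as plain data structures. This is only a model of the idea in Python; `TypedBlock`, `SplitBlock` and the helper functions are hypothetical names, not anything from the WASM GC proposal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypedBlock:
    """Model A: the block's type says which fields hold references."""
    field_is_ref: List[bool]  # static layout, one flag per field
    fields: List[object]      # numbers, or references to other blocks


@dataclass
class SplitBlock:
    """Model B: raw bytes and references live in separate parts."""
    raw: bytearray = field(default_factory=bytearray)
    refs: List["SplitBlock"] = field(default_factory=list)


def children_typed(block: TypedBlock) -> List[object]:
    """The layout alone tells us which fields to follow; no tag bits."""
    return [f for f, is_ref in zip(block.fields, block.field_is_ref) if is_ref]


def children_split(block: SplitBlock) -> List[SplitBlock]:
    """Everything in the reference part is a reference by construction."""
    return list(block.refs)
```

In both models the references never appear as numbers inside the byte data, which is the constraint described above.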

What is the point of i31ref?
In my mental model of the world, garbage-collected languages tag integers/floats so that the GC can distinguish heap pointers from primitive values when crawling the heap to find the live blocks. But if WASM heap blocks are typed structs/arrays, we don't need to tag: we just look at the type and know whether something is a heap pointer or a primitive value. In turn, the WASM emitter (the compiler backend of a programming language) needs to provide individual types for all the heap blocks it generates, so there is no uniform representation for values, which would be needed to implement polymorphic functions.
As a proposed workaround, one can apparently type heap blocks as anyref arrays and then use typecasts from there. In this special case, i31ref is a tagged integer that the WASM GC can distinguish from real heap references, which is essentially the same technique that the OCaml GC uses.
I am not sure that this is the only feasible model that makes a uniform representation for values possible.
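
A rough model of that workaround, sketched in Python under my own assumptions: every value is either a small immediate integer or a reference to a block whose fields are again uniform values. The classes `I31` and `Block` and the function `reachable` are hypothetical; the signed 31-bit range is my reading of how i31ref would behave.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class I31:
    """Stand-in for an i31ref: an immediate that is not a heap reference."""
    value: int

    def __post_init__(self):
        # Assumed signed 31-bit range.
        if not -(1 << 30) <= self.value < (1 << 30):
            raise ValueError("value does not fit in 31 bits")


class Block:
    """Stand-in for a heap block typed as an array of anyref."""

    def __init__(self, fields: List[object]):
        self.fields = fields  # each field is an I31 or a Block


def reachable(root: object) -> List[Block]:
    """Crawl the heap from root, skipping immediates, as a GC would."""
    seen_ids = set()
    live = []
    stack = [root]
    while stack:
        v = stack.pop()
        if isinstance(v, I31) or id(v) in seen_ids:
            continue  # immediates carry no references; blocks visit once
        seen_ids.add(id(v))
        live.append(v)
        stack.extend(v.fields)
    return live
```

A polymorphic function can take and return such values without knowing whether they are integers or blocks; only code that actually inspects a value needs the cast.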

Hypothesis 1: In the current WASM GC spec, using anyref heap structs and i31ref is the only feasible way to create a uniform representation for values. This was clearly wrong, as @wingo pointed out: we can allocate integers in a runtime that lives on the JS side of the world and get anyrefs into our WASM program this way.
It still seems useful to modify the OCaml compiler and make a variant of Cmm that does not tag integers, since the JS integers we create are tagged by their dynamic type. Similarly, when i31ref arrives, which will likely come with improved performance over this solution, having untagged integers in Cmm seems more natural.
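
As a rough illustration of the embedder-side approach, the sketch below models a host runtime in Python. The class `HostRuntime` and its methods are invented for this example and are not Schism's actual interface; the point is only that the compiled program holds opaque references and leaves tagging to the host's dynamic types.

```python
class HostRuntime:
    """Stands in for embedder-side helpers that a WASM module would import."""

    def box_int(self, n: int) -> object:
        # The host value carries its own dynamic type; no tag bit is needed.
        return n

    def is_int(self, ref: object) -> bool:
        return isinstance(ref, int) and not isinstance(ref, bool)

    def unbox_int(self, ref: object) -> int:
        if not self.is_int(ref):
            raise TypeError("reference does not hold an integer")
        return ref

    def make_block(self, fields) -> object:
        # A block is just a host-side array of opaque references.
        return list(fields)


def add_boxed(rt: HostRuntime, a: object, b: object) -> object:
    """What compiled code might do: unbox, add untagged, box the result."""
    return rt.box_int(rt.unbox_int(a) + rt.unbox_int(b))
```

Here the compiled code works on untagged integers between `unbox_int` and `box_int`, which is the situation where an untagged variant of Cmm would fit.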

sabine commented Nov 4, 2019

Thank you @wingo for linking how Schism implements its uniform value representation with the JavaScript GC in WebAssembly/gc#53 (comment).

gasche commented Jun 28, 2020

You mention being interested in a "version of Cmm where integers are not yet tagged". What about the Clambda representation, which is just one step up in terms of abstraction? Some of the Clambda->WASM steps would be similar to the Clambda->Cmm lowering, but you can take inspiration from cmmgen.ml to do this. I'm not very familiar with the WASM specification, but it sounds like it could be easier, at least on the OCaml side, than starting with a modified Cmm representation.
