I'm keeping track of my current understanding and thoughts here. Please comment if you can clarify or correct something.
In the current WASM GC proposal draft, we have a rather high level memory model. The main reason for this high level memory model appears to be:
having one representation for (a) references to the embedder side of the world (in browsers, JavaScript) and (b) heap blocks allocated by the "WASM GC", which in browsers appears to be the same GC that is used for JavaScript, or a part of it.
In the current Cmm intermediate representation of the OCaml compiler, we have a rather low level memory model. Low-level in the sense that integers are tagged, that the memory layout of the module being compiled is visible, and that the content of a heap block to be allocated is described by a list of expressions.
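To make "integers are tagged" concrete, here is a minimal sketch of the usual OCaml encoding, where an integer `n` is stored as the machine word `2n + 1`. The function names are hypothetical and an OCaml `int` merely stands in for a machine word; this is an illustration, not code taken from the compiler.

```ocaml
(* Sketch of OCaml-style integer tagging: n is stored as 2n + 1, so the
   lowest bit distinguishes it from a word-aligned heap pointer.
   All names here are illustrative, not compiler identifiers. *)
let tag (n : int) : int = (n lsl 1) lor 1
let untag (w : int) : int = w asr 1
let is_tagged_int (w : int) : bool = w land 1 = 1

(* Addition directly on tagged words: (2a+1) + (2b+1) - 1 = 2(a+b) + 1. *)
let add_tagged (x : int) (y : int) : int = x + y - 1
```

An untagged variant of Cmm would drop `tag`/`untag` and the `- 1` correction, leaving plain machine arithmetic.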
When references to the heap are represented by values of a non-numeric type (as opposed to being actual addresses of heap blocks, i.e. numbers), this leads either to a model where every heap block has a type (as opposed to being linear memory from which we can read numbers that represent either ints/floats or heap pointers) or to a model where a heap block consists of a byte array and an array of references to other heap blocks. Why? Because we need the capability to store references to other heap blocks inside a heap block, and because we cannot store heap references directly in linear memory.
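The second model (a byte array plus an array of references) can be sketched in OCaml as follows. The type and function names are made up for illustration, and the traversal assumes an acyclic heap with no visited set, so shared blocks are counted once per path.

```ocaml
(* Hypothetical model of a heap block that separates raw bytes from
   references, so a collector only ever needs to scan [refs]. *)
type split_block = {
  raw : Bytes.t;             (* opaque numeric payload, never scanned *)
  refs : split_block array;  (* outgoing references, always scanned *)
}

(* Count block visits reachable from [b]; assumes the heap is acyclic. *)
let rec count_reachable (b : split_block) : int =
  Array.fold_left (fun acc child -> acc + count_reachable child) 1 b.refs
```

The point of the split is that the traversal never has to inspect `raw`, so no tagging is needed to tell numbers from references.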
What is the point of i31ref?
In my mental model of the world, integer/float tagging is done in garbage-collected languages so that the collector can distinguish heap pointers from primitive values when crawling the heap to find live blocks. But if WASM heap blocks are typed structs/arrays, we don't need to tag: we just look at the type and know whether something is a heap pointer or a primitive value. In turn, however, the WASM emitter (the compiler backend of a programming language) needs to provide individual types for all the heap blocks it generates, so there is no uniform representation for values, which would be needed to implement polymorphic functions.
So, apparently, the proposed workaround is to type heap blocks as anyref arrays and then use typecasts from there. In this special case, i31ref is a tagged integer that the WASM GC can distinguish from real heap references, which is essentially the same technique that the OCaml GC uses.
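As a rough model of that uniform representation, the following OCaml sketch treats every value as either a small unboxed integer (standing in for i31ref) or a block whose fields are all uniform values (standing in for an anyref array). The names are hypothetical, the range check assumes a signed 31-bit payload, and it assumes a 64-bit OCaml where `1 lsl 30` does not overflow.

```ocaml
(* Hypothetical uniform value type: I31 stands in for i31ref, Block for
   an anyref array. Assumes 64-bit OCaml ints. *)
type value =
  | I31 of int            (* unboxed integer restricted to 31 bits *)
  | Block of value array  (* heap block whose fields are uniform values *)

let i31_min = -(1 lsl 30)
let i31_max = (1 lsl 30) - 1

(* Reject integers that would not fit a signed 31-bit payload. *)
let make_i31 (n : int) : value =
  if n < i31_min || n > i31_max then invalid_arg "make_i31: out of range"
  else I31 n

(* Works on any uniform value, like a polymorphic field access; the
   pattern match plays the role of the typecast. *)
let first_field (v : value) : value =
  match v with
  | Block fields when Array.length fields > 0 -> fields.(0)
  | _ -> invalid_arg "first_field: not a non-empty block"
```

A polymorphic function such as `first_field` only ever sees `value`, which is the property the per-block struct types do not provide.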
I am not sure that this is the only feasible model that makes a uniform representation for values possible.
Hypothesis 1): In the current WASM GC spec, using anyref heap structs and i31ref is the only feasible way to create a uniform representation for values. This was clearly wrong, as @wingo pointed out. We can allocate integers in a runtime that lives in the JS-side of the world and get anyrefs into our WASM program this way.
It still seems useful to modify the OCaml compiler and make a variant of Cmm that does not tag integers, since the JS integers we create are tagged by their dynamic type. Similarly, when i31ref arrives, which will likely come with improved performance over this solution, having untagged integers in Cmm seems more natural.
You mention the interest of a "version of Cmm where integers are not yet tagged". What about the Clambda representation, which is just one step up in terms of abstraction? Some of the Clambda->WASM steps would be similar to the Clambda->Cmm lowering, and you can take inspiration from cmmgen.ml to do this. I'm not very familiar with the WASM specification, but it sounds like it could be easier, at least on the OCaml side, than starting with a modified Cmm representation.