Local Discardable Heaps #149

RossTate · 2020-10-07T18:35:01Z

In #121 (specifically here), there is discussion of how Erlang relies on keeping process heaps isolated for quick clean up. It seems this might be very important to good performance for at least Erlang, possibly for other systems as well, so I thought it might be useful to provide high-level notes on how to make this possible.

First, we need an instruction like new_heap $h instr* end. This creates a new heap that can be referenced within instr* via $h. Supposing each Erlang process is run in its own stack, this instruction could be one of the first things in the root of each stack. Second, we need a reference type (constructor) that can be parameterized by a heap, say heapref $h (which also takes some additional parameter specifying the layout of the referenced data). Within the body of new_heap, one would then be able to construct and track references within specifically the new heap. Third, we will need to make it possible for functions to have heaps as parameters. This makes it possible to call such functions from within new_heap.

If you design your (sub)typing system right, you can ensure no references in this new heap are still live once end is reached (or the relevant stack frame is cleaned up). Consequently, you can clean up the entire new heap all at once. It's also possible to have a variant of the new_heap instruction that makes it so that the new heap is an extension of an existing heap, which (with appropriate subtyping rules) would enable data in the new heap to (easily) point into the existing heap but not the other way around.

Of course, this all relies on having the write (sub)typing rules. In particular, if heapref $h <: anyref holds, then all references in the new heap can leak as anyref values. So this feature seems incompatible with having a (global) top type for references.

The text was updated successfully, but these errors were encountered:

conrad-watt · 2020-10-07T18:53:57Z

Of course, this all relies on having the write (sub)typing rules. In particular, if heapref $h <: anyref holds, then all references in the new heap can leak as anyref values. So this feature seems incompatible with having a (global) top type for references.

I would make a related point, echoing @titzer from the meeting, that having to thread $h through every instruction/function/import/typing rule seems like a significant complication that would need to be well-motivated. In such a system, we probably wouldn't have one anyref, but anyref $h.

EDIT: with some care needed to support the cases where we want to put a host reference in a field of a Wasm object, or allow a Wasm reference to escape into the host.

KronicDeth · 2020-10-07T19:12:22Z

To be able to port optimizations from Erlang/BEAM, the heaps must be able to still reference things in the main heap. This is used for the optimization that binaries over 64 bytes are stored in the process heap as references to a reference counted copy in a global heap. The other external references are references to resources, which in WASM we use to hold DOM elements and other JS stuff.

Everything else is copied between the Process heaps when sending messages. The Process heaps are currently protected with locks, so that either the owning Process or the send can write directly to the heap. If the owning Process has its own heap locked, then heap fragments are used, where the terms are copied into the heap fragment and that fragment is attached to the Process in a faster manner than the heap lock (using an intrusive list data structure). Both the global heap, the Process heaps and the heap fragments could all be this heap type. We use the same trait for all of them in our Rust code.

RossTate · 2020-10-07T19:27:57Z

To be able to port optimizations from Erlang/BEAM, the heaps must be able to still reference things in the main heap

I figured, which is why I mentioned (albeit in a buried fashion) the ability to define the new heap as an extension of a main heap so that all references in the main heap can be treated as references in the new heap. (Though you have to design your mutability/readability right for the appropriate covariance to be accepted.)

having to thread $h through every instruction/function/import/typing rule

Most instructions already use the type of the reference they're operating on to determine, say, what the value of a field is. So similarly most instructions would not need a heap annotation. For most applications, they'll be using an instance-global heap, and so functions do not have to be parameterized. (That is, it's a pay-as-you-go feature with respect to functions.) Imports are only affected as much as their types are (and the fact that you'd need to import a heap). Typing rules would do a simple heap-subtyping check.

Also, remember to put this in context. Currently $t6 below is the type of java.util.ArrayList (where I omitted the parts of the type definition specifying the structure of runtime data structures):

$t1 = ref (func (param $t32) (result i32))
$t2 = ref (func (param $t6 i32) (result))
$t3 = ref (struct opt$t3 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13)
$t4 = ref (func (param $t12 $t28) (result i32))
$t5 = ref (func (param $t12) (result))
$t6 = ref (struct i32 $t23 (mut i32) (mut $t24) (mut i32))
$t7 = optref (struct i32 $t30 (mut i32))
$t8 = ref (func (param $t32) (result $t42))
$t9 = ref (func (param $t12 $t28) (result $t28))
$t10 = ref (func (param $t32) (result $t28))
$t11 = ref (func (param $t32 $t28) (result i32))
$t12 = ref (struct i32 $t14))
$t13 = ref (func (param $t32 i64 i32) (result))
$t14 = ref (struct $t3 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16)
$t15 = ref (func (param $t7 $t28) (result i32))
$t16 = ref (func (param $t12) (result $t28))
$t17 = ref (func (param $t12 $t24) (result $t24))
$t18 = ref (func (param $t6) (result))
$t19 = ref (func (param $t7 i32 i32) (result $t28))
$t20 = ref (func (param $t32 i64) (result))
$t21 = ref (func (param $t7 i32 $t28) (result $t28))
$t22 = ref (func (param $t7) (result $t28))
$t23 = ref (struct $t30 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16 $t41 $t37 $t15 $t15 $t22 $t37 $t37 $t26 $t21 $t19 $t15 $t38 $t25 $t38 $t36 $t2 $t18)
$t24 = optref (struct i32 $t3 $t41 $t27)
$t25 = ref (func (param $t7) (result i32))
$t26 = ref (func (param $t7 i32 i32) (result))
$t27 = array (mut $t28)
$t28 = optref (struct i32 $t3)
$t29 = ref (func (param $t12) (result i32))
$t30 = ref (struct $t30 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16 $t41 $t37 $t15 $t15 $t22 $t37 $t37 $t26 $t21 $t19 $t15 $t38 $t25 $t38 $t36)
$t31 = ref (func (param $t12) (result $t24))
$t32 = ref (struct i32 $t3)
$t33 = optref (struct opt$t3 $t41 $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13)
$t34 = array (packed i16 i16)
$t35 = ref (func (param $t32) (result))
$t36 = ref (func (param $t7) (result $t28))
$t37 = ref (func (param $t7 i32) (result $t28))
$t38 = ref (func (param $t7 $t28) (result))
$t39 = ref (func (param $t32) (result $t40))
$t40 = optref (struct i32 $t3 i32 i32 $t34)
$t41 = ref (func (param $t7 i32 $t28) (result))

conrad-watt · 2020-10-07T19:47:06Z

This is an apples-to-oranges comparison. There's a compounding cost to adding another dimension to the type system in that it touches the design and implementation (EDIT: and tooling) of every future feature - its not about the length of generated code. Consider for example that we're already planning a shareability dimension for concurrency.

RossTate · 2020-10-09T02:12:41Z

shareability seems to most directly be an attribute of heaps, which would mean that whether a reference is shareable is derived from the heap it belongs to rather than needing to be an additional parameter of the reference's type. Also, thinking about the issue further, it occurs to me that in a nominal system the heap that a reference belongs to could be incorporated into the nominal type, at least for the common case where a module defines all its structures and references as part of some instance-global heap (possibly with cross-references to other heaps). And for the case above, in which functions need to be polymorphic with respect to the heap, the nominal type would similarly need to be parameterized by the heap. This would make it so that applications only really pay for the extra dimension (in terms of code size and validation time) if they actually make use of it.

titzer · 2020-10-12T11:57:36Z

It's worth pointing out that taking heaps as parameters to functions is not enough; they have to be type parameters to functions, since they are part of types. Also, you have to disallow partial application of functions w.r.t. either heap parameters or references that mention heap parameters because then closures escape the scope of the declared heap.

This is by no means a trivial problem to express in any type system.

RossTate · 2020-10-14T01:32:45Z

We already don't allow partial application of functions. Are you talking about func.bind? If so, add this to the list of reasons why I think that instruction is a bad idea.

And sorry for the lack of clarity—when I said parameters, I did not mean value parameters. As I mentioned above, I was thinking polymorphism with respect to heaps (for the case of Lumen). This would likely be similar in complexity to type polymorphism. I'm not advocating for any form of polymorphism for the MVP; I'm just exploring what would be necessary for a Post-MVP to support this feature (and what would be necessary for the MVP to be forwards-compatible with this feature).

titzer · 2020-10-15T02:36:27Z

Yes, func.bind is partial application. It is included in the function references proposal because partial application (closures) cannot really be done in userspace without even worse typing problems than virtual dispatch sequences now.

RossTate · 2020-10-15T18:51:25Z

In #100, there was discussion of how to implement (OCaml) closures, and func.bind was not helpful. func.bind is also still listed as only an optional extension to the Typed Function References proposal. So I believe there should be an explicit discussion of that feature, but within the appropriate repo.

Issue WebAssembly/reference-types#69 requires that `ref.null` instructions include a reference type immediate. This concept isn't present in the bulk-memory proposal, but the encoding is (in element segment expressions). This change updates the binary and text format, but not the syntax. This is OK for now, since the only reference type allowed here is `funcref`.

tlively · 2022-11-01T17:48:51Z

This is not going to be included in the MVP, so I'll close this issue, but PRs adding future ideas to the post-MVP doc would be welcome. Perhaps this could be noted alongside the type parameters extension.

tlively closed this as completed Nov 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Local Discardable Heaps #149

Local Discardable Heaps #149

RossTate commented Oct 7, 2020

conrad-watt commented Oct 7, 2020 •

edited

Loading

KronicDeth commented Oct 7, 2020

RossTate commented Oct 7, 2020

conrad-watt commented Oct 7, 2020 •

edited

Loading

RossTate commented Oct 9, 2020

titzer commented Oct 12, 2020

RossTate commented Oct 14, 2020

titzer commented Oct 15, 2020

RossTate commented Oct 15, 2020

tlively commented Nov 1, 2022

Local Discardable Heaps #149

Local Discardable Heaps #149

Comments

RossTate commented Oct 7, 2020

conrad-watt commented Oct 7, 2020 • edited Loading

KronicDeth commented Oct 7, 2020

RossTate commented Oct 7, 2020

conrad-watt commented Oct 7, 2020 • edited Loading

RossTate commented Oct 9, 2020

titzer commented Oct 12, 2020

RossTate commented Oct 14, 2020

titzer commented Oct 15, 2020

RossTate commented Oct 15, 2020

tlively commented Nov 1, 2022

conrad-watt commented Oct 7, 2020 •

edited

Loading

conrad-watt commented Oct 7, 2020 •

edited

Loading