Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local Discardable Heaps #149

Closed
RossTate opened this issue Oct 7, 2020 · 10 comments
Closed

Local Discardable Heaps #149

RossTate opened this issue Oct 7, 2020 · 10 comments

Comments

@RossTate
Copy link
Contributor

RossTate commented Oct 7, 2020

In #121 (specifically here), there is discussion of how Erlang relies on keeping process heaps isolated for quick clean up. It seems this might be very important to good performance for at least Erlang, possibly for other systems as well, so I thought it might be useful to provide high-level notes on how to make this possible.

First, we need an instruction like new_heap $h instr* end. This creates a new heap that can be referenced within instr* via $h. Supposing each Erlang process is run in its own stack, this instruction could be one of the first things in the root of each stack. Second, we need a reference type (constructor) that can be parameterized by a heap, say heapref $h (which also takes some additional parameter specifying the layout of the referenced data). Within the body of new_heap, one would then be able to construct and track references within specifically the new heap. Third, we will need to make it possible for functions to have heaps as parameters. This makes it possible to call such functions from within new_heap.

If you design your (sub)typing system right, you can ensure no references in this new heap are still live once end is reached (or the relevant stack frame is cleaned up). Consequently, you can clean up the entire new heap all at once. It's also possible to have a variant of the new_heap instruction that makes it so that the new heap is an extension of an existing heap, which (with appropriate subtyping rules) would enable data in the new heap to (easily) point into the existing heap but not the other way around.

Of course, this all relies on having the write (sub)typing rules. In particular, if heapref $h <: anyref holds, then all references in the new heap can leak as anyref values. So this feature seems incompatible with having a (global) top type for references.

@conrad-watt
Copy link
Contributor

conrad-watt commented Oct 7, 2020

Of course, this all relies on having the write (sub)typing rules. In particular, if heapref $h <: anyref holds, then all references in the new heap can leak as anyref values. So this feature seems incompatible with having a (global) top type for references.

I would make a related point, echoing @titzer from the meeting, that having to thread $h through every instruction/function/import/typing rule seems like a significant complication that would need to be well-motivated. In such a system, we probably wouldn't have one anyref, but anyref $h.

EDIT: with some care needed to support the cases where we want to put a host reference in a field of a Wasm object, or allow a Wasm reference to escape into the host.

@KronicDeth
Copy link

To be able to port optimizations from Erlang/BEAM, the heaps must be able to still reference things in the main heap. This is used for the optimization that binaries over 64 bytes are stored in the process heap as references to a reference counted copy in a global heap. The other external references are references to resources, which in WASM we use to hold DOM elements and other JS stuff.

Everything else is copied between the Process heaps when sending messages. The Process heaps are currently protected with locks, so that either the owning Process or the send can write directly to the heap. If the owning Process has its own heap locked, then heap fragments are used, where the terms are copied into the heap fragment and that fragment is attached to the Process in a faster manner than the heap lock (using an intrusive list data structure). Both the global heap, the Process heaps and the heap fragments could all be this heap type. We use the same trait for all of them in our Rust code.

@RossTate
Copy link
Contributor Author

RossTate commented Oct 7, 2020

To be able to port optimizations from Erlang/BEAM, the heaps must be able to still reference things in the main heap

I figured, which is why I mentioned (albeit in a buried fashion) the ability to define the new heap as an extension of a main heap so that all references in the main heap can be treated as references in the new heap. (Though you have to design your mutability/readability right for the appropriate covariance to be accepted.)

having to thread $h through every instruction/function/import/typing rule

Most instructions already use the type of the reference they're operating on to determine, say, what the value of a field is. So similarly most instructions would not need a heap annotation. For most applications, they'll be using an instance-global heap, and so functions do not have to be parameterized. (That is, it's a pay-as-you-go feature with respect to functions.) Imports are only affected as much as their types are (and the fact that you'd need to import a heap). Typing rules would do a simple heap-subtyping check.

Also, remember to put this in context. Currently $t6 below is the type of java.util.ArrayList (where I omitted the parts of the type definition specifying the structure of runtime data structures):

$t1 = ref (func (param $t32) (result i32))
$t2 = ref (func (param $t6 i32) (result))
$t3 = ref (struct opt$t3 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13)
$t4 = ref (func (param $t12 $t28) (result i32))
$t5 = ref (func (param $t12) (result))
$t6 = ref (struct i32 $t23 (mut i32) (mut $t24) (mut i32))
$t7 = optref (struct i32 $t30 (mut i32))
$t8 = ref (func (param $t32) (result $t42))
$t9 = ref (func (param $t12 $t28) (result $t28))
$t10 = ref (func (param $t32) (result $t28))
$t11 = ref (func (param $t32 $t28) (result i32))
$t12 = ref (struct i32 $t14))
$t13 = ref (func (param $t32 i64 i32) (result))
$t14 = ref (struct $t3 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16)
$t15 = ref (func (param $t7 $t28) (result i32))
$t16 = ref (func (param $t12) (result $t28))
$t17 = ref (func (param $t12 $t24) (result $t24))
$t18 = ref (func (param $t6) (result))
$t19 = ref (func (param $t7 i32 i32) (result $t28))
$t20 = ref (func (param $t32 i64) (result))
$t21 = ref (func (param $t7 i32 $t28) (result $t28))
$t22 = ref (func (param $t7) (result $t28))
$t23 = ref (struct $t30 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16 $t41 $t37 $t15 $t15 $t22 $t37 $t37 $t26 $t21 $t19 $t15 $t38 $t25 $t38 $t36 $t2 $t18)
$t24 = optref (struct i32 $t3 $t41 $t27)
$t25 = ref (func (param $t7) (result i32))
$t26 = ref (func (param $t7 i32 i32) (result))
$t27 = array (mut $t28)
$t28 = optref (struct i32 $t3)
$t29 = ref (func (param $t12) (result i32))
$t30 = ref (struct $t30 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16 $t41 $t37 $t15 $t15 $t22 $t37 $t37 $t26 $t21 $t19 $t15 $t38 $t25 $t38 $t36)
$t31 = ref (func (param $t12) (result $t24))
$t32 = ref (struct i32 $t3)
$t33 = optref (struct opt$t3 $t41 $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13)
$t34 = array (packed i16 i16)
$t35 = ref (func (param $t32) (result))
$t36 = ref (func (param $t7) (result $t28))
$t37 = ref (func (param $t7 i32) (result $t28))
$t38 = ref (func (param $t7 $t28) (result))
$t39 = ref (func (param $t32) (result $t40))
$t40 = optref (struct i32 $t3 i32 i32 $t34)
$t41 = ref (func (param $t7 i32 $t28) (result))

@conrad-watt
Copy link
Contributor

conrad-watt commented Oct 7, 2020

This is an apples-to-oranges comparison. There's a compounding cost to adding another dimension to the type system in that it touches the design and implementation (EDIT: and tooling) of every future feature - its not about the length of generated code. Consider for example that we're already planning a shareability dimension for concurrency.

@RossTate
Copy link
Contributor Author

RossTate commented Oct 9, 2020

shareability seems to most directly be an attribute of heaps, which would mean that whether a reference is shareable is derived from the heap it belongs to rather than needing to be an additional parameter of the reference's type. Also, thinking about the issue further, it occurs to me that in a nominal system the heap that a reference belongs to could be incorporated into the nominal type, at least for the common case where a module defines all its structures and references as part of some instance-global heap (possibly with cross-references to other heaps). And for the case above, in which functions need to be polymorphic with respect to the heap, the nominal type would similarly need to be parameterized by the heap. This would make it so that applications only really pay for the extra dimension (in terms of code size and validation time) if they actually make use of it.

@titzer
Copy link
Contributor

titzer commented Oct 12, 2020

It's worth pointing out that taking heaps as parameters to functions is not enough; they have to be type parameters to functions, since they are part of types. Also, you have to disallow partial application of functions w.r.t. either heap parameters or references that mention heap parameters because then closures escape the scope of the declared heap.

This is by no means a trivial problem to express in any type system.

@RossTate
Copy link
Contributor Author

We already don't allow partial application of functions. Are you talking about func.bind? If so, add this to the list of reasons why I think that instruction is a bad idea.

And sorry for the lack of clarity—when I said parameters, I did not mean value parameters. As I mentioned above, I was thinking polymorphism with respect to heaps (for the case of Lumen). This would likely be similar in complexity to type polymorphism. I'm not advocating for any form of polymorphism for the MVP; I'm just exploring what would be necessary for a Post-MVP to support this feature (and what would be necessary for the MVP to be forwards-compatible with this feature).

@titzer
Copy link
Contributor

titzer commented Oct 15, 2020

Yes, func.bind is partial application. It is included in the function references proposal because partial application (closures) cannot really be done in userspace without even worse typing problems than virtual dispatch sequences now.

@RossTate
Copy link
Contributor Author

In #100, there was discussion of how to implement (OCaml) closures, and func.bind was not helpful. func.bind is also still listed as only an optional extension to the Typed Function References proposal. So I believe there should be an explicit discussion of that feature, but within the appropriate repo.

rossberg pushed a commit that referenced this issue Feb 24, 2021
Issue WebAssembly/reference-types#69 requires
that `ref.null` instructions include a reference type immediate. This
concept isn't present in the bulk-memory proposal, but the encoding is
(in element segment expressions).

This change updates the binary and text format, but not the syntax. This
is OK for now, since the only reference type allowed here is `funcref`.
@tlively
Copy link
Member

tlively commented Nov 1, 2022

This is not going to be included in the MVP, so I'll close this issue, but PRs adding future ideas to the post-MVP doc would be welcome. Perhaps this could be noted alongside the type parameters extension.

@tlively tlively closed this as completed Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants