(ref null extern)? #142

Closed
jakobkummerow opened this issue Sep 25, 2020 · 9 comments

Comments

@jakobkummerow
Contributor

@dcodeIO raised an interesting point in #130: (ref null extern) runs into somewhat similar issues as (ref i31 extern).

For JavaScript embeddings (with "extern" typically referring to JS values), the key question would be whether JavaScript null should be distinguishable from a Wasm nullref, or whether the two should in fact be the same. I can see arguments for both solutions; at any rate this should be specified. Aside from that, since everything is running on the same engine, that engine can be assumed to be able to tell JS and Wasm values apart, and specifically to be able to guarantee that externrefs don't accidentally look like a nullref. (I do have concerns about RTTs for externref values, but that's yet another separate discussion.)

For the Wasm C/C++ API: in the current design of that API, "true" (arbitrary bits) host pointers (Foreign) must always be boxed by the engine, because they must fit into the unified representation demanded by the unified type system (everything is a Ref). If we wanted to allow engines to not box them, then "extern" would have to be a separate type, not nullable and not overlapping with any other types.

To be clear, I'm not arguing for anything in particular in this post; just pointing out that we should be aware of the tradeoff: if externrefs are nullable (and/or overlapping with i31), then we have to assume some amount of boxing is going on under the hood, at least in some scenarios. If we want to enable guaranteed-non-boxing implementations, then we have to make "extern" a standalone type: neither a subtype of anyref, nor union-able with other types.
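To make the boxing tradeoff concrete, here is a toy model of a uniform reference word. The layout is a hypothetical assumption for illustration and is not taken from the C API or any real engine: 0 means null, a set low bit means an i31 value, and anything else is an aligned heap address. An arbitrary host pointer stored as raw bits could collide with the null or i31 patterns, so it has to be boxed.

```python
# Toy model of a uniform reference word (hypothetical layout, not any real engine):
#   0                -> null
#   low bit set      -> i31 value, payload in the upper 31 bits
#   any other value  -> address of an engine-managed heap object
NULL = 0
MASK32 = 0xFFFFFFFF


def decode_word(word: int) -> str:
    """Decode what kind of reference a 32-bit word denotes."""
    if word == NULL:
        return "null"
    if word & 1:
        return "i31"
    return "heap"


def encode_i31(n: int) -> int:
    """Pack a 31-bit signed integer into a tagged word."""
    if not -(1 << 30) <= n < (1 << 30):
        raise ValueError("value does not fit in 31 bits")
    return ((n << 1) | 1) & MASK32


class Heap:
    """Minimal stand-in for the engine heap: hands out even, non-zero addresses."""

    def __init__(self) -> None:
        self.objects = {}  # address -> payload
        self._next = 8

    def allocate(self, payload) -> int:
        addr = self._next
        self._next += 8  # keep addresses aligned so the low (i31) bit stays clear
        self.objects[addr] = payload
        return addr


def box_host_pointer(heap: Heap, host_bits: int) -> int:
    """Represent arbitrary host bits as a ref by boxing them in the heap."""
    return heap.allocate(("foreign", host_bits & MASK32))


heap = Heap()
raw = 0x1001                      # an arbitrary (odd) host pointer value
print(decode_word(raw))           # raw bits would be misread as an i31
boxed = box_host_pointer(heap, raw)
print(decode_word(boxed), heap.objects[boxed])
```

If "extern" were a standalone type that never shares slots with null or i31 values, the engine would never need to run `decode_word` on it, so the raw bits could be stored unboxed. A real engine might instead reserve tag bits or demand aligned host pointers; the sketch only shows that truly arbitrary bits cannot be trusted to avoid the reserved patterns.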

@dcodeIO

dcodeIO commented Sep 25, 2020

> in the current design of that API, "true" (arbitrary bits) host pointers (Foreign) must always be boxed by the engine

May I ask what the last-minute change in reference types, renaming anyref to externref to make them distinct, solves then? I'm probably missing some context there, as I only joined the meetings after the change (somewhat surprising to me) was made. Like, we wouldn't have the problem if we had kept anyref, assuming that boxing foreign pointers is necessary anyway?

@jakobkummerow
Contributor Author

AFAIK the C API hasn't been updated since that change was made, so it pretty much reflects the "before" state. (Confusingly, it does have an "Extern" class, but it means something else.)

The primary motivation for the late change to the reftypes proposal was to avoid funcref being a subtype of anyref, in order to allow engines to represent funcrefs differently (e.g. as fat pointers). That left the former "anyref" being used only for external references, so it was renamed to "externref" to reflect that.

With the GC proposal now looking to re-introduce an "anyref" top type that every other reference is a subtype of, this decision would effectively be reversed. (Which was the primary counter-argument raised back then: if we'll reintroduce anyref anyway, then dropping it temporarily is just churn.) To be fair, as the ongoing discussions indicate, it's not entirely clear yet what will happen to the type system described in the current version of the GC proposal.

> we wouldn't have the problem if we had kept anyref

I don't think the fact that "externref" exists is creating any problems. It only raises the question of how exactly it should be designed, in particular regarding its relations and interactions with the other types. If the reftypes proposal had kept anyref as a top type, then we wouldn't be facing this question now; we'd simply be stuck with one of the possible solutions, with all its pros and cons.

@dcodeIO

dcodeIO commented Sep 25, 2020

> The primary motivation for the late change to the reftypes proposal was to avoid funcref being a subtype of anyref, in order to allow engines to represent funcrefs differently (e.g. as fat pointers).

Hmm, I understand the intention, but I do not see how it guarantees that engines can do that forever, as in reference-types it's really just a name as long as no subtyping is introduced anyway. What really allows this is that there is no subtyping just yet, not the name. Might as well still be named anyref.

What I'm trying to get at is, now with GC, we are looking at

        any
   ┌─────┼─────┐
extern  func  ...

so the intended guarantee to allow engines to represent funcrefs differently only holds in a scenario where extern values are exclusively stored in externref targets and function values are exclusively stored in funcref targets (or concrete subtypes of func), no anyref whatsoever gets in the way, and, on top of that, engines are willing to optimize for this case.

As soon as anyref enters the picture, we are essentially back to square one, and the type hierarchy might as well just be

        any
    ┌────┴────┐
   func      ...

(still allowing funcrefs to be represented differently as long as these are not assigned to anyrefs, if engines want to optimize for it), essentially eliminating the need for a separate externref. The conclusion would be that the initial goal was noble but doesn't bring much to the table in practice, and actually complicates matters, as #130 and this issue show.

Even though reference types is in phase 4, it might not even be too late to correct this by means of a text-format-only change to reference types.

Other than that, I of course agree that if we want externref because we think it's useful, then your initial post applies, and we probably need to do something else about it.

@jakobkummerow
Contributor Author

> What really allows this is that there is no subtyping just yet, not the name.

Yes, precisely. Removing the subtyping was the significant change. The renaming was just a (text format only) "cleanup" afterwards, so that the type names reflect reality ("anyref" doesn't make sense if it isn't any reference).

> still allowing funcrefs to be represented differently as long as these are not assigned to anyrefs, if engines want to optimize for it

Realistically, everything that's in the same subtyping hierarchy must use the same representation. In particular, when code for one module is compiled, the engine doesn't know yet what other modules might get loaded later. So the special case you're hinting at would only apply in a situation where the engine has a chance to inspect all modules up front, which is not a realistic scenario. (That said, in V8 at least we're not planning to make use of special funcref representations, so we don't care much if we lose this freedom.)
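The rule "everything in the same subtyping hierarchy must use the same representation" can be modelled as connected components over subtype edges. This is a simplified sketch: the type names are illustrative, and it deliberately ignores conversions at module boundaries or whole-program knowledge that an engine might exploit.

```python
def representation_groups(types, subtype_edges):
    """Group types that must share one value representation.

    Any two types linked by a chain of subtyping edges end up in the same
    group, because a value of the subtype can flow into a slot of the
    supertype and must be readable there without knowing its origin.
    """
    parent = {t: t for t in types}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for sub, sup in subtype_edges:
        parent[find(sub)] = find(sup)

    groups = {}
    for t in types:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Reference types as shipped: funcref and externref are unrelated,
# so each may get its own representation.
print(representation_groups(["funcref", "externref"], []))

# A GC-style hierarchy with a common top type collapses everything into one group.
gc_types = ["anyref", "externref", "funcref", "i31ref"]
gc_edges = [("externref", "anyref"), ("funcref", "anyref"), ("i31ref", "anyref")]
print(representation_groups(gc_types, gc_edges))
```

Once a top type such as anyref connects funcref to the rest, funcref falls into the shared group, which is why re-introducing anyref effectively removes the freedom to use fat-pointer funcrefs.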

@rossberg
Member

The removal of anyref certainly has made the type structure less clear.

In a first approximation, externref is the class of "foreign" references, i.e., values that are not (necessarily) representations of Wasm concepts and that Wasm probably has no interpretation for.

That does not imply that their representation is arbitrary! Any such reference must still be compatible with, e.g., GC references. That is similar to the situation of foreign "raw" pointers in some existing engines, which are represented either as tagged integers or boxed in the heap, and so have a compatible representation. In the C API (which, as @jakobkummerow said, is a bit out of sync with the recent changes), there is an API function that allows the host to allocate "foreign" objects on the Wasm heap, which follows a similar model: the host cannot just stick in any of its own values.

The compatibility between externref and GC type representations is crucial to make type imports work. In particular, any imported type, like all other im/export entities, ought to be implementable by both another Wasm module or the host. In the latter case, it will be an externref, but the importing module can't know, so cannot distinguish statically.

So the union of null and extern is still meaningful.

A closely related question is whether this union needs to be disjoint. A priori, there is no reason that it has to. For example, a Wasm function (an exported exotic function in JS) could be both a funcref and an externref. Similarly, given the right amount of normalisation at the boundaries, i31 values could be both i31ref and externref.

In a recent discussion with @jakobkummerow and @tebbi, we explored that option and did not see any immediate problem with that. However, we were discussing it in the context of the idea that externrefs are not considered to have extern as their RTTs, so one cannot cast down to externref, which would also avoid a few other issues. I am still thinking through the consequences.

@RossTate
Contributor

Note that this arbitrary overlapping of externref and anyref will make it difficult for programs to reason about externrefs. Consider an OCaml module that casts an anyref to (struct int) using rtt.canon in order to get the "header" tag of its OCaml values (which it then uses to determine which sub-rtt to cast to). But OCaml values will be indistinguishable from foreign values that happen to have the same canonical rtt. On some hosts this won't be a problem because foreign values will always have foreign rtts, but on other hosts an externref can be any arbitrary anyref. Because of this, it seems that any module that wants to be able to work on an imported type or externref without being sensitive to such details will either need a dynamic way to distinguish foreign values (e.g., their own rtt) or have to box them with the module's own tag reserved for foreign references. The former makes externref disjoint from all wasm references (besides null), whereas the latter makes externref <: anyref unnecessary (#143).
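The ambiguity, and the second workaround (boxing foreign values under a module-private tag), can be sketched with a toy model. `Rtt`, `rtt_canon`, `Obj`, and `ref_test` are hypothetical Python stand-ins loosely mirroring the then-proposed `rtt.canon` and cast/test instructions, not real APIs; the assumption that a module can create a private rtt no other module can obtain is also made only for illustration.

```python
class Rtt:
    """Runtime type descriptor, compared by identity."""

    def __init__(self, desc: str) -> None:
        self.desc = desc


_canonical = {}


def rtt_canon(desc: str) -> Rtt:
    """Canonical rtts are shared by every module asking for the same type."""
    if desc not in _canonical:
        _canonical[desc] = Rtt(desc)
    return _canonical[desc]


class Obj:
    """A GC object: an rtt plus its fields."""

    def __init__(self, rtt: Rtt, *fields) -> None:
        self.rtt = rtt
        self.fields = fields


def ref_test(value, rtt: Rtt) -> bool:
    return isinstance(value, Obj) and value.rtt is rtt


# Inside the (hypothetical) OCaml module:
OCAML_HEADER = rtt_canon("(struct int)")   # header struct holding the OCaml tag
FOREIGN_BOX = Rtt("foreign box")           # fresh rtt, assumed private to this module


def wrap_foreign(ext) -> Obj:
    """Box an incoming externref under the module's private tag."""
    return Obj(FOREIGN_BOX, ext)


def classify(value) -> str:
    if ref_test(value, FOREIGN_BOX):
        return "foreign"
    if ref_test(value, OCAML_HEADER):
        return "ocaml"
    return "unknown"


ocaml_value = Obj(OCAML_HEADER, 3)
host_value = Obj(rtt_canon("(struct int)"), 3)  # host object that shares the canonical rtt

print(classify(ocaml_value))
print(classify(host_value))                # misclassified: indistinguishable from OCaml
print(classify(wrap_foreign(host_value)))  # boxed at the boundary, so recognised
```

Because the host object obtains the very same canonical rtt, the unboxed check cannot tell it apart from an OCaml value; only wrapping at the boundary restores the distinction, which is the cost that makes externref <: anyref buy little in this scenario.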

@rossberg
Copy link
Member

Huh? I did not close this. What's going on with GitHub?

@rossberg rossberg reopened this Feb 24, 2021
@tlively
Member

tlively commented Feb 25, 2021

The merge of upstream brought in a bunch of commits with things like "Fixes #142" in their messages, and GitHub "helpfully" closes the corresponding issue in response to that landing. What GitHub doesn't know is that in this case, the issue numbers refer to issues in a different repo. It doesn't look like there is a way to turn this off, so this will be an issue for all future merges as well unless GitHub changes something :/

@tlively
Member

tlively commented Apr 5, 2022

We've since decided to re-unify anyref and externref.

@tlively tlively closed this as completed Apr 5, 2022

5 participants