Incompatibilities with Pointer Tagging #118
Comments
Ah, but only if you assume that variants may be unboxed. That was not the idea here. Variants would express tagged pointers, not integers. For the latter, you already have i31ref, and if you want a mixed representation then you can roll that in user space. There is no significant advantage in lowering that into the engine. If somebody wanted to encode a variant with custom cast semantics on nullary cases, then obviously an RTT has to be stored somewhere. So they'd have to pay the boxing price for nullary cases. It's their choice, and may depend on the use case. The proposal only provides the tool box for expressing different choices. And you pay what you ask for.
I think there still is some deeper misunderstanding here. Specifically, anyref is a union type, directly reflecting the union of possible representations. As such, it does not (have to) guarantee disjointness or opacity per se. If you want to ensure that your type's representation is distinguishable from others then you can make it so, but it's not automatic or required. Remember this is an assembly language (more or less). Again, you pay what you ask for. |
In the text I quoted, you say "cases with no
This would then require casting from
You don't seem to be realizing the consequences of this.
It needs to guarantee that downcasting behaves deterministically and is sound without trusting the application. That's why the first thing I had to do was derive how downcasting would need to work. The implication of that was that all distinct variant types would need to be disjoint, contrary to your claim here.
The current proposal permanently requires all reference values to be equipped with the JVM's casting model, which bakes in a high-level paradigm and is not particularly assembly-like.
What I'm illustrating is that other languages will pay the cost of
Another problem along this line that I forgot to bring attention to regards the following statement:
Because everything is a subtype of
This means that engines are either going to not perform this space-saving optimization at all, or they're going to perform it on a first-come, first-served basis. The latter case means that a module's space performance can vary widely just by being compiled later, after all the bits have been reserved for other modules' purposes. |
Speaking from just one engine's perspective: I think it is extremely unlikely that we will ever use any bits in a pointer for any module-specific special-casing. The marking phase of GC is expensive and hence performance-sensitive, and one consequence of that is that we want the heap layout to be as regular as possible. Being a highly complex system already, we also have a strong dislike for avoidable implementation complexity. One bit to distinguish unboxed scalars from regular pointers is fine, anything else will (in my personal prediction) very likely remain hypothetical. (Of course, if WasmGC ends up specifying module-controlled tagging schemes, we'll have to implement that observable behavior. But we may well implement it via an indirection, e.g. storing the tag as a plain integer somewhere in the "hidden class" descriptor, possibly even more than one hop away from the original pointer.) |
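For readers unfamiliar with the single-bit scheme mentioned above, here is a minimal sketch in Python, with machine words modeled as plain integers. The choice of the low bit, the 32-bit word width, and all function names are assumptions for illustration, not a description of any particular engine.

```python
# Sketch: one low bit separates unboxed 31-bit scalars from pointers.
# Assumption: pointers are at least 2-byte aligned, so their low bit is 0.

SCALAR_TAG = 1  # low bit set => unboxed scalar; low bit clear => pointer


def box_scalar(n: int) -> int:
    """Pack a signed 31-bit integer into a tagged 32-bit word."""
    if not -(1 << 30) <= n < (1 << 30):
        raise ValueError("value does not fit in 31 bits")
    return ((n << 1) | SCALAR_TAG) & 0xFFFFFFFF


def is_scalar(word: int) -> bool:
    """A single bit test is all the GC needs to skip non-pointers."""
    return word & 1 == SCALAR_TAG


def unbox_scalar(word: int) -> int:
    """Recover the signed 31-bit integer from a tagged word."""
    if not is_scalar(word):
        raise TypeError("word is a pointer, not a scalar")
    payload = word >> 1        # unsigned 31-bit payload
    if payload & (1 << 30):    # sign bit of the 31-bit value
        payload -= 1 << 31
    return payload
```

Anything beyond this one test (for example, per-module case tags in further low bits) is what the comment above predicts would be implemented through an indirection instead.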
@jakobkummerow Yes, and for those reasons I thought @rossberg had consciously decided to never support such functionality. But now the plans in #114 contradict those suspicions. So now I am unsure if @rossberg is unaware of the limitations of his proposal and possibly advocating for it because he doesn't realize it has these problems. Regardless of his reasons, it's also important that the CG is making informed decisions. It's problematic if members of the CG decide to advance a proposal under the expectation that they'll be able to later add functionality that is impossible to add. It's another story if the CG discusses this limitation and consciously decides that they are willing to never support said functionality. So it's important for the documentation in #114 to accurately convey what extensions are and are not possible, and it's important for us to discuss and consciously decide to never support the functionality that is incompatible with the current proposal. |
@RossTate I would really like to understand the issues at play here, but I am finding it difficult.
Speaking for myself, I definitely don't realize the consequences of this, so it would be more helpful for me if you tried to explain them in a different way. There are many different assumptions among folks in this conversation that have already been brought up (e.g. whether variants may be unboxed, whether engines will want to implement various optimizations) and I suspect there are many more. This conversation will be most productive if we can collaborate to figure out what all our respective assumptions are and document them. That requires assuming that disagreements are due to different assumptions rather than a lack of understanding.
It would be helpful to me as a non-expert in these topics if you could more explicitly outline how you arrived at these conclusions since I don't have the expertise to see the connections myself. Stepping back, it seems like the current GC proposal was crafted with the boots-on-the-ground reality of what web engines will and will not want to implement in mind, whereas @RossTate is thinking about the needs of engines that would be willing to take on more complexity in return for higher performance. I think there will be engines all along this spectrum in the WebAssembly ecosystem, so I personally think we should make it a goal for the GC proposal to allow engines to make different tradeoffs on this axis. If others feel that this proposal should be tailored to minimize unused complexity for Web engines, we should talk about that on a separate issue and come to a common understanding. |
@tlively I'm happy to delve deeper into the reasoning; my terseness was an effort to keep the writing brief, and I apologize for the unintended effect of it being inaccessible. Hopefully that will help identify different assumptions, as you say.
Let me start with explaining how I came to this conclusion. There are three reasons:
Now let's consider two variant types:
then consider the following two values:
The values
Does that better illustrate the reasoning? And does it illustrate how |
Before you continue asserting stuff, can you explain why an OCaml implementation would ever need to do that? First of all, RTTs are intended for checking downcasts against known target representations, not for switching on representations. Second, OCaml is typed, so it usually never needs to switch on all possible representations. The only exception I know of would be a primitive like polymorphic equality. And that would use the i32 header for the actual switch. Overall, that requires two downcasts per node (one to block, to access the header, and one to the actual type). Not ideal, but also not terrible, especially considering that using polymorphic equality in performance-relevant code is usually advised against anyway. The other place that makes a runtime case distinction on representations is polymorphic array ops, but they only need to distinguish between two possible representation choices. |
Structural equality/comparison/hashing is one example that has been a running theme through many of the discussions on challenges of implementing OCaml efficiently and motivations for variants, including in the discussion I linked to. You name this in your comment and then explicitly discard the concern without checking if the people currently trying to support OCaml on WebAssembly agree.
The i32 header is a patch for not having variants. Only 1 of the 9 cases has any use for it besides saying which of the 9 cases it belongs to. So the fix you're suggesting involves making 8 of 9 cases unnecessarily larger just to make up for not having good support for variants.
Yes, and the point of this issue is to illustrate that other use cases are not well served by RTTs and that the use of
I used OCaml as an example because other people had worked through how it should be represented, and these issues came up. Even if we were to eliminate/ignore these issues for OCaml, there are other languages that would still encounter the same problems. Note that this is somewhat related to the bug identified in #110 pointing out that the current design for RTTs for functions makes it impossible to cast from |
So let me double-check whether you agree that
The header is what OCaml's data representation is using today and which would naturally map over to a representation in Wasm by default. AFAICS, the implementation would not win anything right now by changing it just for Wasm, other than requiring bigger changes to the compiler pipeline. The goal for Wasm is to allow porting existing implementations to the extent possible, not force rewriting them.
Sure, but (1) do they need to in the MVP?, and (2) what prevents them from being supported by later extensions if there's need? |
The issue is about (2). It does not claim that pointer tagging needs to be in the MVP. It is rejecting the claims made in the Post-MVP and illustrating why the feature cannot be added by later extensions.
The header word in OCaml is its version of an RTT. If the OCaml runtime sees that this header word is the magic number corresponding to a double array, then it knows that the reference is a double array. But that's not the case in the current/post MVP. Even if the application knows that invariant, the WebAssembly engine does not, and so the application has to insert an
This redundancy means that when compiling OCaml to the current/post MVP it no longer makes sense to have each value have such an i32 field. That is, the use of RTTs prompts existing implementations to be redesigned specifically for WebAssembly. There are existing design techniques that would be able to let OCaml use its header word as it does currently, i.e. they would be able to recognize that if the header word is the magic constant for a double array then the reference is a double array, but unfortunately they too are incompatible with the current MVP. |
I don't follow what you are arguing. That an OCaml port should be able to remove the field? That may save space, but may require cross-cutting changes to the compiler, so should not be required. (It's also worth noting that the information in the header is exposed to user code via the Obj module; that may not be worth supporting on a Wasm port, though.)
Now you seem to be arguing that an OCaml port should be able to keep the header field, but without introducing redundancy. That's fundamentally impossible, AFAICS. The header word has multiple bit fields. The biggest one is the object's size. Given that the Wasm GC will not be able to understand that, it will inevitably have its own representation of that information, in the RTT or elsewhere. The only way to avoid the redundancy is by removing the header word in OCaml's data mapping and have it use Wasm's built-in size, see above. Either way, I see no barrier to adding means to switch on RTTs post-MVP. Care to substantiate your claim? |
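As a rough illustration of the bit fields mentioned here, the sketch below decodes a header word using the classic 64-bit native OCaml layout (size in words in the upper bits, two GC color bits, and an 8-bit tag, with 254 as the tag for float arrays). The function and field names are made up for this example, and the layout is an assumption about the native runtime, not something a Wasm engine would read.

```python
from typing import NamedTuple

DOUBLE_ARRAY_TAG = 254  # block tag the native OCaml runtime uses for float arrays


class Header(NamedTuple):
    wosize: int  # object size in words (the largest field)
    color: int   # 2 bits reserved for the garbage collector
    tag: int     # 8-bit block tag


def make_header(wosize: int, color: int, tag: int) -> int:
    """Pack the three fields into one header word."""
    return (wosize << 10) | ((color & 0x3) << 8) | (tag & 0xFF)


def decode_header(word: int) -> Header:
    """Split a header word back into its bit fields."""
    return Header(wosize=word >> 10, color=(word >> 8) & 0x3, tag=word & 0xFF)


def is_double_array(word: int) -> bool:
    """The check the native runtime can trust but a Wasm engine cannot."""
    return decode_header(word).tag == DOUBLE_ARRAY_TAG
```

The size and tag recovered here are the pieces of information this thread describes as duplicated once the engine keeps its own size and RTT for each object.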
@RossTate is arguing that this should be allowed, not required.
IIUC, @RossTate would prefer not to have to make this assumption, and the fact that the current proposal does make this assumption is the root of the issue he is raising here. (@RossTate, my apologies if I'm misrepresenting your position) |
I have a hard time believing that that's truly what @RossTate is suggesting. It hardly seems realistic to expect that a Wasm GC would be able to read off object sizes from arbitrary bit fields in user space instead of using its own. And that that could result in an efficient GC. Not to mention a safe and secure one. |
Criteria to evaluate performance issues:
Answers for "header-fields of OCaml being redundant with RTTs and WASM GC structs" look to me to be pretty clear:
It's perfectly fine that the heap struct's header field becomes obsolete when compiling OCaml to WASM+GC. Other languages experience similar, resolvable redundancy when they port compiler backends to WASM. |
Thanks @sabine. That's a great rephrasing of the argument I was making! |
Oh, I read "becoming obsolete" as meaning no longer necessary, and I read "3. Likely" as meaning it would be easy to adapt the compiler to the change. But I do see how it can be read both ways. @sabine, would you mind clarifying what you meant? |
@RossTate I believe that, in the interest of getting a reasonable WASM GC MVP out of the door, it is very important to focus on resolving performance issues that trigger "Yes" answers to the first two criteria. Thus, we're fine with keeping the redundant header field in our WASM compiler backend MVP. As time progresses, we optimize the compiler implementation based on profiling results. We believe we can get rid of the header field in the WASM backend in the sense that there is no fundamental reason why we can't. It is likely that removal of the redundant header field is not a low-hanging fruit in terms of cost vs. reward right now. In the very long term, it is possible that we will go through the effort of making the heap model of the middle-end of the compiler more abstract to accommodate the different representation of the heap struct's header on different backends. I expect that this process of discovering abstractions for the middle end of the compiler will be similar for most, if not all, existing languages, and that this is, in the long run, a healthy development. |
Ah, thanks for the nuanced clarification! |
Closing this since it is about post-MVP features and not actionable for this particular proposal. Of course if we work on variant types in post-MVP, we will have to make sure they perform well and fit into the rest of the language, but the discussion will be more productive with a concrete variants proposal. |
#114 discusses an extension for variants along with a number of pointer-tagging optimizations it would be intended to support. Unfortunately these optimizations are not particularly compatible with the current design.
This is a major part of the question. Or more generally, how is downcasting from anyref supposed to work with variants? The answer that seems most consistent with the current MVP and with the suggested instructions for variants is to have each variant case use an rtt that is an rtt.sub (or something analogous) of the rtt.canon for the variant type. That way there's a way to cast from anyref to a variant, which then provides access to br_on_case and variant.test to further cast to the respective cases.
The last note here is problematic. Variants are subtypes of anyref, and anyref already has an unboxed integer value: i31ref. So this means having another bit tag to distinguish unboxed variants from references.
But even then, the bigger problem is that (variant (case) (case anyref)) and (variant (case) (case funcref)) are distinct types and so would have distinct rtts (even though one would ideally be a subtype of the other). So the nullary case would need to be represented by a different unboxed integer for each such type. This problem gets worse if you consider (variant (case) (case (ref $imported_type))), because now each instance of the module using this type would have to use a different unboxed integer for the nullary case. (And I have no idea in general how to make this design for variants work in the presence of parametric polymorphism.)
The above problems with nullary cases are even worse with unary cases. Because everything is a subtype of anyref, the engine would need to have a cross-module-coordinated bit-tagging scheme in order to provide consistent/sound downcasting from anyref. Given the complexity of the problems above, likely no tagging would be done.
This incompatibility with pointer tagging seems fundamental to the use of a top type (anyref) with downcasting in the current MVP.
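The nullary-case argument can be modeled in a few lines of Python. This is only one reading of the reasoning above, under the assumption that a nullary case is stored as a bare unboxed integer. The registry class, both downcast functions, and the use of type strings as keys are hypothetical and do not correspond to anything in the proposal.

```python
VARIANT_ANY = "(variant (case) (case anyref))"
VARIANT_FUNC = "(variant (case) (case funcref))"


def downcast_naive(word: int, target_type: str) -> bool:
    """Every variant type encodes its nullary case as 0.

    The word carries no type information, so the check cannot depend on
    target_type and succeeds for every variant type.
    """
    return word == 0


class TagRegistry:
    """Engine-wide allocator giving each variant type its own unboxed integer."""

    def __init__(self) -> None:
        self._by_type: dict[str, int] = {}

    def nullary_word(self, variant_type: str) -> int:
        # Integers are handed out in order of first request.
        if variant_type not in self._by_type:
            self._by_type[variant_type] = len(self._by_type)
        return self._by_type[variant_type]


def downcast_coordinated(registry: TagRegistry, word: int, target_type: str) -> bool:
    """Succeeds only if the word is the integer reserved for target_type."""
    return registry.nullary_word(target_type) == word
```

In the naive version a nullary value created at VARIANT_ANY also passes a downcast to VARIANT_FUNC. The coordinated version rejects it, but only because a single registry is shared by every module and instance, which is the cross-module coordination the issue argues engines are unlikely to take on.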