
Clarification on Scaling Imports #128

Closed
RossTate opened this issue Aug 28, 2020 · 27 comments

Comments

@RossTate
Contributor

RossTate commented Aug 28, 2020

Given the response to #127, it seems the expectation is that every class in a Java-like language would have its own RTT. Given that we have already established that the current MVP will often have to cast objects to their class type even when the surface code would not have to do so, it seems likely that most modules referencing a class will have to import that class's RTT (at the appropriate depth in the class hierarchy). In order to access the fields and methods of the object after a cast, this RTT import will have to specify the entire contents of that class, including all fields (even private ones) and methods of inherited classes. This sounds exactly like module parameterization.

@rossberg In your presentation, you claimed that parameterization does not scale. I do not understand how such parameterization manages to scale in the current MVP and why such parameterization would not scale in an analogous nominal system. Could you please illustrate?
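To make the concern concrete, here is a hedged sketch (hypothetical names, in a pseudo-syntax modeled on the examples later in this thread) of what such an RTT import might look like for a simple two-field class:

```
;; Hypothetical sketch, not official MVP syntax.
;; A client module that casts to class $Point and reads its fields must
;; spell out the class's entire layout in the import's type, including
;; the vtable and any fields inherited from superclasses.
module Client {
  type $PointData := structref (field $vtable anyref) (field $x i32) (field $y i32)

  import global $point_rtt :: (rtt 1 $PointData)

  ;; after a cast via $point_rtt, fields $x and $y are directly accessible
}
```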

@rossberg
Member

In order to access the fields and methods of the object after a cast, this RTT import will have to specify the entire contents of that class, including all fields

Yes, currently, if you want to implement objects in a flat manner, then any client that wants to access fields needs to specify the entire structure. But that's a problem that's completely independent from nominal vs structural! It's simply a consequence of the fact that Wasm's current compilation model requires the ability to type-check and compile a module without knowing its imports. So the import has to express all expectations accurately. Or you have to define user-level abstractions for such access.

That is probably too restrictive for languages with leaky abstractions, such as inheritance, and we might eventually need to extend the compilation model with some form of staging. But again, that's an orthogonal problem, and nominal typing by itself does nothing to ease it. (A more high-level object model would, but that's an antithesis to the asm in Wasm. You should be able to build that high-level object model in user space.)

The parameterisation problem is a separate one. To reiterate, it comes up whenever imports are non-abstract, i.e., have to specify the expected definition, as in any client module that needs to know the structure of its imports to access them directly but also should be separately compilable. Under a structural model, clients just need to specify the structure of the imported types they are directly using, because those can be expressed in closed form (that extends to the types of imported RTTs, even the ones that are dynamically nominal). Under a statically nominal regime, though, you don't have this "closed form", and expressing the expected definition requires access to the names of all types that it depends on! So you have to import the whole transitive closure at every level where you want to transparently import a type! Obviously, that can be as bad as exponential.

So that's a likely problem even when (separately) compiling nominal source-level types. At least for those, you can work around it by making the imported types abstract somehow, for the price of some extra indirections. But you will have an even worse problem when (separately) compiling structural source-level types, like arrays, tuples, etc. Because then there isn't even a canonical place that could be the "owner" of the definition. So you have to invert all type definitions into imports that wouldn't even be there otherwise. And again these pile up exponentially in case their descriptions require references to other such types. And at some point somebody has to tie up all the loose ends. FWIW, this approach is known as "closed functor style" in the ML world, and also well-known to have terrible scalability, even when the module system has more powerful means for expressing type sharing.
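A rough side-by-side sketch of the contrast being claimed here, in the thread's pseudo-syntax (all names illustrative):

```
;; Structural: the client writes the expected type in closed form,
;; with no reference to any other module's names.
module ClientStructural {
  type $List := structref (field i32) (field (ref null $List))  ;; recursion closes
  import global $g :: (ref $List)
}

;; Statically nominal: the expected definition mentions other named types,
;; so each of those names must itself be imported, transitively.
module ClientNominal {
  import type $Node                       ;; imported only to state $List's definition
  import type $List := structref (field i32) (field (ref null $Node))
  import global $g :: (ref $List)
}
```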

And to cycle back to the beginning, having a staged compilation model would perhaps ease the problem with compiling nominal source types (for the price of losing separate compilation!). But it would certainly not solve the additional problems with compiling structural source types.

@conrad-watt
Contributor

conrad-watt commented Aug 28, 2020

My understanding is that, in the current MVP, the modules would each have to locally declare all of the types necessary to express

export/import (rtt n $type_index)
export/import (ref $type_index')

but only these import/exports would be coupled across the modules. In a nominal system, each name that the importing module needs to refer to would need its own import/export pair. Same number of total "lines", but more coupling, and an issue of "ownership" where all importing modules have to agree to get the type names from the same place.

@rossberg, could the importing module also choose to create its own RTT? I wasn't completely sure from the MVP whether rtt.canon on the (structurally) same type across different modules would create an identical RTT.

I'm also not sure I understand @rossberg's point about "closed form". If module A declares a structref with 10 mutable fields, and module B wants to import that structref, but only cares about accessing the last field, is it not still the case that module B would need to define the full structure of the types of each of the preceding fields?

@RossTate
Contributor Author

RossTate commented Aug 28, 2020

Unfortunately that didn't really help me understand the problem. Could you provide a concrete side-by-side example (using parameterization), one in which it is clear why each of the excess imports in the nominal version is necessary? In the slides, these excess imports are claimed to be necessary, but the examples never show why, which is making it difficult for me to understand the problem.

@conrad-watt
Contributor

conrad-watt commented Aug 28, 2020

I think I was advancing a different argument compared to @rossberg, which muddied the waters.

I think @rossberg's point about parameterisation was the following:

Imagine you have a module which declares a 4-field struct

module M1 {
  type $A := ...
  type $B := ...
  type $C := ...
  type $D := ...
  type $S := structref $A $B $C $D

  global $m1_glob :: (ref $S) := ref.new $S

  export $m1_glob
}

Maybe I have a module M2 which wants to import $m1_glob but only cares about accessing its first element.

With a structural type system, the module can just import that as a supertype structref $A

module M2 {
  type $A := ...
  type $T := structref $A

  import global $m2_glob :: $T
}

M2 doesn't have control over the type that M1 is exporting $m1_glob as. In a nominal system, M2 would have to import the whole definition of $S and all its dependent definitions in order to import $m1_glob.
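Under that reading, the nominal counterpart of M2 might look like this (illustrative syntax; whether type imports would really take this form is an assumption):

```
module M2_nominal {
  ;; $S's definition mentions $A..$D by name, so all of them must be
  ;; imported before $S itself can be (transparently) imported
  import type $A := ...
  import type $B := ...
  import type $C := ...
  import type $D := ...
  import type $S := structref $A $B $C $D

  import global $m2_glob :: (ref $S)
}
```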

@RossTate
Contributor Author

RossTate commented Aug 28, 2020

The presentation made specific claims that nominal types would require extensive unnecessary imports for object-oriented languages, but it never demonstrated why nominal types required those imports or how structural types made them unnecessary. Before moving on to abstract examples, I would like to understand this concrete claim. Seeing an example side-by-side comparison would be very helpful for that. In particular, one example was that if a class C imported another class X and used X in the return type of one of C's methods, then for some reason a nominally typed module compiled from C would have to import X and all of its fields and methods and inherited classes. I do not see why the module could not simply import the nominal type for X and be done.

@conrad-watt
Contributor

conrad-watt commented Aug 28, 2020

I agree that the example of slide 43 onwards looks wrong, if no other code in the same module is accessing X.

That being said, the previous examples, which are just dealing with subclassing, seem to broadly make sense, although the syntax is a little iffy. In particular, the module for E (slides 41-42) needs to know about both C and D in order to determine its own layout. If you think this is wrong, could you show how the module for E could be defined using a nominal scheme without requiring the import of C?

The SOIL proposal seems to suffer from this, because in order to check whether a scheme's refine field is valid, you need to know about the fields of all its (transitive) parents.

EDIT: slides for context

@RossTate
Contributor Author

In the current MVP, a class (like E) will have to import its direct superclass's rtt (with full type description) in order to create its own rtt. A nominal system would have to do precisely the same. (I don't know where the impression that you have to also import the transitive hierarchy comes from—the structure of any indirect superclasses will be supertypes of the direct superclass, and so they do not need to be transitively checked for compatibility.)

In every parameterization example in those slides, it seems like the imports that are actually necessary would be the same for both structural and nominal types.

@conrad-watt
Contributor

conrad-watt commented Aug 28, 2020

(I don't know where the impression that you have to also import the transitive hierarchy comes from—the structure of any indirect superclasses will be supertypes of the direct superclass, and so they do not need to be transitively checked for compatibility.)

This suggests that each time a new nominal type is declared, it has to redeclare all of its parent's fields, and then a structural check must be performed to ensure that its declared fields actually match those of its parent. Is this correct?

In every parameterization example in those slides, it seems like the imports that are actually necessary would be the same for both structural and nominal types.

The structural version may have type declarations which are analogous to the type imports of the nominal version, but in the nominal version, each type import for a type that needs to be introspected must be structurally checked at instantiation time.

I acknowledge that in both of the above scenarios, the structural check for each type declaration/import will just be a linear pass. However, if you're importing a nominal type representing (e.g.) a struct with fields that are themselves structs, the types of all of the fields have to be recursively imported and checked in the same way.

Considering the number of shallow structural checks that this approach seems to imply, could the same effect be achieved by having the MVP as it is, and layering a nominal branding mechanism on top as a subsequent extension?

@RossTate
Contributor Author

RossTate commented Aug 28, 2020

In the current MVP, when a subclass creates its rtt from its direct superclass's rtt, it will have to redeclare all its parent's fields and there will be a structural comparison that the two are compatible.

In the current MVP, when you instantiate a class's module and supply it with its superclass's rtt, you will have to perform a structural comparison that the two are identical. This structural comparison will be slower than for a nominal system because it will have to recurse into fixpoints whereas a nominal system makes those fixpoints explicit. Furthermore, the two modules' rtt types will be defined using different arrays of types, so any memoization caches that were built to validate these two modules separately will not be useful in inferring those fixpoints, meaning that this process will lead to quadratic-time behavior rather than nominal's linear-time behavior. So it seems like the nominal approach would have faster instantiation validation.
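A sketch of the pattern described in the two paragraphs above (pseudo-syntax; the names, the rtt depths, and the rtt.sub form are all illustrative):

```
module Subclass {
  ;; the subclass redeclares all of its parent's fields before adding its own
  type $SuperData := structref (field $a i32) (field $b i32)
  type $SubData   := structref (field $a i32) (field $b i32) (field $c f64)

  ;; at instantiation, the supplied rtt must be compared structurally
  ;; against $SuperData (recursing into any fixpoints)
  import global $super_rtt :: (rtt 1 $SuperData)

  ;; creating the subclass rtt checks that $SubData is compatible with $SuperData
  global $sub_rtt :: (rtt 2 $SubData) := rtt.sub $SubData $super_rtt
}
```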

But now we're running into new territory. I would still like an explanation of the parameterization examples in the presentation so that I can understand the issue. Or does the above discussion suggest that structural and nominal systems would indeed have the same degree of parameterization, at least for object-oriented languages?

@conrad-watt
Contributor

conrad-watt commented Aug 28, 2020

But now we're running into new territory. I would still like an explanation of the parameterization examples in the presentation so that I can understand the issue. Or does the above discussion suggest that structural and nominal systems would indeed have the same degree of parameterization, at least for object-oriented languages?

I guess it depends on what is meant by parameterisation. I think the central thrust is that the structural version will have a small number of imports which will each have to perform worst-case quadratic-time type checking, while the nominal version will have a much larger number of imports (scaling linearly with the breadths and depths of the [types of the] imported objects) which will each be checked in linear time.

I agree there are likely scenarios where the structural version's instantiation (counting RTT creation) is slower for the reasons you laid out, but there might also be overhead with the nominal approach due to the large number of imports (e.g. the cost of building the larger transient JS import objects required to perform instantiation on the Web). There are also likely opportunities to set up cross-instance memoization of structural subtyping by hash-consing at compile-time.

EDIT: the above two paras aren't completely accurate, see #128 (comment)

I also agree that the scenario described by @rossberg regarding subclassing isn't accurate, assuming my observation about the necessity of redeclaring and checking of parent fields is correct.

I'm still interested in how you feel about my branding suggestion! Importing a nominal type and having to do a shallow structural check on it seems very analogous to the structural check that would be required when importing a brand. This paper argues that layering branding over an underlying structural system is much easier than later adding structural typing to a formerly nominal system.

@RossTate
Contributor Author

while the nominal version will have a much larger number of imports

I am confused. In every example we have discussed so far, they have had the same number of imports.

I'm still interested in how you feel about my branding suggestion!

I do not understand the suggestion. I can look at the paper to develop that understanding, but due to my reading disability it will take a while, so a proper response will have to wait. Sorry for the delay.

@taralx

taralx commented Aug 29, 2020

@conrad-watt That looks a lot like the approach described in A Simple Typed Intermediate Language for Object-Oriented Languages, which Ross touched on at the last subgroup meeting. The underlying system has structural types, but the main mover is a hierarchy of classes, which are treated nominally and have an associated Identifier (brand) that can be used to refine an unknown type.

@conrad-watt
Contributor

conrad-watt commented Aug 29, 2020

while the nominal version will have a much larger number of imports

I am confused. In every example we have discussed so far, they have had the same number of imports.

In general the examples have the same number of total (type declarations) + (imports), but in the structural version the type declarations necessary to declare (e.g.) import $Evt are local to the module. That is, the same type declarations are duplicated in both modules. In the nominal version, the type names have to be imported from the original declaring module.

The total amount of checking at the import boundary works out roughly the same (assuming memoization in the structural case). In the nominal version, the structural type check is effectively "flattened out" across multiple imports.

EDIT: not completely right, see #128 (comment)

@conrad-watt
Contributor

I do not understand the suggestion. I can look at the paper to develop that understanding, but due to my reading disability it will take a while, so a proper response will have to wait. Sorry for the delay.

I have to spend time thinking about whether the idea is actually workable. If you don't already have a strong opinion, please don't feel pressured to generate one immediately!

@conrad-watt That looks a lot like the approach described in A Simple Typed Intermediate Language for Object-Oriented Languages, which Ross touched on at the last subgroup meeting.

Thanks for the paper pointer @taralx!

@RossTate
Contributor Author

In general the examples have the same number of total (type declarations) + (imports)

No, or really more precisely, they have the same number of imports and the inclusion of type declarations is unnecessary. Every time the nominal version would import a type in the above examples, the structural version would as well. Every time the nominal version would need that type to be castable, the structural version would need an rtt as well. Every time the nominal version would need some subtyping constraint on a type, the structural version would as well. And vice versa.

We spent some time trying to figure out how structural types could be any better than nominal types with this regard, but the only way we could think of is rtt.canon, which is why the answer to #127 is important as it eliminates the possibility of rtt.canon from the object-oriented examples in the presentation. Besides, rtt.canon has its own problems, including slow module instantiation:

  1. rtt.canon $t causes slower instantiation than importing an rtt or nominal type for $t. The engine has to make sure to provide the exact same rtt for the equi-recursive equivalence class for $t. So at instantiation time, this requires first constructing the minimal automaton for $t (the engine cannot assume the module has already done this), then turning that automaton into a canonical form so that it can be hashed, then looking the hash up in a hashmap maintained by the engine, then performing a canonicalized-automaton equivalence check with the matching key in the map, and finally returning the corresponding rtt. So it still requires a comparison, in addition to a bunch more steps that an import would not require. So using rtt.canon to save an import results in slower, not faster, instantiation.

  2. While a few languages could use rtt.canon to eliminate the need for linking with a common instance to get the RTTs of primitive types, those languages will need to link with a common instance for a variety of other reasons anyways. For example, some instance will need to generate the language's exception-event tag, and all other instances of that module group will need to link with that common instance to get that exception-event tag so that they can catch each other's exceptions. In other words, surface-level modules when compiled down to WebAssembly will generally need to link with a common "runtime" module in order to, at the least, share engine-generated magic numbers (like exceptions and RTTs), and often to share the code and possibly data structures for common operations. Since that infrastructure needs to be there anyways, it is trivial to have this common instance generate the RTTs for a handful of primitive types, and doing so comes with another advantage (see next point).

  3. Using rtt.canon also means you have no data abstraction. Other modules (or the embedding language like JS) can cast your values using rtt.canon. This means that if you want one of many benefits of data abstraction, then you'll need to use dynamic sealing, such as the private type proposed in the Type Imports proposal. But to do that, each module interacting with the outside world will need to link with a common instance in order to get the machine-generated dynamic seal. So you'll still need a common instance anyways if you want data abstraction.

At a high level, this discussion is illustrating that the linking patterns in typed low-level module systems differ from the linking patterns in typed high-level module systems. The same is true for the data-representation patterns of typed low-level systems versus typed high-level systems. And just as we (or really the paper @taralx linked to above) found that nominal types were a good fit for these low-level data-representation patterns (even for structural languages), we found that nominal types (for GC data) are also a good fit for low-level module-linking patterns (even for structural languages).

This is why a concrete WebAssembly example with a side-by-side comparison illustrating the problem would be very helpful. We cannot understand or solve the problem without such an example, and we have tried to find it on our own and so far everything has worked out fine. Ideally the example would be for an object-oriented language, since the presentation suggested the problem can be found there, but we have regretfully been unable to find it ourselves.

@conrad-watt
Contributor

conrad-watt commented Aug 29, 2020

This comment isn't correct, see below #128 (comment)

(for posterity)

No, or really more precisely, they have the same number of imports and the inclusion of type declarations is unnecessary. Every time the nominal version would import a type in the above examples, the structural version would as well.

This is not important to the rest of your comment, but you're still not understanding my point. The structural version of the proposal does not import any type definitions. It wouldn't make sense! Each module independently defines the necessary structural types.

Think about what it would mean to "import" a structural type. The import statement would need to reference a full structural definition of the type itself. But having defined that, there is no point importing the type, you can just use that definition.

EDIT: This doesn't imply that the structural version is avoiding any type checking cost. Essentially, the whole cost of the type check is bound up in structurally checking the actual imports.

@conrad-watt
Contributor

conrad-watt commented Aug 29, 2020

This is why a concrete WebAssembly example with a side-by-side comparison illustrating the problem would be very helpful. We cannot understand or solve the problem without such an example, and we have tried to find it on our own and so far everything has worked out fine.

What do you consider to be the remaining problem? After talking things through with you, I'm left with the impression that nominal vs structural types are mostly a wash complexity-wise, at least for the MVP, assuming decent memoization on the structural side. I'm glad we've been able to clear up misunderstandings regarding the previous in-person meeting though.

EDIT: I would like you to understand the point about type imports, although I don't think it's a strong argument one way or the other, at least not without a strong experimental example showing it blowing a hole in the side of a JS VM.

@conrad-watt
Contributor

conrad-watt commented Aug 29, 2020

And just as we (or really the paper @taralx linked to above) found that nominal types were a good fit for these low-level data-representation patterns (even for structural languages), we found that nominal types (for GC data) are also a good fit for low-level module-linking patterns (even for structural languages).

Just to drill into this comment, as our previous conversation seemed focussed on nominally typed OO languages. When compiling a structurally typed language such as OCaml to a nominal system, would the toolchain not have to do a lot of work to de-dupe the structural types into nominal ones? This would also prevent such a structural source language from providing "binary compatibility" at the Wasm module level.

EDIT: I'd also assume that your model of linking/compilation is not the same as Wasm's, which muddies the comparison slightly.

@RossTate
Contributor Author

In my presentation I talked about how the nominal types you use in a low-level system are simply naming the records the compiler/runtime uses to implement the language, e.g. a "closure" record or a "variant" record (see this comment for a list of the sorts of records OCaml would have). In the current MVP, the toolchain already has to do this. In order to cast values from anyref, it has to know what rtt those values have. So a nominal type system is just making those RTTs part of the type itself, per #119; it's not adding any real work to the toolchain.

I'd also assume that your model of linking/compilation is not the same as Wasm's, which muddies the comparison slightly.

I'm not sure what I said here to prompt this assumption. We do believe that some extensions to wasm's compilation model would make it better aligned with various common compilation models, enable more performance optimizations, and support more composition and abstraction, but the observations above hold regardless of the compilation model.

@conrad-watt
Contributor

In my presentation I talked about how the nominal types you use in a low-level system are simply naming the records the compiler/runtime uses to implement the language, e.g. a "closure" record or a "variant" record (see this comment for a list of the sorts of records OCaml would have). In the current MVP, the toolchain already has to do this. In order to cast values from anyref, it has to know what rtt those values have. So a nominal type system is just making those RTTs part of the type itself, per #119; it's not adding any real work to the toolchain.

On reflection OCaml is a terrible example anyway because all the types can be erased. Thanks for the pointer to that comment on implementation!

@RossTate
Contributor Author

I'm bummed you missed the presentation. I'm happy to give you an abridged version of it some time. (The slides are up of course, but they unfortunately only have a subset of the content of the presentation.)

@jakobkummerow
Contributor

"they have the same number of imports" vs. "The structural version of the proposal does not import any type definitions"

IIUC, Ross' argument was that the structural model would import an RTT when the nominal model would import a type?

As an aside here, for defining subtypes in a nominal system, we could generally go either way: we could require subtypes to repeat their supertypes' field definitions (which would avoid requiring importing indirect/transitive supertypes at the cost of bloating module size), or we could let them inherit such fields implicitly (like C++/Java classes do).
One major influence on this question is abstraction vs. AOT optimization: requiring subtypes to list all inherited fields breaks the supertype's ability to hide (/change) any implementation details about these fields, but allows (fully optimized) AOT compilation (because each module on its own has enough information to determine the actual in-memory object layout). At least that's the situation with the current model; I hear some folks are working on layering approaches that could enable better compromises here :-)
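The two options might be sketched as follows (illustrative syntax; `subtype ... with` is a made-up form for the implicit-inheritance variant):

```
;; Option 1: repeat inherited fields. Each module can compute the full
;; object layout on its own, enabling AOT compilation, but the supertype
;; can no longer hide or change those fields.
type $Animal := structref (field $name anyref)
type $Dog    := structref (field $name anyref) (field $breed anyref)

;; Option 2: inherit fields implicitly, as C++/Java classes do. The full
;; layout of $Dog is only known once $Animal's definition is available.
type $Animal := structref (field $name anyref)
type $Dog    := subtype $Animal with (field $breed anyref)
```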

a concrete WebAssembly example with a side-by-side comparison illustrating the problem would be very helpful

+1, and with even bigger scope: I think we would benefit from exploring very concretely how various languages (most notably the "popular OO languages" like Java, which we appear to be focusing on for the MVP) would optimally get compiled to the current MVP, how any obstacles/limitations/inefficiencies would (or wouldn't) improve with the current post-MVP sketches, and how they might (or might not) be avoided with alternative WasmGC designs (whether that's a "minor" delta like #119 or a complete re-think like the SOIL proposal or ...). I'm aware that some discussions on this tracker have already started in this direction; what I'm saying is that I think we could all benefit if more of us found some time to contribute to these explorations, because "how to compile high-level language X to WasmGC design Y" is a very difficult question, and it's probably true for any one of us that while the first afternoon of thinking about it might yield a workable result, thinking about it for, say, a month would very likely yield better results.

would the toolchain not have to do a lot of work to de-dupe the structural types into nominal ones?

Wouldn't it be fair to say that some component of the overall stack has to do that work either way (as even with a "100% structural" system, its core point would be that the engine determines compatible types, which is computationally the ~same as de-duping them to nominal types), so the only question is who will be responsible for this work?

@conrad-watt
Contributor

conrad-watt commented Aug 30, 2020

IIUC, Ross' argument was that the structural model would import an RTT when the nominal model would import a type?

Ah, I wasn't thinking clearly about that point. That would be true if the code needed to create a new value of, or perform a downcast to, that type.

I'm more fundamentally wrong anyway, because the structural version might choose to import some types abstractly (https://github.com/WebAssembly/gc/blob/master/proposals/gc/Overview.md#imported-types), so I should edit my previous comments.

@RossTate
Contributor Author

I already incorporated that possibility in my responses above. Editing the comments with such a significant technical change would make things hard to follow and possibly incorrect. Instead of talking in the abstract, we need a concrete side-by-side WebAssembly example illustrating the supposed problem. I gave a summary of my reasoning above as to why I believe such an example does not exist, and the best way to make progress in establishing the problem exists would be to provide a concrete counterexample to my reasoning.

@conrad-watt
Contributor

I already incorporated that possibility in my responses above. Editing the comments with such a significant technical change would make things hard to follow and possibly incorrect.

I appreciate the concern. Apologies for misleading you into thinking I would be changing text out from under you.

Instead of talking in the abstract, we need a concrete side-by-side WebAssembly example illustrating the supposed problem.

We've been essentially in agreement with each other on the main points for a while now, since the repeating fields approach has been clarified. Even if my previous interpretation of type imports was completely correct, I don't feel this would constitute a "parameterisation issue". @rossberg might still disagree and provide an example when he's back - but as far as I'm concerned this issue is satisfied.

I do strongly think that given Wasm's current module system, a nominal system should use the repeating fields approach (IIUC this is what SOIL was doing all along).

@RossTate
Contributor Author

Ah, thanks for the update!

I think there are some high-level considerations on facilitating better separate compilation to WebAssembly with more efficient linking that are worth discussing more broadly. These would touch upon the concerns you're raising and what @jakobkummerow is suggesting. They also seem to be orthogonal to structural-versus-nominal typing, as well as various other things in flight (like "layering"). At some point I'll try to start up a thread that can phrase the challenge at a higher-level, and we can collectively dig into detail into whether to and how to address that challenge.

@tlively
Member

tlively commented Feb 17, 2022

Closing this since we have decided not to include type imports in the MVP. If there is more to discuss here, the type imports repo would be the better place.

@tlively tlively closed this as completed Feb 17, 2022