-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarification on Scaling Imports #128
Comments
Yes, currently, if you want to implement objects in a flat manner, then any client that wants to access fields needs to specify the entire structure. But that's a problem that's completely independent from nominal vs structural! It's simply a consequence of the fact that Wasm's current compilation model requires the ability to type-check and compile a module without knowing it's imports. So the import has to express all expectations accurately. Or you have to define user-level abstraction for such access. That is probably too restrictive for languages with leaky abstractions, such as inheritance, and we might eventually need to extend the compilation model with some form of staging. But again, that's an orthogonal problem, and nominal typing by itself does nothing to ease it. (A more high-level object model would, but that's an anti thesis to the asm in Wasm. You should be able to build that high-level object model in user space.) The parameterisation problem is a separate one. To reiterate, it comes up to the whenever imports are non-abstract, i.e., have to specify the expected definition. Like in any client module that needs to know the structure to access them directly but also should be separably compilable. Under a structural model, clients just need to specify the structure of the imported types they are directly using, because those can be expressed in closed form (that extends to the types of imported RTTs, even the ones that are dynamically nominal). Under a statically nominal regime, though, you don't have this "closed form", and expressing the expected definition requires access to the names of all types that it depends on! So you have to import the whole transitive closure at every level where you want to transparently import a type! Obviously, that can be as bad as exponential. So that's a likely problem even when (separately) compiling nominal source-level types. At least for those, you can work around it by making the imported types abstract somehow, for the price of some extra indirections. But you gonna have an even worse problem when (separately) compiling structural source-level types, like arrays, tuples, etc. Because then there isn't even a canonical place that could be the "owner" of the definition. So you have to invert all type definitions into imports that wouldn't even be there otherwise. And again these pile up exponentially in case their descriptions require references to other such types. And at some point somebody has to tie up all the loose ends. FWIW, this approach is known as "closed functor style" in the ML world, and also well-known to have terrible scalability, even when the module system has more powerful means for expressing type sharing. And to cycle back to the beginning, having a staged compilation model would perhaps ease the problem with compiling nominal source types (for the price of losing separate compilation!). But it would certainly not solve the additional problems with compiling structural source types. |
My understanding is that, in the current MVP, the modules would each have to locally declare all of the types necessary to express export/import (rtt n $type_index) but only these import/exports would be coupled across the modules. In a nominal system, each name that the importing module needs to refer to would need its own import/export pair. Same number of total "lines", but more coupling, and an issue of "ownership" where all importing modules have to agree to get the type names from the same place. @rossberg. could the importing module also choose to create its own RTT? I wasn't completely sure from the MVP whether I'm also not sure I understand @rossberg's point about "closed form". If module A declares a structref with 10 mutable fields, and module B wants to import that structref, but only cares about accessing the last field, is it not still the case that module B would need to define the full structure of the types of each of the preceding fields? |
Unfortunately that didn't really help me understand the problem. Could you provide a concrete side-by-side example (using parameterization), one in which it is clear why each of the excess imports in the nominal version is necessary? In the slides, these excess imports are claimed to be necessary, but the examples never show why, which is making it difficult for me to understand the problem. |
I think I was advancing a different argument compared to @rossberg, which muddied the waters. I think @rossberg's point about parameterisation was the following: Imagine you have a module which declares a 4-field struct
Maybe I have a module M2 which wants to import With a structural type system, the module can just import that as a supertype
M2 doesn't have control over the type that M1 is exporting |
The presentation made specific claims that nominal types would require extensive unnecessary imports for object-oriented languages, but it never demonstrated why nominal types required those imports or how structural types made them unnecessary. Before moving on to abstract examples, I would like to understand this concrete claim. Seeing an example side-by-side comparison would be very helpful for that. In particular, one example was that if a class |
I agree that the example of slide 43 onwards looks wrong, if no other code in the same module is accessing X. That being said, the previous examples, which are just dealing with subclassing, seem to broadly make sense, although the syntax is a little iffy. In particular, the module for E (slides 41-42) needs to know about both C and D in order to determine its own layout. If you think this is wrong, could you show how the module for E could be defined using a nominal scheme without requiring the import of C? The SOIL proposal seems to suffer from this, because in order to check whether a scheme's EDIT: slides for context |
In the current MVP, a class (like In every parameterization example in those slides, it seems like the imports that are actually necessary would the same for both structural and nominal types. |
This suggests that each time a new nominal type is declared, it has to redeclare all of its parent's fields, and then a structural check must be performed to ensure that its declared fields actually match those of its parent. Is this correct?
The structural version may have type declarations which are analogous to the type imports of the nominal version, but in the nominal version, each type import for a type that needs to be introspected must be structurally checked at instantiation time. I acknowledge that in both of the above scenarios, the structural check for each type declaration/import will just be a linear pass. However, if you're importing a nominal type representing (e.g.) a struct with fields that are themselves structs, the types of all of the fields have to be recursively imported and checked in the same way. Considering the number of shallow structural checks that this approach seems to imply, could the same effect be achieved by having the MVP as it is, and layering a nominal branding mechanism on top as a subsequent extension? |
In the current MVP, when a subclass creates its In the current MVP, when you instantiate a class's module and supply it with its superclass's But now we're running into new territory. I would still like an explanation of the parameterization examples in the presentation so that I can understand the issue. Or does the above discussion suggest that structural and nominal systems would indeed have the same degree of parameterization, at least for object-oriented languages? |
I guess it depends on what is meant by parameterisation. I think the central thrust is that the structural version will have a small number of imports which will each have to perform worst-case quadratic-time type checking, while the nominal version will have a much larger number of imports (scaling linearly with the breadths and depths of the [types of the] imported objects) which will each be checked in linear time. I agree there are likely scenarios where the structural version's instantiation (counting RTT creation) is slower for the reasons you laid out, but there might also be overhead with the nominal approach due to the large number of imports (e.g. the cost of building the larger transient JS import objects required to perform instantiation on the Web). There are also likely opportunities to set up cross-instance memoization of structural subtyping by hash-consing at compile-time. EDIT: the above two paras aren't completely accurate, see #128 (comment) I also agree that the scenario described by @rossberg regarding subclassing isn't accurate, assuming my observation about the necessity of redeclaring and checking of parent fields is correct. I'm still interested in how you feel about my branding suggestion! Importing a nominal type and having to do a shallow structural check on it seems very analogous to the structural check that would be required when importing a brand. This paper argues that layering branding over an underlying structural system is much easier than later adding structural typing to a formerly nominal system. |
I am confused. In every example we have discussed so far, they have had the same number of imports.
I do not understand the suggestion. I can look at the paper to develop that understanding, but due to my reading disability it will take a while, so a proper response will have to wait. Sorry for the delay. |
@conrad-watt That looks a lot like the approach described in A Simple Typed Intermediate Language for Object-Oriented Languages, which Ross touched on at the last subgroup meeting. The underlying system has structural types, but the main mover is a hierarchy of classes, which are treated nominally and have an associated Identifier (brand) that can be used to refine an unknown type. |
In general the examples have the same number of total The total amount of checking at the import boundary works out roughly the same (assuming memoization in the structural case). In the nominal version, the structural type check is effectively "flattened out" across multiple imports. EDIT: not completely right, see #128 (comment) |
I have to spend time thinking about whether the idea is actually workable. If you don't already have a strong opinion, please don't feel pressured to generate one immediately!
Thanks for the paper pointer @taralx! |
No, or really more precisely, they have the same number of imports and the inclusion of type declarations is unnecessary. Every time the nominal version would import a type in the above examples, the structural version would as well. Every time the nominal version would need that type to be castable, the structural version would need an We spent some time trying to figure out how structural types could be any better than nominal types with this regard, but the only way we could think of is
At a high level, this discussion is illustrating that the linking patterns in typed low-level module systems differ from the linking patterns in typed high-level module systems. The same is true for the data-representation patterns of typed low-level systems versus typed high-level systems. And just as we (or really the paper @taralx linked to above) found that nominal types were a good fit for these low-level data-representation patterns (even for structural languages), we found that nominal types (for GC data) are also a good fit for low-level module-linking patterns (even for structural languages). This is why a concrete WebAssembly example with a side-by-side comparison illustrating the problem would be very helpful. We cannot understand or solve the problem without such an example, and we have tried to find it on our own and so far everything has worked out fine. Ideally the example would be for an object-oriented language, since the presentation suggested the problem can be found there, but we have regretfully been unable to find it ourselves. |
This comment isn't correct, see below #128 (comment) (for posterity)
This is not important to the rest of your comment, but you're still not understanding my point. The structural version of the proposal does not import any type definitions. It wouldn't make sense! Each module independently defines the necessary structural types. Think about what it would mean to "import" a structural type. The import statement would need to reference a full structural definition of the type itself. But having defined that, there is no point importing the type, you can just use that definition. EDIT: This doesn't imply that the structural version is avoiding any type checking cost. Essentially, the whole cost of the type check is bound up in structurally checking the actual imports. |
What do you consider to be the remaining problem? After talking things through with you, I'm left with the impression that nominal vs structural types are mostly a wash complexity-wise, at least for the MVP, assuming decent memoization on the structural side. I'm glad we've been able to clear up misunderstandings regarding the previous in-person meeting though.
|
Just to drill in to this comment, as our previous conversation seemed focussed on nominally typed OO languages. When compiling a structurally typed language such as OCaml to a nominal system, would the toolchain not have to do a lot of work to de-dupe the structural types into nominal ones? This would also prevent such a structural source language from providing "binary compatibility" at the Wasm module level. EDIT: I'd also assume that your model of linking/compilation is not the same as Wasm's, which muddies the comparison slightly. |
In my presentation I talked about how the nominal types you use in a low-level system are simply naming the records the compiler/runtime uses to implement the language, e.g. a "closure" record or a "variant" record (see this comment for a list of the sorts of records OCaml would have). In the current MVP, the toolchain already has to do this. In order to cast values from
I'm not sure what I said here to prompt this assumption. We do believe that some extensions to wasm's compilation model would make it better aligned with various common compilation models, enable more performance optimizations, and support more composition and abstraction, but the observations above hold regardless of the compilation model. |
On reflection OCaml is a terrible example anyway because all the types can be erased. Thanks for the pointer to that comment on implementation! |
I'm bummed you missed the presentation. I'm happy to give you an abridged version of it some time. (The slides are up of course, but they unfortunately only have a subset of the content of the presentation.) |
IIUC, Ross' argument was that the structural model would import an RTT when the nominal model would import a type? As an aside here, for defining subtypes in a nominal system, we could generally go either way: we could require subtypes to repeat their supertypes' field definitions (which would avoid requiring importing indirect/transitive supertypes at the cost of bloating module size), or we could let them inherit such fields implicitly (like C++/Java classes do).
+1, and with even bigger scope: I think we would benefit from exploring very concretely how various languages (most notably the "popular OO languages" like Java, which we appear to be focusing on for the MVP) would optimally get compiled to the current MVP, how any obstacles/limitations/inefficiencies would (or wouldn't) improve with the current post-MVP sketches, and how they might (or might not) be avoided with alternative WasmGC designs (whether that's a "minor" delta like #119 or a complete re-think like the SOIL proposal or ...). I'm aware that some discussions on this tracker have already started in this direction; what I'm saying is that I think we could all benefit if more of us found some time to contribute to these explorations, because "how to compile high-level language X to WasmGC design Y" is a very difficult question, and it's probably true for any one of us that while the first afternoon of thinking about it might yield a workable result, thinking about it for, say, a month would very likely yield better results.
Wouldn't it be fair to say that some component of the overall stack has to do that work either way (as even with a "100% structural" system, its core point would be that the engine determines compatible types, which is computationally the ~same as de-duping them to nominal types), so the only question is who will be responsible for this work? |
Ah, I wasn't thinking clearly about that point. That would be true if the code needed create a new value of/perform down casting to that type. I'm more fundamentally wrong anyway, because the structural version might choose to import some types abstractly (https://github.com/WebAssembly/gc/blob/master/proposals/gc/Overview.md#imported-types), so I should edit my previous comments. |
I already incorporated that possibility in my responses above. Editing the comments with such a significant technical change would make things hard to follow and possibly incorrect. Instead of talking in the abstract, we need a concrete side-by-side WebAssembly example illustrating the supposed problem. I gave a summary of my reasoning above as to why I believe such an example does not exist, and the best way to make progress in establishing the problem exists would be to provide a concrete counterexample to my reasoning. |
I appreciate the concern. Apologies for misleading you into thinking I would be changing text out from under you.
We've been essentially in agreement with each other on the main points for a while now, since the repeating fields approach has been clarified. Even if my previous interpretation of type imports was completely correct, I don't feel this would constitute a "parameterisation issue". @rossberg might still disagree and provide an example when he's back - but as far as I'm concerned this issue is satisfied. I do strongly think that given Wasm's current module system, a nominal system should use the repeating fields approach (IIUC this is what SOIL was doing all along). |
Ah, thanks for the update! I think there are some high-level considerations on facilitating better separate compilation to WebAssembly with more efficient linking that are worth discussing more broadly. These would touch upon the concerns you're raising and what @jakobkummerow is suggesting. They also seem to be orthogonal to structural-versus-nominal typing, as well as various other things in flight (like "layering"). At some point I'll try to start up a thread that can phrase the challenge at a higher-level, and we can collectively dig into detail into whether to and how to address that challenge. |
Closing this since we have decided not to include type imports in the MVP. If there is more to discuss here, the type imports repo would be the better place. |
Given the response to #127, it seems the expectation is that every class in a Java-like language would have its own RTT. Given that we have already established that the current MVP will often have to cast objects to their class time even when the surface-code would not have to do so, it seems likely that most modules referencing a class will have to import that class's RTT (at the appropriate depth in the class hierarchy). In order to access the fields and methods of the object after a cast, this RTT import will have to specify the entire contents of that class, including all fields (even private ones) and methods of inherited classes. This sounds exactly like module parameterization.
@rossberg In your presentation, you claimed that parameterization does not scale. I do not understand how such parameterization manages to scale in the current MVP and why such parameterization would not scale in an analogous nominal system. Could you please illustrate?
The text was updated successfully, but these errors were encountered: