Design decisions and compilation scheme for hijacked classes #12
Comments
Thank you very much @sjrd for writing this detailed explanation! As I approved #13 I believe the overall design and an idea of delegating the representation of Regarding pure WASI environment (non JS-host), first of all, I totally agree that
I think it's too early to think about it, but what do you think about going to "do something like what the JVM does, and box the i32 in an actual class Integer(value: Int)" when we target to pure-WASI environment? Proposing our own reference typed primitive types (such as Also, it seems like there're same kind of discussion is happening in |
Yes, absolutely. If we're throwing away JS interop for the benefit of running on a non-JS host, we can box all primitive types in real classes. We could actually use the same system we use for |
This issue is to try and document general design decisions about the compilation scheme that we are going to use to compile the semantics of hijacked classes. The main theme is that we want to faithfully compile the semantics of the Scala.js dialect. Compiling IR Scala classes, fields and methods in a closed world is significant compiler engineering work, of course, but the biggest challenges lie in the design of interoperability.
Semantics?
There are several reference pages that document the semantics we are concerned about, here:
Background on hijacked classes
There is one unusual feature of the IR that is at the core of the broad design decisions I cover here: hijacked classes. There is a fixed set of hijacked classes, all in the `java.lang.*` package. Each is associated with one primitive type of the IR, of which it is the representative class.
- `Void` <-> `undef`
- `Boolean` <-> `boolean`
- `Character` <-> `char`
- `Byte` <-> `byte`
- `Short` <-> `short`
- `Integer` <-> `int`
- `Long` <-> `long`
- `Float` <-> `float`
- `Double` <-> `double`
- `String` <-> `string`
The fun thing about them is that a primitive type is a true subtype of the corresponding hijacked class. For example, `int <: java.lang.Integer`.
Hijacked classes cannot have constructors. The only way to create values of those classes is by creating the underlying primitive values, and upcasting them to their hijacked class. If the Scala source code contains
the corresponding IR is instead
where `5` is a primitive `int`, which can be upcast to `Integer`.
The primitive types are strict subtypes because they do not allow the `null` value, whereas their hijacked class types do. For example, there are exactly 2 values of type `boolean` (namely `true` and `false`) but there are 3 values of type `java.lang.Boolean` (`true`, `false` and `null`).
Within the body of a method (of any class), the `this` value is always known to be a non-null value of the enclosing class type. That means that in the body of a method of a hijacked class, the `this` value has the corresponding primitive type. For example, in the body of `java.lang.String.hashCode()`, `this` has the primitive type `string`.
This type system design has a consequence in terms of run-time semantics for virtual calls. When calling `(x: Object).hashCode()`, since `string <: String <: Object`, it is possible that `x` is in fact a primitive string. If that is the case, where do we find the method that we need to execute? The answer is that we look in the representative class of the type of `x`. For all non-hijacked classes `C`, their representative class is themselves. When `x` is a primitive string, its type is `string` but its representative class is `java.lang.String`, and therefore `x.hashCode()` needs to call `java.lang.String.hashCode()`, passing the primitive string as the value for `this`. By the way, the representative class of all JavaScript `object`s is `java.lang.Object`.
In practice, this means that for a method call `x.m(...args)` where the static type of `x` is a supertype of any of the hijacked classes, we cannot directly emit a virtual call. Instead, we first have to perform explicit type tests for the primitives. If we take the example of `(x: Object).hashCode()` again, its actual run-time behavior will be as follows:
- If `x` is a `double`, call `java.lang.Double.hashCode()`.
- If `x` is a `boolean`, call `java.lang.Boolean.hashCode()`.
- If `x` is a `string`, call `java.lang.String.hashCode()`.
- If `x` is an instance of a Scala class `C`, call `C.hashCode()`.
- Otherwise, call `java.lang.Object.hashCode()` (for JavaScript objects).
Hijacked classes, boxing, and JS interop
Scala.js as a language, and its IR, guarantee that if a primitive `int` is ever seen by JavaScript, it is a primitive `number` in the i32 bounds. Likewise for all the other primitive types except `char` and `long`. For example, a `string` is guaranteed to always be seen by JavaScript as a primitive `string`.
These guarantees are preserved through upcasting and in generic contexts (which are basically upcasting since they erase to `Object`): if we upcast `5: int` as an `Object`, then give it to JavaScript, it's still a primitive `number`.
That is a very powerful property, which allows us to fearlessly use primitives and generic types in facades for JavaScript APIs. It is also a very constraining property for the compilation strategy. And this is where we finally get to talk about Wasm.
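Because upcast primitives stay JavaScript primitives, the type tests from the `(x: Object).hashCode()` example can be pictured as plain `typeof` checks. The TypeScript sketch below is illustrative only: `ScalaObject`, `Point` and `hashCodeTarget` are hypothetical names, treating the `double` test as a `typeof` check is an assumption, and the function returns the name of the selected class rather than an actual hash code.

```typescript
// Hypothetical stand-in for "an instance of a Scala class"; not a Scala.js API.
class ScalaObject {}

class Point extends ScalaObject {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    super();
    this.x = x;
    this.y = y;
  }
}

// Returns the name of the class whose hashCode() would be selected for x,
// following the order of tests listed in the text. Handling of null is
// deliberately left out of this sketch.
function hashCodeTarget(x: unknown): string {
  if (typeof x === "number") return "java.lang.Double";
  if (typeof x === "boolean") return "java.lang.Boolean";
  if (typeof x === "string") return "java.lang.String";
  if (x instanceof ScalaObject) return x.constructor.name;
  return "java.lang.Object"; // any other JavaScript object
}
```

For instance, `hashCodeTarget("abc")` selects `java.lang.String`, while a plain object literal falls through to `java.lang.Object`.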
Representation of `int`
When compiling to Wasm, we obviously want to compile `int`s to `i32`s. We want primitive int additions to be compiled to `i32.add`, for example. If we cannot do that, there is zero hope for performance of our implementation.
If we give that `i32` directly to JavaScript through Wasm's JS exports, we will happily get a primitive `number`. That's good.
But now, `int` is also a subtype of `Object`, so what do we do when we upcast? `i32` is definitely not a subtype of Wasm's `any` or any other type, for that matter. So we must perform some conversion. The question is: what do we convert it to?
(I ignore `i31ref` in this discussion. Even if it appears to solve some issues, it cannot represent all `i32`s, so it's moot anyway.)
We could do something like what the JVM does, and box the `i32` in an actual `class Integer(value: Int)`. This has nice properties for virtual dispatch: we can avoid the multiple type tests when calling `(x: Object).hashCode()`, since if it's an `int` it will actually be represented as an instance of `Integer`, with its vtable.
The problem, of course, is that if we pass that instance of `Integer` to JavaScript, it clearly won't be a primitive `number`! That destroys our interoperability guarantees.
At this point, we have to make a big decision:
My opinion is that, for a first implementation, the second option will lead us more quickly to getting the standard library working, and therefore to have a usable implementation. A big downside is that it constrains us to Wasm embeddings that have a JS host.
If we go with that, we have to represent our `int`, upcast to `Object`, as a Wasm value that, when given to JS, will be a primitive `number`. There isn't much room for choice at this point: it basically has to be a JS `number`! Even if `Object` is encoded as the widest Wasm type, namely `ref null any`, how do we put a `number` in there? More: how do we put an `i32` converted to a `number` in there?
The solution is to use a JavaScript helper function. Surprisingly, that helper function is an identity:
The trick is to import into Wasm with a type that is not the identity:
When calling `upcastInt` in Wasm, we can pass it an `i32`, and we'll receive an `extern` ref that is a JS `number`. We can then put that one in a `ref any` with `any.convert_extern`.
When downcasting back to `int`, we'll follow the reverse path.
Now for virtual dispatch, we'll have to ask JavaScript to help us a bit when inspecting the value, but that shouldn't be too bad.
Other easy types
Representation of `byte`, `short`, `float`, `double`
These follow the same strategy as `int`. They must be seen by JavaScript as `number`s.
Representation of `undefined`, `true`, `false`
Since there are exactly 3 of them, we can receive the JavaScript constant values `undefined`, `true` and `false` from JS and store them in an imported `global`. When upcasting we can fetch the corresponding global. When downcasting, we have to ask JS for equality, since they cannot be downcast to `eqref`.
Representation of `char` and `long`
These are opaque to JS in the Scala.js semantics. We can implement them with real Wasm classes following the correct vtable. Upcasting will wrap a primitive into the corresponding class, and downcasting will extract the primitive from the field. This would not leave Wasm code.
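The wrap-and-extract scheme can be modelled in a few lines of TypeScript. This is only an analogy for what would be a Wasm struct with a vtable: `CharBox`, `upcastChar` and `downcastChar` are hypothetical names, and `long` would follow the same pattern with a 64-bit payload.

```typescript
// Models a boxed char: the payload is a UTF-16 code unit in the range 0..0xFFFF.
class CharBox {
  value: number;
  constructor(value: number) {
    this.value = value;
  }
}

// Upcasting wraps the primitive into the box class.
function upcastChar(c: number): unknown {
  return new CharBox(c);
}

// Downcasting checks the class and extracts the field.
function downcastChar(x: unknown): number {
  if (x instanceof CharBox) return x.value;
  throw new TypeError("not a boxed char");
}
```

Unlike an upcast `int`, the boxed value is an object rather than a `number` when observed from JavaScript, which matches the opaque treatment described above.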
Representation of `string`
Finally, we are left with `string`. In theory, we could use a solution similar to `int` here, with a helper JS conversion method. The problem is that this conversion would have to be O(n), in both directions. We cannot really afford to have O(n) upcasting and downcasting of `string`s.
Instead, I think our primitive representation of `string` should already be a JavaScript string. This means that the primitive operators `String_length`, `String_charAt` and `String_+` will need to use JS helpers. This may have some cost, but at least it will be an O(1) cost. On the good side, it means that we can use all the additional features that JS gives us about strings without additional cost: `substring`, `indexOf`, case conversions, etc.
For the future, there are two Wasm proposals that can make this more efficient for us, without having to redesign anything:
- The `stringref` proposal would be ideal, by directly giving us the right type to represent our strings this way. It would mean that the primitive operations would not have to leave Wasm anymore.
- If `stringref` does not make it, the JS String Builtins proposal would also help by making the primitive operations cheaper. We would be able to directly use the builtins instead of our own user-space helpers.
Future considerations: not depending on JS at all
`stringref` would be a very nice way to have strings that can stay within Wasm while also interoperating with JS. Could we get the same for the small numeric types? Perhaps there is room for our own Wasm proposal to introduce `f64ref` (and maybe even `i32ref`)? That's a thought, but to properly motivate it I think we first need a working implementation, and then demonstrate that we need `f64ref` for better performance.
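As a point of comparison for what `stringref` or the JS String Builtins would replace, the user-space helpers behind `String_length`, `String_charAt` and `String_+` could look roughly like the TypeScript sketch below. The helper names are hypothetical, and the sketch assumes a Scala `char` maps to a UTF-16 code unit.

```typescript
// Hypothetical JS-side helpers that Wasm code would import for string primitives.
const stringLength = (s: string): number => s.length;

// Returns the UTF-16 code unit at index i. Bounds checking is left out here:
// charCodeAt returns NaN for an out-of-range index.
const stringCharAt = (s: string, i: number): number => s.charCodeAt(i);

const stringConcat = (a: string, b: string): string => a + b;
```

Each call crosses the Wasm-to-JS boundary, and that crossing is the cost the two proposals aim to reduce.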