Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value types rather than int31 #53

Closed
fgmccabe opened this issue Dec 18, 2018 · 39 comments
Closed

Value types rather than int31 #53

fgmccabe opened this issue Dec 18, 2018 · 39 comments

Comments

@fgmccabe
Copy link

An alternative to the i31ref would be to support 'value types' as well as 'reference types'.
Semantically, one of the differences is that instead of the tagged value determining that the value is 'flat', the owner of the reference would have to assert the type.
The big pay-off is a uniformity of representability of data types (the only required primitive type would be bit string(s))
You can reconcile value types with existentially quantified types - at the expense of a more elaborate system of allocating local variables.
My suggestion would be to supplement reference types with value types - I am not advocating C++-style specialization of generated code.

@rossberg
Copy link
Member

rossberg commented Jan 9, 2019

A basic assumption of Wasm is that programs are lowered to the level of machine code as much as possible on the producer side. Language-level features like "value types" are hence assumed to be compiled away already and shouldn't require support from Wasm. (Although immutable structs are actually semantically equivalent to value types if reference equality is not available for them.)

I don't see how they address the same problem as int31ref, though. Since pointer tagging is the preferred technique for implementing GC in all existing Wasm VMs -- and AFAIK proven to be the most efficient technique in general --, the GC design has to allow for this implementation while still supporting unboxed integers in places where references are stored (see the discussion of polymorphism and uniform representation in the overview).

@titzer
Copy link
Contributor

titzer commented Jan 9, 2019

Another way to say this is that Wasm's design approach is to offer minimal mechanisms for high-level languages to compile to an abstract machine without (unreasonable) loss of performance. As such it exposes an abstraction that is an efficient reflection of today's CPU capabilities with minimal additional abstraction to make it safe (e.g. function boundaries, invisible callstack, indirect function tables, globals). For languages that implement polymorphism with a universal representation, like Ocaml, int31ref is pretty close to the minimal mechanism necessary to open up implementation techniques close to what these languages employ for native instruction sets. Wasm going "full ADT"--at least at this stage--is perhaps overkill, since this can be done by the language implementation. Since int31ref seems forward compatible with any ADT mechanism we might be tempted to add in the future, I think it makes sense.

@jdh30
Copy link

jdh30 commented Oct 21, 2019

Andreas Rossberg wrote:

proven to be the most efficient technique in general

Perhaps we're talking at cross purposes but I'd have said the opposite: OCaml and Java are slow because they box types like float * float (e.g. complex numbers, 2D/3D/4D vectors and matrices) and (int * float) array (e.g. hash tables, vertex buffers) and OCaml is even worse because it boxes type like i32 (e.g. RGBA colors, PRNG state).

Some relevant benchmarks:

Hash table insertion on solutions using tagging (OCaml/Haskell) is 4-10x slower than on .NET (no tagging): http://flyingfrogblog.blogspot.com/2013/09/hash-table-insertion-performance-f-vs.html

Another hash table benchmark showed that the tagging solution (JVM) was 17x slower than the non-tagging solution (.NET) http://fsharpnews.blogspot.com/2010/05/java-vs-f.html

OCaml is ~3x slower than .NET on the Monte Carlo task (a PRNG) of the SciMark2 benchmark which uses 31- or 32-bit ints http://flyingfrogblog.blogspot.com/2009/01/mono-22.html

On a hailstones benchmark that requires 64-bit ints the solutions using tagging (OCaml/Haskell) are 7-32x slower than those that do not (F#/HLVM) https://groups.google.com/forum/#!msg/fa.caml/yg3p8Txk4_o/7hDNY6ObLrYJ

On the ray tracer benchmark the solutions using tagging (Java/Haskell/OCaml) are 1.4-3x slower than those that do not (MLton, HLVM, C++) http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html

@aardappel
Copy link

Thanks, those are some interesting numbers.

Besides efficiency, to me the bigger reason why I don't see the point of a baked in int31 type is that I don't see it as that widely applicable. The amount of languages/systems/formats/protocols that specify an int as having exactly 32 (or 8/16/64/same_as_cpu) bits has to dwarf the ones that say it is implementation defined and can conveniently pick 31 without funny things happening to certain kinds of values.

That said, I guess I am not against a feature that helps certain languages but is neutral for others, if its usefulness is sufficiently widespread.

@RossTate
Copy link
Contributor

i31ref is not own its own problematic for other languages, but having i31ref be a subtype of anyref forces all engines to use this packing scheme. This packing scheme is not the most efficient in general either; there are others that work better for languages with more complex numerical systems (e.g. common mixing of integers and floats).

@rossberg
Copy link
Member

@aardappel:

The amount of languages/systems/formats/protocols that specify an int as having exactly 32 (or 8/16/64/same_as_cpu) bits has to dwarf the ones that say it is implementation defined and can conveniently pick 31 without funny things happening to certain kinds of values.

As I discussed before, this is missing the point. We don't provide it for the purpose of exposing it in a source language. We provide it so that producers can use the very common implementation technique of (predictably!) unboxing small integers when compiling to GC types. The vast majority of implementations of dynamic or polymorphic languages rely on this -- I think you happen to be working on one of them. ;)

@RossTate:

having i31ref be a subtype of anyref forces all engines to use this packing scheme.

I don't see how it does. An engine could still use a different scheme. Though there are reasons why so many GC'ed runtimes use pointer tagging.

@RossTate
Copy link
Contributor

Here's a packing scheme we found worked really well for our language we just developed: if the low bits are 00, it's a ref; if they are 11, it's a signed integer constructed by bit-shifting right 2 bits and preserving the sign (with other signed integers represented using a ref); otherwise it's a (single word) float constructed by bit-wrapping right 3 bits (with other floats represented using a ref), unless the high 30 bits are 0 in which case it is constructed by big-wrapping right 2 bits. The reason for the float-packing design is that it is fast to pack and unpack and it ensures that all 32-bit floats representing numbers whose exponent's absolute value is small are packed. This enabled us to perform well on both integer-based numerical benchmarks and float-based numerical benchmarks rather than favoring one or the other (and of course there's a 64-bit version of this packing scheme as well).

How do you propose that my 32-bit engine use this packing scheme and support i31ref with the performance expectation that 31-bit integers be packed rather than boxed? What about the various packing schemes that rely on having references be 8-byte aligned and so can use the lower 3 bits for tagging? What about packing schemes in which the lower bits are used to distinguish between different types of references, such as using a bit to distinguish interior references so that the gc knows to walk it differently?

@rossberg
Copy link
Member

@RossTate, that sounds like a plausible scheme for a native implementation of a specific language, but I think I'm missing how that matters for Wasm. How would a Wasm engine benefit from such a scheme? Custom tagging or packing schemes would perhaps be nice for certain producers, but there is no way that Wasm could expose a mechanism for that to producers in a platform-neutral manner.

FWIW, extra tagging bits are compatible with i31ref. Bit 0 tells you whether it's an int or not. If not, it's a pointer that can use the next couple of bits for tagging, as alignment allows. V8 is one engine using this scheme (or once did).

@steveblackburn
Copy link

FWIW, we believed a tagged reference was important for the Mu micro vm (which has similar objectives to Wasm). We have tagref64, which is tag union of:

  • double,
  • int<52>,
  • ref<void> and int<6>

See the 'Tagged Reference' in the Mu spec or 5.2.4 of Kunshan's thesis.

@RossTate
Copy link
Contributor

Ah, so you found nan boxing was a better packed representation than a simply bit flag. I suspect nan hacking wouldn't do so well in the 32-bit setting, but that's just a detail. The important thing is that both general-purpose virtual-machine engines and specialized language implementations have found it worthwhile to pack floating-point numbers at the expense of reducing the number of integers that can be packed. This is how a wasm engine could benefit from my scheme (to answer your question). It is plausible that wasm has some special properties that make that not the case, but wasm's implementation strategies certainly have not been explored enough at this point to know that, so adding something like i31ref that will make those other packing strategies difficult seems premature.

FWIW, extra tagging bits are compatible with i31ref.

As for using i31ref to do these sort of packing tricks at the source level, the 31-bit representation is not always appropriate, and wasm doesn't always offer the low-level operations. For my example, both issues apply. On x86, an engine could unpack from my scheme by simply masking to get the lowest 2 bits, branching on the zero flag for the reference case, branching on the parity flag for the integer case (then sign-preserving shifting right 2 bits), and then falling into the float case (and rotating right 3 bits). But in wasm, the packing and unpacking process has to use a single bit to indicate whether the exponent started with 01 or 10, and this forces unpacking within the float case to either branch yet again (within just the float case) or use 2 additional registers to store the necessary intermediate values to achieve the result through bit-level instructions.

And all of this complexity is pointless if the wasm module happens to be executed on a 64-bit machine, in which case many packing schemes can easily store a full 32-bit integer.

So i31ref is simply at the wrong level of abstraction to a) be compatible with established packing schemes, b) be useful for a wide range of languages, and c) adapt well to various architectures (especially 32-bit vs. 64-bit).

@rossberg
Copy link
Member

@steveblackburn, that's interesting. So this is another type in the spirit of i31ref, fixing on a specific encoding. That could be a post-MVP feature. We actually discussed the viability of such a type in the past. There are some issues for Wasm, though: such a type won't easily work on 32 bit platforms, and Wasm requires strict portability; (2) injection into such a type would be lossy for f64, and Wasm elsewhere guarantees to maintain all its bits. Also, we weren't aware of many impls using NaN boxing in practice, it seems to be interesting primarily for languages like JavaScript, where floats are the only number type (and even there most engines don't use it). Are you aware of other interesting examples?

@RossTate:

The important thing is that both general-purpose virtual-machine engines and specialized language implementations have found it worthwhile to pack floating-point numbers at the expense of reducing the number of integers that can be packed. This is how a wasm engine could benefit from my scheme (to answer your question).

Not sure that answers my question. ;) Concretely, which Wasm reference type would be represented as an unboxed float, and how would you make its performance predictable?

And all of this complexity is pointless if the wasm module happens to be executed on a 64-bit machine, in which case many packing schemes can easily store a full 32-bit integer.

You are fixated on int32s, but that misses the point of this type, which is completely unrelated. It isn't meant to map native integer types from a source language. Nor is it limited to mapping integer types in the first place. Instead, a producer can use it for the representation of a broad range of scalar types, like booleans, enums, characters, small integers, etc. For these, they don't care whether the max range is 32, 31, or 28 bits, they are smaller anyway. But they do care that they are unboxed. So 31 bit is indeed a somewhat arbitrary choice, but we have to pick some portable max range, and it's the most common one by far.

So i31ref is simply at the wrong level of abstraction to a) be compatible with established packing schemes, b) be useful for a wide range of languages, and c) adapt well to various architectures (especially 32-bit vs. 64-bit).

Agreed with (a) on a more philosophical level, but I still haven't heard a suitable alternative given Wasm's portability constraint! Strongly disagreed with (b) and (c), see above.

Why not be concrete: What would be your counter proposal for supporting portable unboxed scalars under the Wasm GC proposal?

@jdh30
Copy link

jdh30 commented Oct 24, 2019

Why not be concrete: What would be your counter proposal for supporting portable unboxed scalars under the Wasm GC proposal?

Does Wasm need to support dynamic linking or could it assume whole-program compilation? If the latter, types are known at compile time and there is no need for any boxing or tagging.

@sabine
Copy link

sabine commented Oct 24, 2019

@jdh30 what is the downside of the .NET approach, i.e. for what kind of code is the OCaml tagging strategy faster than .NET? (Or does the OCaml strategy, generally, just consume less memory at the cost of being slower? Or does the OCaml strategy enable something that is not possible under the .NET garbage collector?) I'd really like to understand how the .NET GC works.

@rossberg I believe I understand this: We need i31ref as an escape hatch.
One problem, that this escape hatch can be useful for, is the inability to have a uniform representation of values for polymorphic functions (of ML-style languages) in the presence of typed WASM heap structs. Reason being that we can't easily compile a polymorphic function to a single WASM function if we need to provide types for every type (and all the combinations of types, if a function is polymorphic in more than one argument) our polymorphic function could be passed values of.
So, instead of compiling a polymorphic function to multiple WASM functions (which ones we could only discover by looking at the program as a whole, no longer on a per-module basis), one would use the anyref type together with explicit type casts to get a uniform value representation on the heap. Now, the polymorphic functions can operate on anyref arrays and the problem is solved, there is only a single, uniform type for values.
Except that the problem is not actually solved yet because the anyref type is only for references and ML-style uniform value representations also contain primitive values, and not just references to other heap blocks. Thus, in order to be able to store values other than references inside an anyref array, there is the proposed i31ref type that is a subtype of anyref and that one enables typecasting 31-bit integers to anyrefs forth and back.
And, I suppose, the argument continues that, since browsers do use tagging of primitive values already in JavaScript, exposing this via i31ref isn't a big deal (at least for the browsers).

This is my current mental model, I am not an expert at any of this, please try to correct where I am just plain wrong. Also, feel free to drop links to papers or anything that someone who wants to understand this should study.

@wingo
Copy link
Contributor

wingo commented Oct 24, 2019

@sabine Not sure if this answers a concern, but for what it's worth, it's perfectly possible to use anyref to get tagged integers into a program. Schism does this, for example; e.g here's a declaration of a make-number import, and here's the definition. There's no extra allocation overhead relative to i31ref.

Problem is, you have to call out to the runtime to do type predicates and to tag/detag the value. The i31ref proposal effectively allows us to pull those predicates / constructors / accessors into WebAssembly, and to use the type system to eliminate some run-time type checks that you have to do if you have a uniform value type.

@rossberg
Copy link
Member

@jdh30:

Does Wasm need to support dynamic linking

Yes, absolutely. Furthermore, whole-program monomorphisation approaches cannot handle first-class polymorphism or polymorphic recursion anyway, which are common today.

@sabine, yes, that's right. Uniform representation is one important motivation for tagging schemes. That is not at all specific to OCaml, but used by many or most dynamically or polymorphically typed languages. The approach used by .NET is the only expressive enough alternative, but it is waaaay more complicated (in Wasm, it would require dynamic generation and compilation of Wasm code, i.e., shipping a Wasm code generator with your web page), and the type passing involved can also become a significant cost.

@lars-t-hansen
Copy link
Contributor

Related discussion: WebAssembly/design#919

@RossTate
Copy link
Contributor

So 31 bit is indeed a somewhat arbitrary choice, but we have to pick some portable max range, and it's the most common one by far.

So you agree that 31 bits is an arbitrary choice that was made based on your understanding of what is common. What I am trying to let you know is that, because of this project, I have spent the last year asking people how their gc's work, and what I saw was that, while 31 bits was common, it was not as common as you seem to think it is. For 32-bit machines, 30 bits was quite common, and 29 bits was used a decent amount as well. For 64-bit machines, 63-bit, 62-bit, 61-bit, and (as in @steveblackburn's example) 52-bit were all common. I have seen experiments showing that many of these schemes perform better than 31 bits. I gave you a concrete example and the intuition as to why it performs better (i.e. floats can be packed in addition to ints - something that is perfectly plausible to be useful to wasm engines since at some point you'll have some subtype of anyref that denotes/references floats).

We provide it so that producers can use the very common implementation technique of (predictably!) unboxing small integers when compiling to GC types.

You keep bringing up "predictable" performance, but this is not the right measure. What someone compiling to gc-wasm wants (beyond generally good performance) is to know that they picked the implementation strategy for their language that will work best for each engine. That is, they want to be able to evaluate the tradeoffs of various implementation strategies. So something we want to avoid is for someone to compile to i31ref, using all 31 bits, then discover that in fact the integer is being boxed when the high bit is set because the engine uses a 30-bit or 29-bit strategy, and then think "If I had known that would happen I would have used a different implementation strategy."

Predictable performance is just a proxy for this ability to evaluate tradeoffs. This is important because there are many wasm programs/operations that are going to perform differently on 32-bit versus 64-bit machines, and it's unreasonable to expect performance to be predictable across such significant differences in hardware. What's actually important is that the optimal way to compile a program to wasm running on a 32-bit machine is also the optimal way to compile a program to wasm running on a 64-bit machine.

What would be your counter proposal for supporting portable unboxed scalars under the Wasm GC proposal?

I can think of many, but right now I'll start with the simplest, building upon the above principle. I'm also going to focus on values that belong to anyref rather than values that are internal to a wasm module (where the latter has room for more language-specific optimizations). I think I saw a similar idea by @wingo in some thread I've forgotten, but I believe the details differ a bit here.

Add an intref subtype of anyref. Add s32_to_intref, u32_to_intref, s64_to_intref, u64_to_intref instructions for signed/unsigned conversion from fixed-size integers to intrefs. Add intref_to_s32, intref_to_u32, intref_to_s64, intref_to_u64 instructions for signed/unsigned conversion from intrefs to fixed-size integers. Have the intref_to_xxx instructions fail if the integer value referenced by the intref is out of the range of the integers expressible by the target type. (Add instructions for testing whether the extraction would trap, and an instruction for casting anyref to intref.) Then convey the expectation that intrefs will be packed when the represented integer is suitably small, but that the precise boundary will vary by machine and engine.

@rossberg
Copy link
Member

rossberg commented Oct 24, 2019

what I saw was that, while 31 bits was common, it was not as common as you seem to think it is. For 32-bit machines, 30 bits was quite common, and 29 bits was used a decent amount as well.

I wouldn't even mind if it was an i30ref, it doesn't really matter. Though I'd want to see real evidence that that is a better choice, that it would benefit any of the existing engines (which already use 31 bits), and that it would benefit Wasm even hypothetically. In my mind, your example has not been evidence for that yet (see below).

For 64-bit machines, 63-bit, 62-bit, 61-bit, and (as in @steveblackburn's example) 52-bit were all common

That doesn't matter, because we cannot use any 64-bit-only limit without defeating the purpose.

I gave you a concrete example and the intuition as to why it performs better (i.e. floats can be packed in addition to ints - something that is perfectly plausible to be useful to wasm engines since at some point you'll have some subtype of anyref that denotes/references floats).

Are you proposing to introduce an f32ref type? But your encoding is lossy, so this cannot be an f32, but rather an f30ref. Unless the engine is implicitly allocating and boxing in some cases, but see next point.

Predictable performance is just a proxy for this ability to evaluate tradeoffs.

I think that is where the fundamental disagreement lies. You are talking about semantics that introduces implicit allocations. I would argue that that is a dubious thing to have in a low-level VM, and moreover, in many circumstances its use wouldn't be safe for space. So it would not be attractive for producers in many cases, and certainly not the ones I talked about.

Add an intref subtype of anyref. Add s32_to_intref, u32_to_intref, s64_to_intref, u64_to_intref instructions for signed/unsigned conversion from fixed-size integers to intrefs. Add intref_to_s32, intref_to_u32, intref_to_s64, intref_to_u64 instructions for signed/unsigned conversion from intrefs to fixed-size integers.

You can do that, but that is a more expensive type (that has the aforementioned issues). It quite simply solves a different problem. If you go back to the older threads, we have already discussed all that. It is conceivable that we may want to add a type like i32ref or i64ref in the future. But it doesn't solve the problem that I was asking about and which i31ref solves.

Specifically, on a 32 bit platform intref would make every injection and projection involve an extra check and branch, and in addition, for every injection the 32 bit jit would have to generate a code path for allocation. That is completely redundant overhead when the producer already knows that the value can never exceed the limit of an unboxed value -- why would you want to generate allocation code when storing a two-valued enum on the heap? But the Wasm engine can't know that it doesn't need that path, unless you give the producer a way to communicate that knowledge. And that is exactly what a type like i31ref does.

@RossTate
Copy link
Contributor

But the Wasm engine can't know that it doesn't need that path, unless you give the producer a way to communicate that knowledge.

Good point, although easily fixed. Add to the uxx_to_intref instructions an integer parameter indicating how many bits to consider. Add to the intref_to_uxx instructions an integer parameter indicating how many bits to permit (trapping if the contained value is out of range).

So it would not be attractive for producers in many cases, and certainly not the ones I talked about.

You haven't talked about a single concrete producer in this thread. The only concrete example of how you expect this to be used is booleans in the Overview, so I'll demonstrate that use case.

Suppose an OCaml or Haskell-like language is being compiled to wasm. It's compiling the expression let x = f a b in ... where f is a top-level function of type forall t. t -> t -> t and where a and b are local variables of type Bool. The wasm locals (or stack slots) aloc, bloc, and xloc corresponding to a, b, and x are using an unboxed representation of the Bool type, i.e. they have type i32. There's a wasm function ffunc corresponding to f that takes two anyrefs and returns an anyref.

In my proposal, the compilation would be (forgiving syntactic slip ups):

u32_to_intref 1
local.get bloc
u32_to_intref 1
call ffunc
cast_anyref_to_intref
intref_to_u32 1
local.set xloc

For any engine with at least a bit of packable (unsigned) integers, this will perform almost identically to i31ref. The one issue is that the intref_to_u32 1 has to trap even if the value is packable but out of bounds (which is easily checked). That tiny bit of overhead can go away when we get a full gc proposal rolled out that can reason about polymorphism. In the meanwhile, unlike i31ref, this adapts well to a variety of engine designs.

Furthermore, for any engine that packs exactly 31-bit integers, u32_to_intref 31 and the other 31-bit instructions in my proposal will perform just as well as they would with i31ref (and only a tiny bit worse if they can pack more than 31-bit integers).

So why do you insist that this would be unattractive for producers when it performs just as well and is more adaptable?

I would argue that that is a dubious thing to have in a low-level VM

Let me know how the above doesn't address what you would argue for this use case.

@aardappel
Copy link

As I discussed before, this is missing the point. We don't provide it for the purpose of exposing it in a source language. We provide it so that producers can use the very common implementation technique of (predictably!) unboxing small integers when compiling to GC types.

That sounds exactly like exposing it to me.. since either the source language needs to be aware of the fact that this is how small integers are represented (to be both efficient and know what values can be represented) or needs to pay the cost of dynamically checking for values that use the full 32-bits and "upgrading" them as need be (which seems very undesirable in an efficiency oriented system like Wasm).

The vast majority of implementations of dynamic or polymorphic languages rely on this -- I think you happen to be working on one of them. ;)

I don't think I am. I guess you are referring to JS, and as far as I can tell, Wasm+GC as it currently stands would be a fairly unattractive target to directly compile JS to (much like other very dynamic languages), regardless of the presence of i31ref, so I don't see how that is relevant.

When it comes to more statically typed languages, most of those I have worked on or am aware of are statically able to tell whether a given 32-bit quantity is all integer, float or pointer bits, and don't need this tagging to function. Your "The vast majority of implementations" seems to be my "niche", so I don't think we're going to resolve this :)

@RossTate
Copy link
Contributor

I just remembered, one way to do weak references is to use a bit flag to distinguish between standard references and weak references. But this strategy also means you only have room to pack 30-bit integers. It also has the interesting property that two of the distinguished cases are references rather than packed values.

@rossberg
Copy link
Member

@RossTate:

But the Wasm engine can't know that it doesn't need that path, unless you give the producer a way to communicate that knowledge.

Good point, although easily fixed. Add to the uxx_to_intref instructions an integer parameter indicating how many bits to consider. Add to the intref_to_uxx instructions an integer parameter indicating how many bits to permit (trapping if the contained value is out of range).

No, that loses the information after injection. Consequently, projection still has to branch. Worse, it also has to do an additional range check now (another branch), so projection is even more expensive than with the plain instruction. No, the value range has to be evident in the type.

Moreover, what is the maximum XX you are going to support? If you pick some N then what’s the point of supporting smaller ones? Or do you want to support all N and not say which ones are guaranteed to be unboxed? Then you have again defeated the purpose.

(An obvious example where a producer needs to know a guaranteed unboxing limit is the efficient implementation of bignums. A performant implementation has to unbox small values. But if the producer can’t rely on a certain type being unboxed then whatever choice it makes may turn out to be worse than not trying to unbox at all, since now both the producer and the engine have to insert code for making the distinction and branching based on it.)

Let me reiterate the requirements:

  • A scalar type that is known to be unboxed and provides efficient (i.e. non-branching) injection/projection across all relevant platforms.

  • The largest possible value range that is feasible for that.

AFAICS, providing an iNref type for some N < 32 is the only way to meet these requirements. Whether N is 30 or 31 bits or something else is secondary, and I’m certainly open to picking another value if that was a clear win.

You haven't talked about a single concrete producer in this thread. The only concrete example of how you expect this to be used is booleans in the Overview, so I'll demonstrate that use case.

Upthread I listed a number of examples of source-level types that ask for unboxed representations:

a broad range of scalar types, like booleans, enums, characters, small integers, etc.

This is pretty much universal to all producers.

For any engine with at least a bit of packable (unsigned) integers, this will perform almost identically to i31ref. The one issue is that the intref_to_u32 1 has to trap even if the value is packable but out of bounds (which is easily checked).

Disagreed, see above.

That tiny bit of overhead can go away when we get a full gc proposal rolled out that can reason about polymorphism. In the meanwhile, unlike i31ref, this adapts well to a variety of engine designs.

Providing polymorphism does not solve this. Dynamic languages won’t benefit from it and will always require uniform representations; likewise, most polymorphic languages use uniform representations and tend to have forms of polymorphism that the Wasm type system will never be able to handle fully, so they won’t benefit either; and even if they could, they will certainly not want to redo their entire implementation for an efficient Wasm port.

Uniform representation is a fact of life and a low-level VM like Wasm has to support it efficiently.

I just remembered, one way to do weak references is to use a bit flag to distinguish between standard references and weak references. But this strategy also means you only have room to pack 30-bit integers. It also has the interesting property that two of the distinguished cases are references rather than packed values.

Okay, first, it does not mean that, because the second bit is only needed for pointers. Second, why would anybody possibly want to optimise their heap representation around weak references by putting an extra cost on all others???

@rossberg
Copy link
Member

rossberg commented Oct 25, 2019

@aardappel:

As I discussed before, this is missing the point. We don't provide it for the purpose of exposing it in a source language. We provide it so that producers can use the very common implementation technique of (predictably!) unboxing small integers when compiling to GC types.

That sounds exactly like exposing it to me.. since either the source language needs to be aware of the fact that this is how small integers are represented (to be both efficient and know what values can be represented)

The producer, i.e., the source language compiler certainly needs to be aware of such limits -- as with every architecture they target, physical or virtual. But it is in no way necessary to expose it to the programmer.

or needs to pay the cost of dynamically checking for values that use the full 32-bits and "upgrading" them as need be (which seems very undesirable in an efficiency oriented system like Wasm).

Only if you are talking about int32. Not in the case of all the other types I mentioned that are represented as small ints. You guys are way to fixated on int32 and keep overlooking everything else. ;)

For int32, somebody has to do these checks, at least on 32 bit platforms. For Web assembly it wouldn't be unnatural to do that in user land. As I have said several times before, it is also conceivable that we introduce an i32ref type as well to defer that choice to the engine (such that it can be avoided on 64 bit platforms). But that is a different use case from i31ref, and a less fundamental one. (And it actually is unnecessary to make that a feature, as engines could simply unbox structs with a single immutable i32 field to achieve the exact same effect.)

The vast majority of implementations of dynamic or polymorphic languages rely on this -- I think you happen to be working on one of them. ;)

I don't think I am. I guess you are referring to JS, and as far as I can tell, Wasm+GC as it currently stands would be a fairly unattractive target to directly compile JS to (much like other very dynamic languages), regardless of the presence of i31ref, so I don't see how that is relevant.

Plenty of people have asked about compiling JS to Wasm. I agree with you that that is not an attractive thing to do, but for reasons that are mostly unrelated to GC. That does not change the general observation, though.

When it comes to more statically typed languages, most of those I have worked on or am aware of are statically able to tell whether a given 32-bit quantity is all integer, float or pointer bits, and don't need this tagging to function. Your "The vast majority of implementations" seems to be my "niche", so I don't think we're going to resolve this :)

Let's see. For dynamic languages there are two possible heap representations:

  1. Pointer tagging, unboxing small scalars
  2. Boxing everything

Every serious dynamic language implementation uses (1).

For polymorphic languages, there are these:

  1. Pointer tagging, unboxing small scalars
  2. Type passing, unboxing native scalars, runtime type dispatch
  3. Type passing, unboxing native scalars, runtime code specialisation
  4. Boxing everything
  5. Static code specialisation

For example, pretty much all functional languages use (1). I am not aware of any major language using (2), since it's known to be fairly expensive. C# is the main example of (3), and that requires a jit (i.e., is incompatible with ahead-of-time compilation), obviously is very complicated to implement (even more so on top of Wasm), so will likely remain a rare technique. Java and JVM-languages effectively do (4), which is pretty terrible for the performance of polymorphism, but from what I heard there are plans to move away from it. C++ is an example of (5), but that solution only works for limited forms of 2nd-class polymorphism, so is not applicable to most modern languages (it's also inherently unmodular).

@KronicDeth
Copy link

In Lumen (Erlang/Elixir AOT compiler) for 32-bit (and therefore wasm32-unknown-unknown), we use approximately the same tagging scheme as the BEAM

It uses 3 levels of 2-bit tags. Pointers (boxes and lists) and small integers use the lower 2 bits tags. That means small integers are stored as i30 << 2 and floats (f64) are behind a pointer as Erlang only has f64 and no f32.

@rossberg
Copy link
Member

rossberg commented Oct 25, 2019

@KronicDeath, thanks for the data point. I think this would almost map to the GC proposal, except for the special cons tag (which is popular among Prolog and Lisp implementations as well).

In particular, the Header tag wouldn't be needed when using GC types, since the respective information becomes abstracted away be the Wasm engine. Immediate would map to i31ref, with the secondary and tertiary tags being mapped into the int.

I don't think much can be done about the cons tag. Supporting such a custom tag would require engines to reserve a "user bit" in the representation of all reference values. That is most likely incompatible with at least some existing engines. The closest we can do is introducing variant types in some future extension. Until then, cons will require a regular in-block tag.

@KronicDeth
Copy link

except for the special cons tag (which is popular among Prolog and Lisp implementations as well).

Erlang was originally implemented in Prolog, so that is Erlang showing its heritage.

@aardappel
Copy link

@rossberg

That sounds exactly like exposing it to me.. since either the source language needs to be aware of the fact that this is how small integers are represented (to be both efficient and know what values can be represented)

The producer, i.e., the source language compiler certainly needs to be aware of such limits -- as with every architecture they target, physical or virtual. But it is in no way necessary to expose it to the programmer.

Not necessary, but practically required. Many languages (certainly the ones that map well to Wasm+GC) make guarantees about the number of bits in a type, or don't want performance to fall off a cliff. So the language hiding this from the programmer would be very undesirable. "Your int can fit 32bits.. but if you use them, we will transition to heap allocation!" ?

Only if you are talking about int32. Not in the case of all the other types I mentioned that are represented as small ints. You guys are way to fixated on int32 and keep overlooking everything else. ;)

Because that is what languages need. What languages, ones that are a good fit for Wasm and the current GC proposal, are cool with.. you don't really need to know how big an int is. It may heap allocate at some random value?

For int32, somebody has to do these checks, at least on 32 bit platforms.

Why? Knowing this statically doesn't seem impossible to me, in fact, common, and desirable.

Plenty of people have asked about compiling JS to Wasm. I agree with you that that is not an attractive thing to do, but for reasons that are mostly unrelated to GC. That does not change the general observation, though.

They ask it because it is an obvious question, not because it has a sensible answer. Most people have no clue what magic V8 and friends do to their code, and why thus why compiling to Wasm likely will make it (a LOT) slower rather than faster.

Either way, my point is in designing features like i31ref, languages like JS are not a strong argument of what kind of languages it should serve. Look towards Java, C#, Go, Kotlin and friends.

Let's see. For dynamic languages there are two possible heap representations:

  1. Pointer tagging, unboxing small scalars
  2. Boxing everything

Every serious dynamic language implementation uses (1).

Again, I am not considering dynamic languages here. If we wanted to support them realistically (e.g. competitively with their current implementations) we need a very different GC design and/or special Wasm features to support them (JIT?).

For polymorphic languages, there are these:

  1. Pointer tagging, unboxing small scalars
  2. Type passing, unboxing native scalars, runtime type dispatch
  3. Type passing, unboxing native scalars, runtime code specialisation
  4. Boxing everything
  5. Static code specialisation

For example, pretty much all functional languages use (1). I am not aware of any major language using (2), since it's known to be fairly expensive. C# is the main example of (3), and that obviously is very complicated to implement (even more so on top of Wasm), so will likely remain a rare technique. Java and JVM-languages effectively do (4), which is pretty terrible for the performance of polymorphism, but from what I heard there are plans to move away from it. C++ is an example of (5), but that solution only works for limited forms of 2nd-class polymorphism, so is not applicable to most modern languages (it's also inherently unmodular).

Yes, so 2/3/4/5 encompasses languages which can say "int is 32 naked bits always, and has no magic performance cliffs you don't control yourself" (Java's Integer is not the same as int). It also encompasses the mainstream "very static" languages like C/C++/Rust and the mostly statically typed mainstream GC languages like Java/C#/Go/Kotlin.

1 then is all about FPLs, Ocaml, Haskell, and friends. Some of these languages probably could be using better implementation techniques like 3 or 5 but choose not to.

So lets be clear here, i31ref is thus about supporting existing implementation techniques of certain statically typed FPLs. I have no problem with this. I am on the "let Wasm become the ultimate VM for an as wide set of languages as possible" bandwagon. But lets not pretend this is somehow the vast majority of implementations or languages, or is the only way things can work.

@RossTate
Copy link
Contributor

No, that loses the information after injection. Consequently, projection still has to branch.

What information? If the (static) integer parameter to the instruction is less than the number of packable bits, then it knows to trap if the value is a reference (something it has to check for with i31ref anyways).

Worse, it also has to do an additional range check now (another branch), so projection is even more expensive than with the plain instruction. No, the value range has to be evident in the type.

You execute a bitwise-and and then trap if the result is non-zero. I've implemented this for other systems. The cost is trivial. If you really want to get rid of the branch, you can do a tiny more bitwise arithmetic to merge this with the check that it's not a reference.

Moreover, what is the maximum XX you are going to support? If you pick some N then what’s the point of supporting smaller ones? Or do you want to support all N and not say which ones are guaranteed to be unboxed? Then you have again defeated the purpose.

I explicitly said that there is no guarantee how many bits will be boxed, and I gave a big argument about why this is useful without knowing that number. Take my example with the booleans. If the engine boxes the boolean, then it boxes the boolean. That engine happened to decide not to do packed integers, whether the reason be simplicity, confidence in correctness, ease of formal verification, or performance tradeoffs elsewhere. The point is that, for that engine, this is still the optimal way to support uniform representation of booleans. That is the purpose: to provide a way for wasm programs to uniformly represent small scalar values in a manner that will be implemented optimally for the engine at hand whatever it may be.

An obvious example where a producer needs to know a guaranteed unboxing limit is the efficient implementation of bignums.

You explicitly told me to focus on small scalar values and not on i32 or i64, which I would presume extends to not bignums. Yes, bignums needs a solution, but per your specification I'm not solving it with this design (and I don't want to digress into yet another topic so I'm not going to go into the many possible solutions).

A scalar type that is known to be unboxed and provides efficient (i.e. non-branching) injection/projection across all relevant platforms.

i31ref already fails this "requirement". The value of i31ref comes from being a subtype of anyref. The injections and projections a producer cares about are not those between ints and i31ref but those between ints and anyref.

Whether N is 30 or 31 bits or something else is secondary, and I’m certainly open to picking another value if that was a clear win.

How about 16? Since you're insisting on not 32, that seems like the natural next step down. It's big enough to pack "booleans, enums, characters, small integers", so it satisfies your stated use cases. It's small enough to allow a lot of packing schemes.

even if they could, they will certainly not want to redo their entire implementation for an efficient Wasm port

Wait, what? A bunch of these languages have javascript implementations. They're redoing their entire implementations for wasm solely for efficiency. And that's a huge effort because wasm is so different! What makes you assume they wouldn't do a much smaller revision for a more efficient wasm port? (Language teams have already told me to expect to do this.)

Uniform representation is a fact of life and a low-level VM like Wasm has to support it efficiently.

I never said it wasn't. We're just disagreeing about how to enable uniform representation.

Okay, first, it does not mean that, because the second bit is only needed for pointers.

Ah, good point!

Second, why would anybody possibly want to optimise their heap representation around weak references by putting an extra cost on all others???

It doesn't impose an extra cost. The surface language doesn't let you dereference a pointer without knowing its static type (because after all you don't even know that it's a pointer without its static type in a uniformly represented language!) The static type tells you whether its a normal reference or a weak reference, so the compiler knows to apply the mask only in the latter case.

Why producer-level packing is a bad idea

In a lot of languages, something like enum Suit = Hearts | Clubs | Spades | Diamonds is semantically equivalent to enum Suit = Diamonds | Spades | Clubs | Hearts. That is, the order the cases are declared is irrelevant to the program. But the i31ref strategy of having packing be done on the producer-side rather than the engine-side means that these programs will likely not compile to semantically equivalent wasm modules (supposing the module uses anyref to communicate with the outside world, as is one of its intended purposes). That is, this strategy implicitly exposes a lot of the producer-level internals to the outside world. @rossberg, you're always emphasizing how wary people need to be of the crazy hacks web developers do and how easy it is to get trapped by reliance upon these hacks. That mentality seems to suggest this sort of failure of abstraction should be discouraged as well.

@Pauan
Copy link

Pauan commented Oct 25, 2019

@aardappel No, it isn't just "statically typed functional languages", there's also the wide range of dynamically typed languages, such as almost all Lisps, Python, Lua, Perl, Julia, Ruby, etc.

They all need uniform representation, and they don't want to pay the cost of boxing small integers, so 31-bit ints are very common in compilers for those languages (with auto-promotion to boxed integers).

Basically any language which isn't static will likely want 31-bit ints. I think you're focusing far too much on static languages while ignoring the large number of (very popular) dynamic languages.

There's nothing wrong with Wasm including a feature which is useless for static languages but really beneficial for dynamic languages (or vice versa).

@aardappel
Copy link

@Pauan if you read my full posts above, you can see I actually address dynamic languages. I don't think the current GC proposal caters to them (they are currently much better served by linear memory or even JS objects), but feel free to prove me wrong.

@Pauan
Copy link

Pauan commented Oct 26, 2019

@aardappel That seems like an odd argument to me. You're essentially saying "we shouldn't have int31 because the GC MVP is currently sub-par for dynamic languages".

Aren't those orthogonal concerns? And in any case, wouldn't the solution instead be to improve the GC so that it is suitable for dynamic languages (which is the long term plan)? So I don't see how removing int31 is a good solution.

@rossberg
Copy link
Member

rossberg commented Oct 26, 2019

@aardappel

Many languages (certainly the ones that map well to Wasm+GC) make guarantees about the number of bits in a type, or don't want performance to fall off a cliff. So the language hiding this from the programmer would be very undesirable. "Your int can fit 32bits.. but if you use them, we will transition to heap allocation!" ?

I'm starting to suspect that there is some misunderstanding about what boxing scenarios we are even talking about. The GC proposal readily supports unboxed native numbers in almost all situations, i.e., locally or when stored inside structs or arrays on the heap. Numeric values inside structs and arrays are not tagged! The only case where boxing would be necessary is when you needed to inject a standalone number into the heap, e.g., to pass it to a generic function as a reference. But that's exactly the situation where a language like Java will box its number types as well!

So it is no worse than Java. You simply box native numbers in that situation and thereby avoid any of the more subtle performance cliffs you mention.

It's even less of a problem if your language's generics are weak enough to allow monomorphisation, i.e., static specialisation a la C++ templates. Then you'll never run into the case at all.

Does that alleviate your concerns?

What languages, ones that are a good fit for Wasm and the current GC proposal, are cool with.. you don't really need to know how big an int is. It may heap allocate at some random value?

For int32, somebody has to do these checks, at least on 32 bit platforms.

Why? Knowing this statically doesn't seem impossible to me, in fact, common, and desirable.

As clarified above, I believe for the kinds of scenarios you have in mind that is certainly the case, whether Wasm GC uses tagging or not.

Either way, my point is in designing features like i31ref, languages like JS are not a strong argument of what kind of languages it should serve. Look towards Java, C#, Go, Kotlin and friends.

That is a very narrow set of languages, all from the same corner of the spectrum, and none of them is dynamic or handles polymorphism very efficiently. It's a stated goal of the GC work to take a broader perspective than that.

The combination of heterogeneous structs/arrays and unboxed scalars in the proposal is meant to support both the Java-like use case and the FP use case equally well (except that the latter will still need lots of casts in the MVP). Makes sense?

(Go is way tougher, due to its decision to make interior pointers statically indistinguishable from regular ones. I'm not sure what the chances are that all engines would be willing to support that efficiently. For the MVP, we certainly have to pass on that one.)

Let's see. For dynamic languages there are two possible heap representations:

  1. Pointer tagging, unboxing small scalars
  2. Boxing everything

Every serious dynamic language implementation uses (1).

Again, I am not considering dynamic languages here. If we wanted to support them realistically (e.g. competitively with their current implementations) we need a very different GC design and/or special Wasm features to support them (JIT?).

People are implementing languages like Scheme and Erlang in Wasm today, so I don't know what exactly you base that claim on. Fortunately, not all dynamic languages are even remotely as crazy as JavaScript. The GC proposal should be a sufficiently decent fit for the others -- that's the intention anyway. Do you care to elaborate why you think it isn't?

So lets be clear here, i31ref is thus about supporting existing implementation techniques of certain statically typed FPLs. I have no problem with this. I am on the "let Wasm become the ultimate VM for an as wide set of languages as possible" bandwagon. But lets not pretend this is somehow the vast majority of implementations or languages, or is the only way things can work.

You seem to be implying two things: no GC'ed language besides the narrow breed of Java-like ones really matters (including dynamic ones). And that all these other languages picked their implementation approach either out of laziness or out of ignorance. Neither is true, and I would suggest to be more thoughtful with assumptions like that. As mentioned, (5) is not an option for almost any modern language. (3) relies on jitting, i.e., is inherently incompatible with AOT, and generally is the most expensive and complex approach (which I doubt would fare well with the double stage jitting it would induce on top of Wasm).

To clarify, high-level languages like functional ones typically choose tagging because their use of polymorphism tends to be way more expressive and pervasive (as in: every other function call is likely polymorphic). That puts their profile much closer to dynamic languages. With such a profile, heavyweight techniques like (2) or (3) quickly turn out to be inferior, let alone (4).

@rossberg
Copy link
Member

rossberg commented Oct 27, 2019

@RossTate:

No, that loses the information after injection. Consequently, projection still has to branch.

What information? If the (static) integer parameter to the instruction is less than the number of packable bits, then it knows to trap if the value is a reference (something it has to check for with i31ref anyways).

As you say, this implication only holds if the integer parameter is less than the unboxing limit. If the producer cannot know what the unboxing limit is then it generally has no way of picking an adequate parameter. It must assume the worst case for any choice it makes. That worst case is two checks and two branches on every access to an "unboxed" value on the heap -- as opposed to a mere shift with an explicit type like i31ref.

An obvious example where a producer needs to know a guaranteed unboxing limit is the efficient implementation of bignums.

You explicitly told me to focus on small scalar values and not on i32 or i64, which I would presume extends to not bignums. Yes, bignums needs a solution, but per your specification I'm not solving it with this design (and I don't want to digress into yet another topic so I'm not going to go into the many possible solutions).

Sorry if I didn't bring up this particular example before. There is a lot of rationale behind the design, some implicit, some explicit. I cannot exhaustively repeat it in every comment, nor can I always guess which goals are self-evident to whom. Either way, I think we agree that this clearly is a relevant use case for unboxing (and representative of a larger class).

(FWIW, a lot of the points I keep reiterating in this thread are already explained in the Overview, though I wouldn't claim that to be exhaustive either. I don't think it mentions the bignum example.)

A scalar type that is known to be unboxed and provides efficient (i.e. non-branching) injection/projection across all relevant platforms.

i31ref already fails this "requirement". The value of i31ref comes from being a subtype of anyref. The injections and projections a producer cares about are not those between ints and i31ref but those between ints and anyref.

Sorry, I don't follow. The anyref->intref test is the only branch, and always needed. After that, it should not be necessary to repeat the pointer test in the generated code.

Whether N is 30 or 31 bits or something else is secondary, and I’m certainly open to picking another value if that was a clear win.

How about 16? Since you're insisting on not 32, that seems like the natural next step down. It's big enough to pack "booleans, enums, characters, small integers", so it satisfies your stated use cases. It's small enough to allow a lot of packing schemes.

In today's world of Unicode, characters are 21 bits. ;)

Also, this obviously is very wasteful, see the second bullet. For example, every bignum would take up approximately twice as much memory. At the same time, there is no conceivable engine architecture that would align individual allocations to 64 K and benefit from the ability to use 16 tag bits, so this seems like a rather silly choice to make.

even if they could, they will certainly not want to redo their entire implementation for an efficient Wasm port

Wait, what? A bunch of these languages have javascript implementations. They're redoing their entire implementations for wasm solely for efficiency.

Building a compiler to Wasm off a JS backend (other than asm.js) is a terrible fit -- one is a very high-level target, the other low-level, and they have very little in common. Any language that has an existing native code or VM backend (i.e., most) will want to build their Wasm compiler off such a middle/backend. I am not aware of many counter-examples.

And that's a huge effort because wasm is so different! What makes you assume they wouldn't do a much smaller revision for a more efficient wasm port? (Language teams have already told me to expect to do this.)

From what I have seen, this wouldn't be a small revision.

Okay, first, it does not mean that, because the second bit is only needed for pointers.

Ah, good point!

Second, why would anybody possibly want to optimise their heap representation around weak references by putting an extra cost on all others???

It doesn't impose an extra cost. The surface language doesn't let you dereference a pointer without knowing its static type (because after all you don't even know that it's a pointer without its static type in a uniformly represented language!) The static type tells you whether its a normal reference or a weak reference, so the compiler knows to apply the mask only in the latter case.

Always remember: knowledge of the engine =/= knowledge of the producer. The engine cannot know that, unless the producer can communicate that knowledge through the Wasm type system. But if the Wasm type system distinguishes weak references from others then there is no need to distinguish them dynamically anymore (unless you allow injecting them into anyref, but there is little point in allowing that, and even if we did, an extra indirection for weak refs in the engine solves that just fine).

The extra cost with multiple pointer tags is that dereferencing always has to do masking instead of using static offsets. In case you haven't seen, one particularly cheap tagging scheme is to set the lsb to 1 for pointers and use offset -1 in all load/store instructions, which is essentially free on many architectures. But that only works if there is only a single pointer tag.

Why producer-level packing is a bad idea

In a lot of languages, something like enum Suit = Hearts | Clubs | Spades | Diamonds is semantically equivalent to enum Suit = Diamonds | Spades | Clubs | Hearts. That is, the order the cases are declared is irrelevant to the program. But the i31ref strategy of having packing be done on the producer-side rather than the engine-side means that these programs will likely not compile to semantically equivalent wasm modules (supposing the module uses anyref to communicate with the outside world, as is one of its intended purposes). That is, this strategy implicitly exposes a lot of the producer-level internals to the outside world. @rossberg, you're always emphasizing how wary people need to be of the crazy hacks web developers do and how easy it is to get trapped by reliance upon these hacks. That mentality seems to suggest this sort of failure of abstraction should be discouraged as well.

Sorry, but I really don't know what you are talking about.

First, in a language where enums are equivalent up to reordering, the compiler can simply normalise the order lexicographically. Alternatively, it can use hashes of the constructor names as their runtime representatives (which even works with structural subtyping or row polymorphism). I can easily point you to examples of both techniques in existing compilers.

Second, and more importantly, this has absolutely nothing to do with tagging or whether you map the enum to 31 or 32 bit integers in Wasm. I also don't follow what this has to do with hacks on the web, or why it is exposing anything to the "world" other than the engine, or in a way that is specific to enums...

@wingo
Copy link
Contributor

wingo commented Oct 28, 2019

@aardappel writes:

I don't think the current GC proposal caters to [dynamic languages] (they are currently much better served by linear memory or even JS objects), but feel free to prove me wrong.

I have worked on a Scheme implementation that used to use linear memory and currently uses JS objects via anyref. Linear memory is bad because you have to maintain a shadow stack to mark, and you cannot solve cycles between wasm and the host (JS). JS solves those problems but introduces more dynamic checks and run-time call-outs than would be necessary; it's not in the spirit of a low-level VM. The GC MVP solves all of these problems, quite neatly IMO.

@aardappel
Copy link

aardappel commented Oct 28, 2019

@rossberg

Does that alleviate your concerns?

I am quite aware i32's in structs etc. are untagged. My concern is about the usefulness of i32ref at all, which has not been alleviated. Even if it strictly doesn't make things worse compared to not having it, every feature has some cost.

That is a very narrow set of languages, all from the same corner of the spectrum, and none of them is dynamic or handles polymorphism very efficiently. It's a stated goal of the GC work to take a broader perspective than that.

I argued that the set of languages for which i31ref is useful is also a very narrow set, and much less mainstream at that.

Not sure if mainstream-ness should be an argument. Maybe just a tie-breaker ;)

The combination of heterogeneous structs/arrays and unboxed scalars in the proposal is meant to support both the Java-like use case and the FP use case equally well (except that the latter will still need lots of casts in the MVP). Makes sense?

Sure..

People are implementing languages like Scheme and Erlang in Wasm today, so I don't know what exactly you base that claim on.

Which are using linear memory and/or JS objects? I'm purely arguing in terms of what features make sense for the GC proposal.

Fortunately, not all dynamic languages are even remotely as crazy as JavaScript. The GC proposal should be a sufficiently decent fit for the others -- that's the intention anyway. Do you care to elaborate why you think it isn't?

Maybe I am missing something, but most of these languages represent all their data with something that looks like a dictionary + a variety of other, more basic types. I don't see how you'd represent any of that with this proposals typed structs and arrays, that can't be done better with whatever custom representation they are currently using in linear memory (or as JS object).

Or rather, if the best way to represent a dynamic language in the GC proposal is significantly worse than those other options, that counts to me as not catered for.

You seem to be implying two things: no GC'ed language besides the narrow breed of Java-like ones really matters (including dynamic ones).

I can't imagine where I said that. My mention of that set of Java-likes was merely as a counter to your assertion that pretty much "all languages" need i31ref. As for dynamic languages.. see above.

And its not that narrow. Once you remove dynamic languages and "languages happier with linear memory anyway", these Java-likes cover the remaining mainstream, and more importantly, languages the GC proposal appeared to be targeting most.

And that all these other languages picked their implementation approach either out of laziness or out of ignorance. Neither is true, and I would suggest to be more thoughtful with assumptions like that.

I pointed out that their implementation technique is not set in stone, and that there are other options. You put the words "laziness" and "ignorance" in my mouth. I would suggest to be more thoughtful with characterizations like that.

As mentioned, (5) is not an option for almost any modern language.

I'd say most any (static) language can do this, if they don't mind link-time code generation (simulating a closed world model).

(3) relies on jitting, i.e., is inherently incompatible with AOT, and generally is the most expensive and complex approach (which I doubt would fare well with the double stage jitting it would induce on top of Wasm).

The argument was about existing implementation techniques, which so far have not been architected for Wasm.

@jdh30
Copy link

jdh30 commented Oct 29, 2019

Sabine wrote:

what is the downside of the .NET approach, i.e. for what kind of code is the OCaml tagging strategy faster than .NET?

F# is generally much faster than OCaml in practice, often several times faster. The only place where tagging is competitive is when the heap is composed almost entirely of pointers and enums which is common in pedagogical examples but extremely rare in industrial code.

(Or does the OCaml strategy, generally, just consume less memory at the cost of being slower?

OCaml's tagging strategy consumes more memory whilst running slower.

Or does the OCaml strategy enable something that is not possible under the .NET garbage collector?) I'd really like to understand how the .NET GC works.

OCaml's strategy enables extremely simple implementation. The entire GC could be just ~100 lines of code. It is also trivially applicable to untyped languages. However, my bugbear is that OCaml's strategy fails to leverage static type information to recover C-like performance which is really frustrating because all of the information required to generate really fast code is right there and OCaml just throws it all away and generates (sometimes) really awful code instead.

Andreas Rossberg wrote:

whole-program monomorphisation approaches cannot handle first-class polymorphism or polymorphic recursion anyway, which are common today.

But .NET does whole-program monomorphisation and handles polymorphic recursion?

For dynamic languages there are two possible heap representations: Pointer tagging, unboxing small scalars, Boxing everything

Where do Julia and Swift fit into that?

C# is the main example of (3), and that obviously is very complicated to implement (even more so on top of Wasm)

I think this is the killer argument: a uniform representation is vastly easier to implement.

@rossberg
Copy link
Member

rossberg commented Oct 29, 2019

@aardappel:

I argued that the set of languages for which i31ref is useful is also a very narrow set, and much less mainstream at that.

Okay, now your argument isn’t about the design but the design goal.

Supporting a diverse range of languages and paradigms has been a stated goal for Wasm GC and its MVP from the beginning. It is also the stated goal for Wasm in general. And it has been agreed long ago that the GC design ought to be validated on representatives of at least OO, FP, and dynamic languages, which are the 3 dominant classes of GC’ed languages.

People are implementing languages like Scheme and Erlang in Wasm today, so I don't know what exactly you base that claim on.

Which are using linear memory and/or JS objects? I'm purely arguing in terms of what features make sense for the GC proposal.

What else should they be using right now? Are you implying that they are satisfied with the status quo? I think two people on this thread already told you otherwise.

Maybe I am missing something, but most of these languages represent all their data with something that looks like a dictionary + a variety of other, more basic types. I don't see how you'd represent any of that with this proposals typed structs and arrays, that can't be done better with whatever custom representation they are currently using in linear memory (or as JS object).

Dynamic languages are not all centered around dictionaries, where did you get that idea? Just consider classics like Lisp & Scheme, Erlang, Prolog and so on. And for the OO ones you are probably thinking of: uniform representation may or may not be sufficient, but it's still necessary. (I would hope that dictionaries can be implemented in Wasm user space, but I wouldn’t make any performance predictions. A workable implementation of, say, Lua seems totally feasible with this proposal, however.)

I can't imagine where I said that. My mention of that set of Java-likes was merely as a counter to your assertion that pretty much "all languages" need i31ref. As for dynamic languages.. see above.

I believe I said the majority of dynamic or polymorphic languages. And I stand by that.

And its not that narrow. Once you remove dynamic languages and "languages happier with linear memory anyway", these Java-likes cover the remaining mainstream, and more importantly, languages the GC proposal appeared to be targeting most.

There is no valid reason to remove dynamic languages from the equation just to define the problem away. If the current Java-like mainstream was all we cared about, then we could just tweak the JVM and go home.

And that all these other languages picked their implementation approach either out of laziness or out of ignorance. Neither is true, and I would suggest to be more thoughtful with assumptions like that.

I pointed out that their implementation technique is not set in stone, and that there are other options. You put the words "laziness" and "ignorance" in my mouth. I would suggest to be more thoughtful with characterizations like that.

Well, you seemed quick to assert that they could make different choices. Why do you think they made the existing choices? Why do you think that practically all FP and dynamic languages end up with uniform representations if there are better alternatives? Did you try to understand what constraints and use cases they have and what cost trade-offs are involved? Are you aware that there has been quite a bit of research about efficient memory representations for such languages?

As mentioned, (5) is not an option for almost any modern language.

I'd say most any (static) language can do this, if they don't mind link-time code generation (simulating a closed world model).

In the Wasm setting, linking is dynamic and link-time = runtime, so this becomes more or less indistinguishable from (3) and would e.g. require shipping a jit with each webpage.

More importantly, though, as I said before, your guess is simply wrong for any sufficiently interesting type system. Simple (contrived) counter example in pseudo code:

struct Pair<A, B> { x : A; y : B; }

func bar(n : int, v : T) {
if (n > 0) {
bar<Pair<T, T>>(n - 1, new Pair(v, v));
}
}

There are many other, more involved and probably more practically interesting typing features (first-class polymorphism, existential types, GADTs, inductive type families, ...), this is just a particularly easy one to demonstrate.

(3) relies on jitting, i.e., is inherently incompatible with AOT, and generally is the most expensive and complex approach (which I doubt would fare well with the double stage jitting it would induce on top of Wasm).

The argument was about existing implementation techniques, which so far have not been architected for Wasm.

Not sure how that addresses what I said. Either way, we want to allow porting native implementations to Wasm, not require redoing them from scratch with some yet to be invented techniques. For better or worse, pointer tagging is a common technique in GCs.

@jdh30:

F# is generally much faster than OCaml in practice, often several times faster. The only place where tagging is competitive is when the heap is composed almost entirely of pointers and enums which is common in pedagogical examples but extremely rare in industrial code.

I question those claims, unless you equate industrial with numeric — which I know is your main use case, but not the typical use case for OCaml (or the web).

And don't forget that F# threw out half of OCaml's features to be able to run efficiently on .NET.

Also note that this proposal, unlike OCaml, supports heterogeneous structs, so should adapt reasonably well to both worlds.

whole-program monomorphisation approaches cannot handle first-class polymorphism or polymorphic recursion anyway, which are common today.

But .NET does whole-program monomorphisation and handles polymorphic recursion?

.NET does do runtime code specialisation, I was referring to monomorphisation, i.e., static specialisation.

For dynamic languages there are two possible heap representations: Pointer tagging, unboxing small scalars, Boxing everything

Where do Julia and Swift fit into that?

I don’t know enough about Julia, but Swift inherited reference counting from Objective-C, so is not in the same boat.

C# is the main example of (3), and that obviously is very complicated to implement (even more so on top of Wasm)

I think this is the killer argument: a uniform representation is vastly easier to implement.

No, as mentioned before, the alternative requires a JIT, so is not possible, allowed, practical, affordable, or even performant in some environments. On the Web, for example, each web page would have to include a complete JIT for its language (on top of Wasm). I’m pretty sure the impact on both latency and performance would make that a rather unattractive solution.

@jdh30
Copy link

jdh30 commented Nov 1, 2019

I question those claims, unless you equate industrial with numeric — which I know is your main use case, but not the typical use case for OCaml (or the web).

Numerics is certainly one place where .NET is often much faster than OCaml but there are many others. Here are some of them:

  • Less indirection, e.g. in hash tables for memoization, data compression, dynamic programming, graph algorithms.
  • Better locality of reference, e.g. when sorting arrays of unboxed values
  • Fewer GC write barriers, e.g. when inserting unboxed values into mutable collections.
  • Fewer generational survivors, e.g. when cycling unboxed values through a queue.
  • Scalable parallelism due to reduced GC stress.

For example, this non-numerical program is over 3x faster in F# (0.8s) than OCaml (2.7s) if you unbox the variant types:

type a = M | N
type b = O | A of a
type c = P | B of b
type d = Q | C of c
type e = R | D of d

let a = function M -> N | N -> M
let b = function O -> A M | A x -> A(a x)
let c = function P -> B O | B x -> B(b x)
let d = function Q -> C P | C x -> C(c x)
let e = function R -> D Q | D x -> D(d x)

let () =
  let n = 1000000 in
  let xs = Array.make n R in
  for j=1 to 10 do
    for i=1 to n-1 do
      xs.(i) <- e xs.(i-1)
    done
  done

No, as mentioned before, the alternative requires a JIT, so is not possible, allowed, practical, affordable, or even performant in some environments. On the Web, for example, each web page would have to include a complete JIT for its language (on top of Wasm). I’m pretty sure the impact on both latency and performance would make that a rather unattractive solution.

Ah, I see. If you make it support both unboxing/untagging and features like polymorphic recursion then you need JIT which would suck, agreed.

I was suggesting it as a default that languages could use if they didn't support such features, i.e. when there is no need for a JIT. Languages that do could use a uniform representation.

@tlively
Copy link
Member

tlively commented Feb 11, 2022

This discussion covered many different pointer tagging and value packing schemes and abstract discussion about related design principles for Wasm. Now that we have concrete implementations of the proposal, the best way to continue this discussion would be to raise concrete suggestions as new issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests