Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible control-flow operators. #427

Merged
merged 2 commits into from
Oct 30, 2015
Merged

Flexible control-flow operators. #427

merged 2 commits into from
Oct 30, 2015

Conversation

sunfishcode
Copy link
Member

This introduces br, br_if, and so on, while also preserving the
high-level if and if_else operators. It also converts switch into
tableswitch and generalizes it to support labels in enclosing scopes.

This introduces `br`, `br_if`, and so on, while also preserving the
high-level `if` and `if_else` operators. It also converts `switch` into
`tableswitch` and generalizes it to support labels in enclosing scopes.
@sunfishcode
Copy link
Member Author

Updated PR to be independent of statements=expressions.

@rossberg
Copy link
Member

I'd like to understand the motivation for tableswitch better. Isn't that simply a more convoluted way of writing a C-style switch? As far as I can tell, it is completely equivalent, the only differences being stylistic. Namely, (1) all cases have to be written with an indirection through an extra label instead of using case directly as the label, (2) it forces to list all cases between 0 and the max label explicitly instead of having omitted ones go to default implicitly, and (3) it can jump to outside labels directly instead of using the existing branch construct for that.

(1) and (2) are effectively just changes in syntax, not semantics. Of course, this representation makes sense on the encoding level, but on the AST level, it just makes the construct (significantly) more verbose and more complicated to describe and spec. (3) on the other hand is just syntactic sugar, analogous to br_if.

So in terms of AST semantics and spec factorisation, tableswitch is strictly inferior to a C-style switch, while being functionally equivalent. I think we should treat the table representation as an encoding concern, not a semantics concern.

@sunfishcode
Copy link
Member Author

This tableswitch has one area of flexibility beyond what a C-style switch has: it can directly branch to labels outside its own body.

It also has a notable restriction in that it requires the index space to be zero-based and dense. This makes it more directly correspond to an actual jump table, as reflected by the rename to tableswitch.

@rossberg
Copy link
Member

Yes, I understand, but as I tried to point out above, that seems more like an encoding consideration, and not like something that is necessarily relevant on the semantic abstraction level of the AST.

@sunfishcode
Copy link
Member Author

I don't see how the semantics of where a tableswitch can branch to is an encoding consideration. Similarly, the index space determines how tableswitch interprets its operand, so it's also part of the semantics.

@rossberg
Copy link
Member

As mentioned above, the ability to jump to an external label is merely syntactic sugar over a break (and limited one, because there is no way you can make it transfer a value). As such, it is primarily an encoding consideration as well.

Can you explain what you are referring to as index space, and how it affects the operand?

@qwertie
Copy link

qwertie commented Oct 24, 2015

We discussed tableswitch earlier in #322. It is advantageous over the old switch because

  1. tableswitch is able to represent the vast majority of C switch statements, unlike the old switch (IIUC, some C switch statements such as duff's device are still not allowed)
  2. Using a dense rather than sparse jumptable allows a Wasm JIT or interpreter to be very simple, which maximizes minimizes startup time. The optimization problem of how to implement the switch statement (as a collection of jump tables, gotos (breaks) and if-elses) is shifted from JIT-time to compile-time.

@sunfishcode
Copy link
Member Author

@rossberg-chromium One can express any reasonable switch-like construct as syntax sugar on top of any other plus extra nodes. As such, I don't see why this is a reason to prefer any one over any other.

Also, Statements == Expressions is an orthogonal concern. This tableswitch can still transfer values, just as br can, if we choose to do that.

The index space of tableswitch is the integer sequence starting at 0 and continuing for some finite length. A tableswitch operator defines a mapping from values in its sequence to labels, and it uses this mapping to compute a label, given its operand.

@rossberg
Copy link
Member

@qwertie, yes, I was comparing with a C-style switch, which I'm happy moving towards. Density is an independent consideration.

@sunfishcode, re index space, that's what I guessed, but then I still don't understand your earlier remark on that.

It may help to point out that we are actually discussing syntax here rather than semantics. I am fine with the goal of increasing the flexibility of the switch operator. But I am puzzled how the explicit syntactic indirection through extra labels helps at the AST level.

Let's be concrete. Consider a switch as described in this proposal:

$label: (switch (operand)
  ($case0 $case1 $case2 $label $case4 $case5) $default
  (case $case2 (a))
  (case $case1 (b))
  (case $case5 (c))
  (case $case4)
  (case $default (d))
  (case $case0 (e))
)

Here is how I would design the concrete S-expr syntax:

$label: (switch (operand)
  (case 3 br $label)
  (case 2 (a))
  (case 1 (b))
  (case 5 (c))
  (case 4)
  (default (d))
  (case 0 (e))
)

That is expressing exactly the same structure but with much less repetition. You can still represent it as a table of labels in the binary encoding, but as a textual or AST-level representation it is superior in just about every way I can imagine, for both producers and "consumers", including the spec.

Note that the density requirement is independent from this choice of syntax. I don't see a particularly strong reason to require listing the (case 4) explicitly in the latter form, but there is nothing preventing us from requiring it if we want to emphasise the table nature.

Note also that I even turned the break to the outer label into special syntax for case, that's different from using a separate br operator (which would be parenthesised). It obviously is redundant to introduce that, but there you go.

So, again, this is really about syntax, not semantics. AFAICT, the latter way of structuring the operator syntax is more convenient on all levels. I'd like to understand what advantage you see in the former notation. Or would you be fine with the latter as well?

@sunfishcode
Copy link
Member Author

I've now added a patch to this PR converting it to what I think you're asking for. Is this what you had in mind?

In my opinion, this patch makes the prose harder to read for humans, makes it harder for me to convince myself that the semantics aren't incomplete or inconsistent, and it puts greater distance between the design-document AST representation and what I might imagine we'll want in the binary encoding.

@rossberg
Copy link
Member

@sunfishcode, cool, thanks! And for your pleasure, I already implemented it: WebAssembly/spec#153 :)

@sunfishcode
Copy link
Member Author

I decided I didn't like it, so I removed it now.

@rossberg
Copy link
Member

I see. :( Well, please have a look at the spec PR. AFAICS, with this structure, it is as simple as it will get. Tests also remain more readable. I hear what you are saying re prose and distance to binary encoding, but I think those are far outweighed by familiarity and structural simplicity, respectively.

References to labels must occur within an *enclosing construct* that defined
the label. This means that references to an AST node's label can only happen
within descendents of the node in the tree. For example, references to a
`block`'s label can only happen from within the `block`'s body. In practice,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"happen from within" => "occur within"

@titzer
Copy link

titzer commented Oct 26, 2015

This mostly looks good modulo some wordsmithing.

@titzer
Copy link

titzer commented Oct 26, 2015

@rossberg-chromium You've mentioned a C-style switch as a basis for the tableswitch design. Here we are being a little bit more explicit with labels, which pushes a bit of work up to producers, at the gain of being explicit for backends and engines. Is this a reasonable tradeoff in your opinion? Or are you proposing that the syntax of the AST is C-like and the binary encoding is table-switch like?

@kg
Copy link
Contributor

kg commented Oct 26, 2015

@sunfishcode Have you done much research into how this would impact pre-compression and post-compression file size (in the asmjs packer, for example?) The significant changes to ASTs this implies could mean a big file size regression (or a big file size win, in which case that alone is a compelling reason to make your changes!)

@sunfishcode
Copy link
Member Author

@kg Looking at some asm.js code of some miscellaneous applications, if statements are over 6x more frequent than loops, and over 300x more frequent than switches. This iteration of the proposal still has if and if_else. And, I expect no difference in the size of loops in the common case of optimized code. Consequently, I'm assuming this doesn't introduce a big file size regression or win.

@sunfishcode
Copy link
Member Author

Wordsmithing changes applied.

@kg
Copy link
Contributor

kg commented Oct 26, 2015

The question is about the binary encoding. Are you saying if and if_else would be opcodes in the binary representation? I thought they were syntax sugar?

@ghost
Copy link

ghost commented Oct 26, 2015

@kg I think this has been answer elsewhere - syntax sugar is a spec matter and there are still opcodes for the sugar.

@rossberg
Copy link
Member

On 26 October 2015 at 18:55, titzer [email protected] wrote:

@rossberg-chromium https://github.com/rossberg-chromium You've
mentioned a C-style switch as a basis for the tableswitch design. Here we
are being a little bit more explicit with labels, which pushes a bit of
work up to producers, at the gain of being explicit for backends and
engines. Is this a reasonable tradeoff in your opinion? Or are you
proposing that the syntax of the AST is C-like and the binary encoding is
table-switch like?

Yes, the latter. There is no real benefit in introducing explicit labels as
jump targets on the AST level -- quite the opposite, in fact. A
C/JavaScript-style switch notation is much more manageable when formulating
semantics, or writing/reading examples.

@rossberg
Copy link
Member

WebAssembly/spec#153 now matches everything described by this PR (I think), modulo the notation for tableswitch.

I did some more pondering over the latter, esp looking into how the proposed notation would affect the spec. Honestly, it would be quite ugly. :( In addition to tracking the extra indirection, it would require distinguishing two different kinds of (raw) labels in the AST (since tableswitch addresses its inner targets differently from external targets) as well as different scoping rules for those labels' symbolic representation.

In other words, my assessment still stands that this notational factorisation for tableswitch is suboptimal. Unless we have strong technical reasons to do it that way, I highly recommend more streamlined notation. @sunfishcode, do you think you can still warm up to that?

@sunfishcode
Copy link
Member Author

My attempt above at implementing what I thought you were asking for made the prose about twice as long, and I've now discovered, as I feared, that it did have a bug hidden in the extra complexity: it didn't address whether case and default could fall through to case_br and default_br.

Right now, I am very confused as to the intended relationship between AstSemantics.md, ml-proto's s-expression syntax, what we might expect to call the binary format AST, and what we might expect in the actual binary format. This confusion is making it difficult for me to figure out what you're asking for here, and what it means for other parts of the system.

I propose we merge this PR as a starting point, and then you can submit a new PR after it proposing the specific changes what you would like to see. I think that will help me understand better how you want this to look, rather than having me try to write up what I think you want. Does that make sense?

@rossberg
Copy link
Member

@sunfishcode, okay, sounds good to me.

@sunfishcode
Copy link
Member Author

Ok. Merging.

sunfishcode added a commit that referenced this pull request Oct 30, 2015
@sunfishcode sunfishcode merged commit ca9fc75 into master Oct 30, 2015
@sunfishcode sunfishcode deleted the new-control-flow branch October 30, 2015 15:33
@sunfishcode sunfishcode mentioned this pull request Nov 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants