Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define the behavior of "switch" #322

Closed
AndrewScheidecker opened this issue Sep 1, 2015 · 11 comments
Closed

Define the behavior of "switch" #322

AndrewScheidecker opened this issue Sep 1, 2015 · 11 comments
Milestone

Comments

@AndrewScheidecker
Copy link

The design is vague about how switches work. The spec interpreter does this:

  • A switch is an operation with N>=2 operands
  • The first operand is an integer of any type.
  • The last operand is an expression that yields the same type as the switch that is used by default if no case is matched.
  • The other operands are cases.
  • Each case has a literal integer and an expression (the syntax allows 0 or many expressions that yield nop or block expressions).
  • Each case has an optional fallthru flag that determines whether it will fall through to the following case (or default case). The syntax sugar for a case without an expression implies fallthru.
  • If a case is fallthru, then its expression is void type ([] in the spec interpreter).
  • If a case isn't fallthru, then its expression is the same type as the switch.

More relevant context about the spec interpreter:

  • All operations are expressions: they may yield a value, and so have an associated type
  • Break/return/loop operations are considered to yield a value of the bottom type, representing that they never yield a value to their enclosing expression, and may occur as a subexpression where any type is expected.
  • Labels are expressions, so the break operation needs an operand of the appropriate type for the label to yield.

This issue is not about whether there should be a C-like concept of statements, which seems to be an open issue. If the spec ends up making switch a statement, this is irrelevant. But if not:

As it's defined in the spec interpreter, there isn't a direct translation to and from C-style switches: the default case must be the equivalent of the final case, and so cannot fall through to other cases. I believe the rationale for this design is to ensure a switch always yields a value of the type expected by its context.

Another way to achieve the same goal would be:

  • Give switch an implicit label.
  • Add a way to declare a default case, and get rid of the final default expression.
  • Make all but the last case the equivalent of "fallthru". That means the last case expression would need to be same type as the switch, but the other case expressions would be any typed.

Instead of each non-fallthru case being an exit from the switch, the exits from this switch would be explicit breaks and the final case. This makes switch equivalent in flexibility to C-style switches, but allows it to still be used as an expression.

For example:

...
(param %key i64)
(local %a f64)
(local %b i32)
(setlocal %a (switch.i64 #switch %key
    (case 0)
    (case 1 (break #switch (constant.f64 100.0)))
    (default (setlocal %b 1))
    (case 2 (break #switch (constant.f64 200.0)))
    (case 3 (constant.f64 300.0))
    ))

This would map 0=>100.0, 1=>100.0, 3=>300.0, and any other value =>200.0. Any value other than 0|1|2|3 would also set %b to 1. To implement this with the current spec interpreter's switch, you'd need to duplicate case 2's expression into the default expression.

@qwertie
Copy link

qwertie commented Sep 1, 2015

sgtm, it encodes both C-style switches and (some) functional-language ones adequately. Hm, it feels like there ought to be a compact binary encoding for common instances where none of the cases fall through... and note that allowing default in the middle implies we need extra bits of information to indicate which parameter is the default, IOW, default needs explicit encoding rather than being assumed.

@sunfishcode
Copy link
Member

+1 to not requiring the default to be last, and to allowing it to fall through to other cases. I'm not tied to a specific AST representation of this, but we want this functionality.

And +1 to giving switch a label, making cases fall through by default, and making the exits from a switch be breaks. Making switch cases be non-fallthrough by default is on many people's list of "things to fix if I ever write a new language", and for human-written languages I agree, but for WebAssembly, switch is the N-way dispatch construct, and it's nice to limit it to that and not also bundle in additional implicit branching. We're going to be using breaks to exit from switches to outer nesting levels anyway, so it's consistent to also use breaks to exit to the immediate nesting level too.

@titzer
Copy link

titzer commented Sep 1, 2015

We have to make sure we're not getting into too much of a tangle for the
implementation of switches inside of engines.

I think we should avoid making switches into a swiss army knife of control
flow constructs and should instead try to find a very simple switch
mechanism that is the minimal added value on top of an if-cascade. What I
mean is that there's marginal value to adding a switch statement that's
just a more compact representation of an if-cascade that has to be unpacked
by the engine which will ultimate produce no better code.

So unless the implementation is going to generate a more efficient
dispatch, switches don't have much value. So, I would argue we should stick
to table-switches (which result in an indirect branch) and lookupswitches
(which can be done with a binary search), which are really easy to do with
no-fall through as the default, and no default in the middle.

On Tue, Sep 1, 2015 at 5:03 PM, Dan Gohman [email protected] wrote:

+1 to not requiring the default to be last, and to allowing it to fall
through to other cases. I'm not tied to a specific AST representation of
this, but we want this functionality.

And +1 to giving switch a label, making cases fall through by default, and
making the exits from a switch be breaks. Making switch cases be
non-fallthrough by default is on many people's list of "things to fix if I
ever write a new language", and for human-written languages I agree, but
for WebAssembly, switch is the N-way dispatch construct, and it's nice to
limit it to that and not also bundle in additional implicit branching.
We're going to be using breaks to exit from switches to outer nesting
levels anyway, so it's consistent to also use breaks to exit to the
immediate nesting level too.


Reply to this email directly or view it on GitHub
#322 (comment).

@sunfishcode
Copy link
Member

This proposal does exactly as your first two paragraphs suggest: it makes switch less of a tangle for the engine implementation, and makes it less of a swiss army knife of control flow constructs.

I agree that there's little value of a construct which is just shorthand for an if-cascade, so I don't see a strong motivation for a lookupswitch construct. Producers can emit their own if-trees if they know they want a binary search tree.

The switch proposed here is similar to table-switch. The value it adds to wasm is that it's a jump table. Having fallthrough just means that it doesn't also have implicit branching tacked on, so it's simpler. C, asm.js, and others can put a default case in the middle and it can fall through to other cases, so if WebAssembly can't do that, it'll force us to produce more awkward control flow.

@sunfishcode sunfishcode added this to the MVP milestone Sep 3, 2015
@sunfishcode
Copy link
Member

LLVM's switch is actually even more expressive than this. Instead of requiring cases be contained within the body of a switch, a switch statement can directly branch anywhere. In the spirit of #299, the most expressive form of switch would be one where each arm is just a label of a block/statement to break from or loop to continue to.

@AndrewScheidecker
Copy link
Author

Two downsides to LLVM's switch:

  • It can't be used as an expression
  • It isn't isomorphic to ASM.JS switches.

One alternative is to add a CFG node that is a bit like the loop+switch construct that Relooper generates for irreducible control flow. It would have an initial label, a label to exit the CFG, and some labeled basic blocks. It could easily compile to either a loop+switch in ASM.JS or basic blocks+branches in a native backend. LLVM's switch would be expressible as a switch within the initial expression or label of a CFG node, with each arm simply branching to one of the CFG labels.

e.g.

(cfg #exit #initial
    (bb #a (if ... (branch #b) (branch #exit)))
    (bb #b ... (branch #a))
    (bb #initial
        (switch ...
            (case 0 (branch #a))
            (case 1 (branch #b))
            (default (branch #exit))
        )
    )
)

in ASM.JS:

var label = 2;
cfg: do {
    switch(label) {
    case 0:
        if(...) { label = 1; continue cfg; }
        else { break cfg; }
    case 1:
        ...
        label = 0;
        continue cfg;
    case 2:
        switch(...) {
        case 0: label = 0; continue cfg;
        case 1: label = 1; continue cfg;
        default: break cfg;
        }
    }
} while(1);

A more extreme possibility is: just use a CFG, but include Relooper classifications of the basic blocks so the backend can easily turn it into structured control flow if necessary. Or even just make the ASM.JS backend do its own Relooper analysis.

@sunfishcode
Copy link
Member

It would actually be fairly straightforward to add a return value to a #299 style switch. Such a switch is a break statement that, instead of taking a single immediate that specifies which block to break to, takes N immediates that specify N blocks, and an integer to select which one to break to. All we'd need to do is add a result value operand just like break's.

LLVM's switch is actually even more general than that and can go "backward", and #299 has a distinction between forward and backward branching (viz. break and continue). One option is to ignore that generality; another would be to eliminate that distinction too and just use break for both forward and backward branching (and presumably break's result value would just be ignored when it's backward).

Either way, lowering to asm.js switch is not complex; one can use a switch with cases that immediately do a break or continue with the appropriate labels.

A notable difference between these #299 style approaches and generalized CFG approaches is that #299 manages to retain some of the advantages of well-structured control flow -- it can still be structured and analyzed hierarchically, loops are identified and guaranteed to be single-entry, etc.

@AndrewScheidecker
Copy link
Author

All we'd need to do is add a result value operand just like break's.

I meant the other way: that a value can't be returned out of such a switch. But you can do that by adding a label to the switch statement that's used as a target to "return" from the switch, I guess.

LLVM's switch is actually even more general than that and can go "backward"

My example goes backward. The CFG node's basic blocks are in scope in the nested switch, so it both branches to basic blocks that precede it in the CFG, as well as the CFG's exit label.

another would be to eliminate that distinction too and just use break for both forward and backward branching (and presumably break's result value would just be ignored when it's backward).

I'm in favor of getting rid of the distinction between branching forward and backward (see #310 (comment)). The value operand in branch should match the type defined by the label being branched to. In (loop #break #continue ...), the continue label has a void type, but the break label has the type expected as a result of the loop expression. The AST I use for WebAssembly works that way: AST, LLVM, WAST parsing

Either way, lowering to asm.js switch is not complex; one can use a switch with cases that immediately do a break or continue with the appropriate labels.

Can you elaborate on this? I don't understand how you'd do this.

A notable difference between these #299 style approaches and generalized CFG approaches is that #299 manages to retain some of the advantages of well-structured control flow -- it can still be structured and analyzed hierarchically, loops are identified and guaranteed to be single-entry, etc.

The CFG expression I described is meant to work in conjunction with all the existing structured control flow operations. The idea is that a compiler backend uses something like Relooper to produce structured control flow, and for the irreducible case it falls back to a CFG expression rather than a var+loop+switch. On ASM.JS the CFG expression turns into the var+loop+switch, but for probably every other runtime it would map to something lower level.

@sunfishcode
Copy link
Member

I meant the other way: that a value can't be returned out of such a switch. But you can do that by adding a label to the switch statement that's used as a target to "return" from the switch, I guess.

I mean that a #299 style switch, because such a switch would just be an N-way break, so if break can return a value, then switch can too, in the same way.

My example goes backward.

It does, though it does so at the cost of introducing a generalized CFG even in cases where control flow could otherwise remain explicitly well-structured.

Can you elaborate on this? I don't understand how you'd do this.

switch (x) {
case 0: break L0;
case 1: break L1;
case 2: break L2;
...
}

This is now a switch where the cases break to any label. Replace break with continue for backward branches, or eliminate the break/continue distinction to taste :-). The spirit of #299 is to collapse this idiom into a simple switch node that just directly branches places instead of cases that contain branches to places.

I do think the idea of an embedded CFG-like construct is worth considering. The advantage of #299 is that it can always preserve well-structured control flow without falling back to a CFG-like construct when it isn't necessary.

@AndrewScheidecker
Copy link
Author

I mean that a #299 style switch, because such a switch would just be an N-way break, so if break can return a value, then switch can too, in the same way.

When I say that switch can return a value, I mean to the enclosing expression, not to the labels it branches to. So you're right that you could define a switch as a N-way break that may pass values to the branch targets, but it would be pretty weird to be able to use that switch construct as an expression: (set_local %0 (switch ...)). But maybe that's best.

switch (x) {
case 0: break L0;
case 1: break L1;
case 2: break L2;
...
}

I see, so you mean that the switch would just be a table-based branch, but that it could still only target labels that are in-scope according to JavaScript-like structured control flow semantics.

So a C-style switch statement might be translated like this:

switch(x)
{
case 0: a;
default: b; break;
case 1: c;
}

(label #exit
    (label #1
        (label #default
            (label #0
                (switch x #default (case 0 #0) (case 1 #1))
            )
            a
        )
        b
        (branch #exit)
    )
    c
)

That works, and it's appealing that it distills switch to just a table-based branch, but it's certainly confusing to read. I think the problem is that label defines a branch target at its end. You could redefine the label node to create a branch target to its beginning scoped to the expressions preceding it in a block. That would make the above example:

(switch x #default (case 0 #0) (case 1 #1))
(label #0 a)
(label #default b (branch #exit))
(label #1 c)
(label #exit)

// in ASM.JS
Lexit:{
    L1:{
        Ldefault:{
            L0:{
                switch(x) {
                case 0: break L0;
                case 1: break L1;
                default: break Ldefault;
                }
            }
            a
        }
        b;
        break Lexit;
    }
    c
}

I like that. Much closer to the LLVM-style switch while still being easy to transform to JavaScript. It does complicate the scoping rules for the label's branch target, though.

Then the CFG node I mentioned above might look like this:

(cfg #exit
    (switch ... #exit (case 0 #a) (case 1 #b))
    (label #a (if ... (branch #b) (branch #exit)))
    (label #b ... (branch #a))
)

So it would just use normal label nodes with more permissive scoping rules for their branch targets.

@lukewagner
Copy link
Member

This should be resolved by #427.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants