Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate continue statement #310

Closed
rossberg opened this issue Aug 24, 2015 · 27 comments
Closed

Eliminate continue statement #310

rossberg opened this issue Aug 24, 2015 · 27 comments
Milestone

Comments

@rossberg
Copy link
Member

In a language with break-from-block, the continue statement is redundant. Any loop of the form

l: while (C) { ... continue l; ... }

can equivalently be rewritten to

l: while (C) l2: { ... break l2; ... }

While a separate continue statement is useful in user-facing syntax, I see no benefit in having it in Wasm. In particular, since labels are implicit, so there isn't even any encoding overhead in the second form.

I propose removing continue.

@kg
Copy link
Contributor

kg commented Aug 24, 2015

Labels are implicit? That's interesting, when did we decide on that? If that's the case, I agree that continue x is redundant and unnecessary.

@rossberg
Copy link
Member Author

At least that was my understanding wrt to the binary encoding. It's what the V8 prototype does -- break and continue just take a relative depth parameter.

@sunfishcode
Copy link
Member

This does look like it's an improvement over the current design. It slightly obscures backedges in the case of forever loops, but makes do_while control flow simpler, which is nice.

That said, I also have an open proposal for a much broader redesign of the control constructs in #299.

@titzer
Copy link

titzer commented Aug 24, 2015

I'm not sure we're gaining anything by reducing the number of control flow
constructs here. On the one hand we're adding more forms of comparisons,
more conversions, and more integer doodads, and then we're taking away
common control flow patterns. This is not serving producers well, and it is
not really making implementations any simpler, since the surface area of
all those other operations completely dwarfs implementing control flow.

No one even has a completely functional pipeline and we're changing
fundamentals with zero data based on taste.

On Mon, Aug 24, 2015 at 8:23 PM, Dan Gohman [email protected]
wrote:

This does look like it's an improvement over the current design. It
slightly obscures backedges in the case of forever loops, but makes
do_while control flow simpler, which is nice.

That said, I also have an open proposal for a much broader redesign of the
control constructs in #299
#299.


Reply to this email directly or view it on GitHub
#310 (comment).

@kg
Copy link
Contributor

kg commented Aug 24, 2015

Would it help if I put a wasm backend in my personal compiler so we have a test corpus to play with until LLVM is ready?

@titzer
Copy link

titzer commented Aug 24, 2015

On Mon, Aug 24, 2015 at 9:05 PM, Katelyn Gadd [email protected]
wrote:

Would it help if I put a wasm backend in my personal compiler so we have a
test corpus to play with until LLVM is ready?

Yeah, let's see what comes out :-)


Reply to this email directly or view it on GitHub
#310 (comment).

@rossberg
Copy link
Member Author

@titzer, this is not taking away any pattern. A producer just literally replaces continue n with break n-1, that's all.

@titzer
Copy link

titzer commented Aug 24, 2015

No, they need to have another block inside the loop order to do so.

On Mon, Aug 24, 2015 at 9:40 PM, rossberg-chromium <[email protected]

wrote:

@titzer https://github.com/titzer, this is not taking away any pattern.
A producer just literally replaces continue n with break n-1, that's all.


Reply to this email directly or view it on GitHub
#310 (comment).

@rossberg
Copy link
Member Author

@titzer, in asm.js break can exit any statement, not just blocks or loops. But even if Wasm is supposed to be more restricted (should it?), isn't that a triviality?

@titzer
Copy link

titzer commented Aug 24, 2015

In v8-native, a loop is just like a block in that it contains multiple
statements/expressions. IIUC you are proposing that loop has only a single
statement/expression and that to encode continue you make a loop with a
block as its subexpression and then break the subblock, correct?

If so, I think that's a lot of aggravation for no gain anywhere.

On Mon, Aug 24, 2015 at 10:14 PM, rossberg-chromium <
[email protected]> wrote:

@titzer https://github.com/titzer, in asm.js break can exit any
statement, not just blocks or loops. But even if Wasm is supposed to be
more restricted (should it?), isn't that a triviality?


Reply to this email directly or view it on GitHub
#310 (comment).

@rossberg
Copy link
Member Author

I see. I hadn't noticed that about v8-native. The AST semantics doesn't say anything to the contrary, so I have been assuming the obvious C statement structure so far. Has this actually been discussed?

@sunfishcode sunfishcode added this to the MVP milestone Aug 30, 2015
@AndrewScheidecker
Copy link

I think the primary benefit to eliminating the continue operation is that it simplifies the AST context; e.g. the context to mimic the JavaScript break and continue semantics as the binary encoding from the polyfill prototype requires is annoying. Another way to simplify the context (and eliminate continue) without requiring extra label nodes is to give loop nodes two branch targets: one for break, and one for continue.

e.g. this WebAssembly: (loop $break $continue ... (branch $continue) ... (branch $break) ...)
could translate to this JavaScript: a: while(true) { ... continue a; ... break a; ... }
and can also be easily translated to lower level branching operations.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name; that would mean only consumers of the binary format need to understand indexing. With rules like loops introducing two labels, or separate index spaces for locals by type, that's potentially a lot of complexity that could be isolated to the binary format. It makes sense to keep the indexing in the text format for prototyping, but once the reference interpreter also implements a binary format, it would be nice to get rid of it.

@kg
Copy link
Contributor

kg commented Aug 31, 2015

Another way to simplify the context (and eliminate continue) without requiring extra label nodes is to give loop nodes two branch targets: one for break, and one for continue.

This would address my concerns. I like it.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name; that would mean only consumers of the binary format need to understand indexing.

That would be great, since I basically don't understand how the numeric indexes work right now despite trying to :-)

@titzer
Copy link

titzer commented Aug 31, 2015

On Mon, Aug 31, 2015 at 6:11 PM, Andrew Scheidecker <
[email protected]> wrote:

I think the primary benefit to eliminating the continue operation is that
it simplifies the AST context; e.g. the context to mimic the JavaScript
break and continue semantics as the binary encoding from the polyfill
prototype requires is annoying. Another way to simplify the context (and
eliminate continue) without requiring extra label nodes is to give loop
nodes two branch targets: one for break, and one for continue.

e.g. this WebAssembly: (loop $break $continue ... (branch $continue) ...
(branch $break) ...)
could translate to this JavaScript: a: while(true) { ... continue a; ...
break a; ... }
and can also be easily translated to lower level branching operations.

Incidentally, I also think the text format should eventually require
referencing labels, locals, and globals by name; that would mean only
consumers of the binary format need to understand indexing. With rules like
loops introducing two labels, or separate index spaces for locals by type,
that's potentially a lot of complexity that could be isolated to the binary
format. It makes sense to keep the indexing in the text format for
prototyping, but once the reference interpreter also implements a binary
format, it would be nice to get rid of it.

I'm all for using labels in the text format as opposed to indices for
break/continue, but I still think the continue statement is useful in and
of itself. FWIW v8-native basically does this. It maintains a stack of
ifs/loops which it uses to find the target for either a break or a continue.


Reply to this email directly or view it on GitHub
#310 (comment).

@AndrewScheidecker
Copy link

FWIW v8-native basically does this. It maintains a stack of
ifs/loops which it uses to find the target for either a break or a continue.

The main change to v8-native to implement this would be that each label on the stack only has a single target instead of a break target and a possibly null continue target. It also eliminates an edge case to verify that the stack index parameter of a continue operation has a non-null continue target.

@qwertie
Copy link

qwertie commented Aug 31, 2015

(loop $break $continue ... (branch $continue) ... (branch $break) ...)
could translate to this JavaScript: a: while(true) { ... continue a; ... break a; ... }

sgtm.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name...

I think it would be unfortunate if the text format were to diverge too much from the binary format - people learning about webassembly will naturally assume that the text format is conceptually the same as the binary one. If the text format supports both names and indices, documentation will naturally introduce both and explain how one is syntactic sugar for the other. Such information is likely to be left out, or not explained as well, if the text format is not allowed to reflect the binary format.

@titzer
Copy link

titzer commented Aug 31, 2015

On Mon, Aug 31, 2015 at 11:29 PM, David Piepgrass [email protected]
wrote:

(loop $break $continue ... (branch $continue) ... (branch $break) ...)
could translate to this JavaScript: a: while(true) { ... continue a; ...
break a; ... }

sgtm.

Incidentally, I also think the text format should eventually require
referencing labels, locals, and globals by name...

I think it would be unfortunate if the text format were to diverge too
much from the binary format - people learning about webassembly will
naturally assume that the text format is conceptually the same as the
binary one. If the text format supports both names and indices,
documentation will naturally introduce both and explain how one is
syntactic sugar for the other. Such information is likely to be left out,
or not explained as well, if the text format is not allowed to reflect
the binary format.

I agree, and I also think that webassembly shouldn't be unnecesarily
cumbersome because we're trying to be too minimal with control structures.
I think continue is very, very common and we wouldn't save much, if
anything, by leaving it out.


Reply to this email directly or view it on GitHub
#310 (comment).

@AndrewScheidecker
Copy link

I don't feel too strongly one way or the other about continue. Just wanted to point out you can do it without requiring extra label nodes.

I think it would be unfortunate if the text format were to diverge too much from the binary format - people learning about webassembly will naturally assume that the text format is conceptually the same as the binary one.

I agree in general, but the text format already lets you name things where I assume the binary format will be all indices: non-exported functions, globals, locals, parameters, labels, etc. IMO that's justifiable because it makes it much easier for a human to read (the whole point of the text format) without changing the semantics. I don't see why anybody other than people writing an encoder or decoder need to even know that the binary format uses indices. How many people with a functional knowledge of x86 assembly know how it's encoded?

@qwertie
Copy link

qwertie commented Sep 1, 2015

I agree in general, but the text format already lets you name things where I assume the binary format will be all indices: non-exported functions, globals, locals, parameters, labels, etc.

You have a point there... but just to be clear, I didn't immediately find the issue id, but I remember we were talking about (and I advocate) having a place for names (of parameters, locals, etc.) to be stored in the binary format, so that text and binary will round-trip and other reasons (e.g. having a good debugging experience in the absence of source code.) The names could be stripped to save space, which makes me think the text format needs a way to represent indices in that scenario.

@titzer
Copy link

titzer commented Sep 1, 2015

On Tue, Sep 1, 2015 at 5:45 PM, David Piepgrass [email protected]
wrote:

I agree in general, but the text format already lets you name things where
I assume the binary format will be all indices: non-exported functions,
globals, locals, parameters, labels, etc.

You have a point there... but just to be clear, I didn't immediately find
the issue id, but I remember we were talking about (and I advocate) having
a place for names (of parameters, locals, etc.) to be stored in the binary
format, so that text and binary will round-trip and other reasons (e.g.
having a good debugging experience in the absence of source code.) The
names could be stripped to save space, which makes me think the text format
needs a way to represent indices in that scenario.

Names are good, so I support having space somewhere for storing them. But I
would advocate that we don't specific a standard format for them, and leave
the interpretation of that section up to a higher level. Other than dynamic
linking, names shouldn't have a semantic impact on the program. In
particular, names shouldn't be required for the engine to produce stack
traces, IMO.

Engines should produce low-level stack traces that should be interpreted by
a higher level to produce a source-level trace. That's a screwup in the
JVM; exceptions have line numbers, not bytecode offsets, so you need to
have a standard bytecode offset -> line number table.


Reply to this email directly or view it on GitHub
#310 (comment).

@kg
Copy link
Contributor

kg commented Sep 1, 2015

The names could be stripped to save space, which makes me think the text format needs a way to represent indices in that scenario.

The consensus there was that stripped names would be bare symbols, i.e. @7 instead of just raw indices. In theory a disassembler could (should) assign them better names, as most decompilers do, i.e. @loop3, @func9, @local7 or even @i32_7. The only important thing there is round-tripping the index relationships.

And as Ben says, the names are a human concern, not something that affects the engine. They just live in a table and are used for human-facing stuff like a debugger or a disassembler.

@titzer
Copy link

titzer commented Oct 23, 2015

Polling this issue to see if there is a consensus: text format aside, continue stays or goes?

@rossberg
Copy link
Member Author

If I understand the recent resolution of the control structure discussion correctly (not sure about that), then continue is merely a synonym for (certain uses of) break.

@sunfishcode
Copy link
Member

Yes, I think we all agree now that we don't want continue as a distinct operator.

@titzer
Copy link

titzer commented Oct 23, 2015

I think so. There is a distinction at the v8-native binary level since break targets the end label of a loop/block and continue targets the begin label. So, encoding issue?

@titzer titzer reopened this Oct 23, 2015
@rossberg
Copy link
Member Author

Yes, I'd argue that's an encoding issue.

@titzer
Copy link

titzer commented Oct 23, 2015

I see how we can do this with a v8-native change; just push two labels onto the stack for loop and then break and br_if can pick which one based on the nesting depth.

Closing. Remove continue.

@titzer titzer closed this as completed Oct 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants