Break / Branch? #445

Closed
kripken opened this issue Nov 3, 2015 · 44 comments

@kripken
Member

kripken commented Nov 3, 2015

This repo currently has br: branch to a given label in an enclosing construct. I assume the intention is that

(block $block
  (branch $block) // goes to $more
)
(block $more)

Is my understanding correct?

If so, isn't this more of a "break" than a "branch"? We are not actually branching to $block; we are in fact going to $more. But the name "branch" implies "branch to" to me. It seems nicer to do either

  1. (branch $more), which says "branch to $more" (where we actually branch to), or
  2. (break $block), which says "break on $block" (just like a C/JS/etc. break on a labeled block),

over the current state, which says (branch $block) but actually does not branch to $block.

Obviously 1 is a forward goto, which is bad. How about 2?
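
For readers less familiar with labeled blocks, here is a minimal JavaScript sketch of what option 2 means. JavaScript stands in for the s-expression syntax here, and the function name `runLabeledBlock` is made up for illustration; this is an analogy, not the proposed wasm syntax.

```javascript
// Hypothetical helper: records which statements actually ran.
function runLabeledBlock() {
  const trace = [];
  block: {
    trace.push("before");
    break block;            // "break on block": leaves the labeled block
    trace.push("skipped");  // never reached
  }
  // Control resumes here, which corresponds to $more in the example above.
  trace.push("more");
  return trace;
}

console.log(runLabeledBlock()); // [ 'before', 'more' ]
```

The label names the construct being left, not the place where control ends up, which is the distinction this issue is about.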

@kripken
Member Author

kripken commented Nov 3, 2015

(bonus: we don't need to rename the br ;)

@sunfishcode
Member

Another approach is to rearrange the syntax. Instead of putting the label at the top of a block, we could put it at the bottom:

(block
  (blah)
  (blah)
  (blah)
  $label
)

which makes it quite clear where control goes on a branch with that label.

@rossberg
Member

rossberg commented Nov 4, 2015

I agree that "break" is the more adequate term.

@sunfishcode, labels are symbolic names for AST nodes, not "positions" in sequential code. The syntax you propose would not be a good fit for that.

@sunfishcode
Member

@rossberg-chromium It's a new language, so we can make labels be whatever we want. This issue appears subjective at this point.

@lukewagner
Member

@rossberg-chromium You could consider the syntax for the block nodes to be (block ... $label), so it's still just syntax for an AST node.

Furthermore, if we do the analogous thing for loop:

(loop $back
  (blah)
  (blah)
  $forward)

then it seems 10x more readable and intuitive than (loop $back $forward (blah) (blah)). Also, I do think "branches" is the right metaphor here as we're designing an assembly language (it's in the name!).

@rossberg
Member

rossberg commented Nov 4, 2015

@sunfishcode, well, if it involves changing the semantics of labels then it would be a change to the language as it has been spec'ed and implemented so far (e.g. in v8-proto). So in that sense, it's not just subjective.

@lukewagner, the name notwithstanding, a language with nesting structure and expressions is pretty remote from a "real" assembly language (for good reason!), so I'm not sure that is too strong an argument. The fact that we have breaks rather than random branches is a direct consequence of that structure. Sure, we could tweak syntax in various ways, but are you sure this isn't trying to shoehorn the language into a form that does not actually reflect its actual structure anymore?

@kripken
Member Author

kripken commented Nov 4, 2015

I agree with @rossberg-chromium - yes, this is an "assembly", but it's an AST-based assembly. Doing something that seems odd for an AST is an unnecessary weirdness. And in AST-based forms, it is natural to see

L1: do {
  ..
  break L1;
  ..
} while (..)

e.g. in JavaScript, which is going to be a major point of comparison since this is an assembly for the Web (but the same is also in Java, etc.). It seems better to not have unnecessary odd differences, like putting the label at the end.

However, if people feel strongly that "branch" is meaningful here, then how about "branch out of"? (as opposed to the implied "branch to") That's novel, which is a downside, but at least it could work with the labels in their natural place on the top.

@sunfishcode
Member

@rossberg-chromium I expect we'll end up with the same binary encoding either way, so even if you impose a formalism on this discussion which makes it technically a semantics question, it's still a subjective one.

@titzer

titzer commented Nov 4, 2015

In which context do we need to spell out whether "br" stands for "break" or "branch"? Can't we simply refer to them as "br" and "br_if" and leave the interpretation up to the reader (as long as the semantics are described succinctly)?

@kripken
Member Author

kripken commented Nov 4, 2015

In the s-expression format, yes, br can stand for either. But in AstSemantics and in the future spec, we'll need to name this thing.

And it would be nice to not have some people call it a "branch" and others a "break". That already happened now: reading s-expression testcases, I read "break" in my head, then was discussing something with @sunfishcode, and he said "there is no break, there's just branch"...

sunfishcode modified the milestone: MVP (Nov 5, 2015)

@qwertie

qwertie commented Nov 6, 2015

I'm not sure why there appears to be disagreement. It seems like, in programmers' minds, the meaning of "break" and "branch" is well-established. This is break:

L1: do {
  ...
  break L1;
  ...
} while (...);

And this is branch:

do {
  ...
  goto L1;
  ...
} while (...); L1:

So if it's "branch", the label must go on the end in the text format. Anything else is confusing.

@kg
Contributor

kg commented Nov 6, 2015

> I expect we'll end up with the same binary encoding either way

I don't think it's reasonable to dismiss someone's concerns with this, given that we've actually made zero effort to verify that our changes don't regress binary encoding efficiency or significantly alter the binary representation.

@sunfishcode
Member

Does this pertain to what I responded to here, or to break/branch?

On break/branch, regardless of the tree encoding, nodes will be opcodes and attributes. Break and branch would both have a single attribute, which for break could be a depth in the control flow stack identifying an AST node to break from or continue in, and for branch could be the depth in the label stack identifying a label to branch to. My main observation here is that these are the same, except for the words we use to describe them, and the appearance of the s-expression language.
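
The equivalence described above can be sketched in JavaScript. This is only an illustration under assumed names: the node shape (`op`, `label`, `body`) and the function `resolveBranches` are hypothetical and are not taken from any wasm implementation. It resolves each symbolic label to a relative depth, and that depth comes out the same whether the node is read as "break out of" or "branch to the end of" the enclosing construct.

```javascript
// Hypothetical AST: { op: "block" | "loop", label, body: [...] } and { op: "br", label }.
function resolveBranches(node, enclosing = []) {
  if (node.op === "br") {
    // `enclosing` is ordered innermost-first, so depth 0 is the nearest construct.
    const depth = enclosing.indexOf(node.label);
    if (depth === -1) {
      throw new Error(`br target ${node.label} is not an enclosing label`);
    }
    return { op: "br", depth };
  }
  if (node.op === "block" || node.op === "loop") {
    const inner = [node.label, ...enclosing];
    return {
      op: node.op,
      body: node.body.map((child) => resolveBranches(child, inner)),
    };
  }
  return node; // any other node is treated as an opaque leaf
}

const tree = {
  op: "block", label: "$outer", body: [
    { op: "block", label: "$inner", body: [
      { op: "br", label: "$inner" },
      { op: "br", label: "$outer" },
    ] },
  ],
};

const resolved = resolveBranches(tree);
console.log(JSON.stringify(resolved.body[0].body));
// [{"op":"br","depth":0},{"op":"br","depth":1}]
```

Nothing in the resolved tree records which word was used, which is consistent with the point that only the description and the s-expression appearance differ.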

@kripken
Member Author

kripken commented Nov 6, 2015

I agree that these are basically the same: this is just a discussion about what to call this particular AST node. And I think there are strong reasons to call it "break":

  • The current s-expression format looks like a break, but we're calling it a branch in AstSemantics. I suspect this isn't more annoying just because the s-expression syntax has only a br! :) If it had "branch" but looked like a "break", the problem would be much more noticeable.
  • When designing a text format, it is much more familiar for people to use "break" notation, just like in JavaScript, Java, and others, since like them we are AST-based, and have blocks with labels and so forth. None of those put the label at the end, which is what calling it "branch" would require for consistency.
  • Putting the label at the end almost begs the question of why not just make the branch to the label of the thing right after it, which would avoid odd labels at the end, and make it look like a normal (forward) goto.
  • Branch/goto sound like they support unstructured control flow, but we don't support that.

@sunfishcode
Member

> The current s-expression format

can be changed

> more familiar

to some people

> JavaScript, Java, and others

Java bytecode, CLR bytecode, actual assembly languages, and others

> we are AST-based

for reasons of compression and decode speed

> odd

in your opinion

> Branch/goto sound like they support unstructured control flow

Break sounds like it can't go to the top of a loop

shrug

@kripken
Member Author

kripken commented Nov 6, 2015

I'm not sure you're taking this seriously :) If there's a joke here going over my head, apologies in advance...

>> The current s-expression format
>
> can be changed

Of course. I'm arguing that changing it that way is not good. There are known conventions for s-expression formats, for ASTs, and so forth.

> Java bytecode, CLR bytecode, actual assembly languages, and others

Do any of those have an AST node where the label is at the end? (Honest question, not being snarky.)

>> we are AST-based
>
> for reasons of compression and decode speed

Are you saying those are the only reasons? In particular, are you saying that the text format should not be AST-based? Perhaps this is the core issue here?

> Break sounds like it can't go to the top of a loop

Indeed break can't get to the top of a loop. But we put a block inside the loop, and break out of that - isn't that the desugaring?
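
A JavaScript analogy for the desugaring asked about above, as a rough sketch only: the function `sumOdd` and the label `body` are invented for illustration, and this says nothing about how the loop construct is actually specified. Breaking out of a block nested directly inside the loop falls through to the end of the loop body, so the next iteration starts, which is the effect of a "continue".

```javascript
// Sums the odd numbers in 1..limit without using `continue`.
function sumOdd(limit) {
  let sum = 0;
  let i = 0;
  while (i < limit) {
    body: {
      i += 1;
      if (i % 2 === 0) break body; // leaves only the inner block
      sum += i;
    }
    // Falling out of `body` reaches the end of the loop body,
    // so the loop condition is tested again.
  }
  return sum;
}

console.log(sumOdd(6)); // 9
```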

@lukewagner
Member

> a language with nesting structure and expressions is pretty remote from a "real" assembly language (for good reason!)

But we are first and foremost designing an assembly language / virtual ISA. The only reason for an AST structure is because it buys us something concrete (compact encoding, cheap phi placement, free use-def/liveness info for dumb compilers, ease of impl for compiler backends w/o support for irreducible control flow); were it not for those reasons, then we'd likely just have goto and some three-address-form-esque instructions. The argument that "higher level" constructs makes simple source-to-wasm compilers nicer sounds like a concrete win until you factor in the practical reality that any such compiler will want to reuse a wasm assembler library (to avoid rewriting variable-length immediate encoding, and all the twiddly details of a real binary format) which is something we'd likely publish here in github.com/WebAssembly and it will be quite easy for this library to have every high-level construct you could imagine.

@kripken
Member Author

kripken commented Nov 6, 2015

I believe there is another concrete benefit to being an AST: it's better for view source. While we could have an arbitrary binary format and figure out an unrelated text representation on top, it's very nice to keep those harmonious and close.

In other words, if we call something a "branch" in the binary but call it a "break" in the text format (or, worse, put the labels at the end in the text format), we just introduce unnecessary confusion. Again, as @sunfishcode said, this is just naming things. But naming things matters for view source.

Calling it "break" seems by far the best option for every reasonable text format I can think of. Since this is just a name, why not call it that? There is no actual downside to the binary format.

Or do people not even want to think of the text format yet?

@lukewagner
Member

An important distinction I didn't make in my comment above is "benefit" vs. "constraint". I don't think we should constrain the design of wasm around trying to make wasm look like a high-level language; as argued quite a few times before, that is the role of the source maps future feature; people want to read foo->bar++, not something with load, store, and integer offsets. Trying to pretend we have a high-level language in the few cases where high- and low-level constructs happen to coincide seems to miss this point.

@kripken
Member Author

kripken commented Nov 6, 2015

  1. Source maps don't help people doing view-source. Imagine a person viewing source on a site that does something cool. They can see calls to Web APIs, that part can mostly make sense to them. Having control flow look familiar, around those calls, would be a huge benefit to them, even if there are a lot of loads and stores etc. that they can't immediately figure out.
  2. This does not constrain the binary format. We are literally talking about calling a thing "break" vs "branch" :) And I agree that it should not constrain. But wherever reasonable and possible, making the text format view-source-friendly is important, as well as keeping the text format harmonious with terminology elsewhere in WebAssembly (e.g. we wouldn't have i32 in the binary spec docs, but write Int32 in the text format, without strong reason).

@lukewagner
Member

  1. Without symbols or source maps, view-sourcing wasm will be like view-sourcing minified asm.js: not especially pleasant; slightly higher-level ops are not going to move the needle.
  2. I'm more arguing the general point so as to avoid it getting mistaken for a general wasm design constraint we're trying to optimize for.

Anyhow, this is pretty much the epitome of bikeshedding, but "branch" follows from the recognition that we're designing a virtual ISA here (which happens to have sufficient efficiently-checkable constraints to guarantee reducibility), not a programming language.

@kripken
Member Author

kripken commented Nov 7, 2015

Let me try to explain why I strongly disagree. To make this concrete, imagine I am a web dev, and I see a cool WebGL thing on a site. I want to see how it's done. I open view-source, and see this:

{
  x = (f() + q) >> 2;
  y = load x;
  z = arf.makeBuffer(y);
  if (!z) branch L1;
  q = wq(z);
} : L1;

Now, stuff like load and >> are a little confusing at first. Without knowing WebAssembly, a lot will look unfamiliar. But I see the call to makeBuffer, which I recognize as a WebGL call, so that's a good place to focus on. Then I see the output from that call can lead to a "branch", something I am not familiar with. Looking for L1, it's in a weird place. I sort of guess that a branch jumps to L1? But does it jump directly, can branches jump anywhere, etc. - all are questions that might come to mind, since it's novel and puzzling. And if there were several such nested loops and so forth, the confusion compounds. branch is extra noise.

Instead, if we had view source show

L1: {
  x = (f() + q) >> 2;
  y = load x;
  z = arf.makeBuffer(y);
  if (!z) break L1;
  q = wq(z);
}

then it looks natural to the many millions of people that know JavaScript, Java, etc. At least control flow looks natural and familiar, so figuring the rest out is much easier. Here we know that L1 is a label on a block, just like we are already familiar with, and breaks work in a simple, well-understood manner. @lukewagner, you dismissed this as "won't move the needle", but it's actually a huge deal.

This is not a pure bikeshed of picking a color. This has implications for the text format, which is crucial for the view-source capability, which is necessary for WebAssembly to fit in well on the Web and to succeed. If our view-source is the former (with branch) then people will be confused, less productive, and we'll see "wat"-style blogposts. We already saw great concern on Hacker News and elsewhere about the view-sourceability of WebAssembly. And we try to counter that in the FAQ, where we say

> by dropping all the coercions required by asm.js validation, the WebAssembly text format should be much more natural to read and write than asm.js.

Dropping the coercions is great, certainly more natural to read that way. But replacing the familiar break with an unfamiliar branch, and moving labels to a nonstandard place, is very unnatural to the vast majority of people that will do view-source. The FAQ tries to say, "don't worry, we'll make view-source as good as possible for everyone", and keeping control flow looking natural is a big part of that. Importantly, it's something we can easily do, as opposed to other things that might be confusing but that we can't avoid.

Can we defer thinking about view-source for now? If we do we might end up with branch in the spec, but break in the text format - which I don't see how we can avoid. That would be an annoying inconsistency in WebAssembly.

But maybe someone can think of a better text format that does use branch, and would be equally natural to web developers (or better), so it could fulfill the promise we made in the FAQ?

@ghost

ghost commented Nov 7, 2015

I think the use case of being able to generate a relatively pretty source view from the wasm binary should be considered - this use case is not just bike shedding. The fact that a higher level language can be compiled to wasm binary form does not invalidate this use case. I think most programmers understand that expressions are easier to read than assembler code, because they hide irrelevant register allocation matters, and encapsulate the data flow and control flow.

Perhaps the generation of the pretty wasm source from the binary could be a plug-in module itself in web browsers, giving the user more flexibility, and deferring the issue here and now. But still the support for this could be a technical consideration now.

If it is trivial for a wasm binary pretty printer to emit the label at the start or end and to emit a branch or break then I suggest deferring the matter but keep it in mind when evaluating the binary format. If a little extra support is needed now then just accept a patch now.

Personally, if I had a choice I would use a wasm-binary pretty printer that emitted s-expressions, and I'd like to see the loop-break pattern too as it is familiar to me, but I would not impose this choice on anyone else.
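
To give a feel for how small the difference could be for a pretty printer, here is a toy JavaScript sketch. Everything in it is assumed for illustration: the node shape, the `render` function, and the two output styles (modeled on the view-source mock-ups earlier in this thread) are not part of any agreed text format.

```javascript
// Toy printer: the same tree rendered with the label first ("break" style)
// or with the label last ("branch" style).
function render(node, style, indent = "") {
  if (node.op === "block") {
    const body = node.body
      .map((child) => render(child, style, indent + "  "))
      .join("\n");
    return style === "break"
      ? `${indent}${node.label}: {\n${body}\n${indent}}`
      : `${indent}{\n${body}\n${indent}} : ${node.label};`;
  }
  if (node.op === "br") {
    const keyword = style === "break" ? "break" : "branch";
    return `${indent}${keyword} ${node.label};`;
  }
  return `${indent}${node.text};`; // plain statement leaf
}

const sample = {
  op: "block", label: "L1", body: [
    { op: "stmt", text: "a()" },
    { op: "br", label: "L1" },
  ],
};

console.log(render(sample, "break"));
console.log(render(sample, "branch"));
```

In this sketch the choice is a single `style` switch in the printer, with no change to the tree it prints.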

@kg
Contributor

kg commented Nov 7, 2015

Other decisions have already had a major negative impact on view-source quality (for example, the removal of globals), so this ship may have sailed already, @kripken. But I do agree that for your example scenario a raw textual representation of the AST would be much less confusing if labels and break functioned as they do in most imperative languages.

On the other hand, a view-source algorithm (and the standard text format, if we come up with a fancy new one) could do a lot of this instead of it being baked into the design. For example, the : L1 at the end of your example moves to the next line and becomes L1:, appearing to label the next statement - which represents the behavior correctly while seeming more familiar. At that point it's basically just goto, with branch as the kw instead (not as 'obvious' but certainly not bad or hard to understand.)

If you care about the quality of view-source and the text format it's important to think about the big picture - lots of things interact here. For example right now return is sugar that parses out to a break with a value. If you were to view-source that on the client, would it be a break? Would the client's view-source algorithm realize it's a return and show that? Do these transforms round-trip (text -> binary -> text -> ...)?

Casual familiarity for view-source is probably lost already for wasm. Our current control flow approach discards the patterns that casual developers will be familiar with (if, for, switch, etc) in favor of lower-level constructs (raw heap offsets instead of named globals, br_if). So in that sense it really is just another assembly language and people who want to interact with it will have to learn it.

@lukewagner
Member

> Now, stuff like load and >> are a little confusing at first.
> [...]
> If our view-source is the former (with branch) then people will be confused, less productive, and we'll see "wat"-style blogposts.

I think break vs. branch will contribute .1% wat if at all (not everyone view-sourcing wasm has a JS background; perhaps one day the majority won't); the entire rest of wasm being an ISA with loads and stores w/o names (remember, no source maps in this hypothetical case) will contribute 99.9% wat. I'm not saying view source isn't important, I'm saying that this one particular issue is not going to move the dial so we shouldn't start pretending we're a HLL in this one case. It's an assembly language (and this ship already sailed 5 years ago when Emscripten started compiling C++ to JS; I'm afraid this is all your fault ;-).

@kripken
Member Author

kripken commented Nov 9, 2015

@kg: I don't think this is a lost cause. Yes, we don't have globals, and that's not great for readability. However, in effect, neither did asm.js - while we had the stack pointer and a few others, 99% of globals in 99% of asm.js code out there did not use named globals (they were lowered into constant addresses), and even if we did have them, minifiers (on asm.js or non-asm.js code) removed meaningful global names anyhow. So we are not regressing on the aspect of globals. But using "branch" and oddly-placed labels in the text format, on the other hand, would regress control flow significantly.

Putting labels at the end, and making them "face forward", is a possible compromise, as you suggest. (In which case we should possibly call them "goto" and not "break" or "branch", to be consistent with C, etc.?) But they are not true gotos due to the structural limitation - it actually does matter that they are on the proper AST node. That is,

{
  a() ? branch L1 : b(); // this is ok
  c() ? branch L2 : d(); // this is not
}
L1:
{
  e();
}
L2:
{
  f();
}

The reason the second branch is invalid is because of the AST structure, and I think hiding that makes things more confusing. L1 belongs to its AST node, not to the code after it.
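
JavaScript's labeled break has the same structural restriction, which can be checked with a small sketch. This is an analogy only: the `parses` helper is made up for illustration, and it relies on JavaScript's rule that `break` may only name an enclosing label, not on any wasm validator.

```javascript
// Returns true if `source` parses as a function body, false on a SyntaxError.
// The body is only parsed, never executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The label is on the enclosing block: accepted.
const enclosing = "L1: { if (a()) break L1; b(); }";
// The label is on a later sibling block: rejected at parse time.
const sibling = "{ if (c()) break L2; d(); } L2: { f(); }";

console.log(parses(enclosing)); // true
console.log(parses(sibling));   // false
```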

Overall, the main question I have now is what I proposed earlier as a possible compromise: Would we be willing to call something a "branch" in the spec for the binary format, but write it out as a normal labeled "break" in the text format? In other words, would we be ok with that inconsistency between the spec and what users actually read?

If we are ok with that, then the entire problem in this issue goes away (and you can ignore the rest of this post): View-source can be as good as we can get it for people on the web, while assembly purists get to use the term "branch".

But the downside is a lack of consistency. A lot of people will experience WebAssembly via view-source, and they'll learn "breaks", and then feel puzzled if they some time later read something more in-depth (maybe even the spec), and see things are different ("this is exactly the same concept, why is it now called a branch? Am I missing something subtle here?"). Going in the other direction, people reading the spec (like compiler hackers) would see "branch", but then see something else when they view the text format representation of their first binaries. It feels wrong to me to have such an inconsistency; however, if we have nothing better, I can live with it. We could try to explain the inconsistency well in the spec, at least.

Another possible compromise is to split the text representation: There could be a "low-level" text format that mirrors the spec and has "branch", and a "web text format" which is free to find ways to make code more accessible to a wide audience. Again, the downsides are clear, but this does give both sides in the debate most of what they want (I think).

If we can't find a way to keep the text format as readable as possible (either one of the compromises, or moving to "break"), then we need to update the FAQ entry on view-source. Right now, that entry says

> Will WebAssembly support View Source on the Web?
>
> Yes! WebAssembly defines a text format to be rendered when developers view the source of a WebAssembly module in any developer tool. Also, a specific goal of the text format is to allow developers to write WebAssembly modules by hand for testing, experimenting, optimizing, learning and teaching purposes. In fact, by dropping all the coercions required by asm.js validation, the WebAssembly text format should be much more natural to read and write than asm.js. Outside the browser, command-line and online tools that convert between text and binary will also be made readily available. Lastly, a scalable form of source maps is also being considered as part of the WebAssembly tooling story.

In particular, the part saying "the WebAssembly text format should be much more natural to read and write than asm.js" would no longer be true if we make control flow less natural. And the rest would need updates as well, as the current text is very optimistic, but some voices here imply that really, view-source is already a losing battle, and WebAssembly is just continuing that trend, and that that isn't a problem. The only positive part of the current text that should clearly remain is the bit about source maps, but that doesn't address view-source on general web content.

Moving forward: If opponents of "break" are still not swayed, and my proposed compromises are not acceptable either, then we seem to be at an impasse. If so, then I think a next step should be getting feedback from the wider web community, since view-source is an important part of web culture - and we are just a small group of compiler hackers over here. I think input from a wider group could give us a broader perspective (and maybe it'll prove me wrong, and if so, of course I'm fine with that).

@ghost

ghost commented Nov 9, 2015

The current loop/br approach had some claimed benefits, and might these also have some utility in reading the code too? If you were going to ask people then to be fair you would want to note the pros and cons of both approaches, not just ask them which is more familiar.

How would it affect these benefits if there were separate break and continue operators? Where the continue operator was the only way to repeat a loop, but the break could be used to break from the loop and other blocks. Is a producer always going to be clear which it needs to emit?

Could the break label be split out from loop, so (block $break (loop $continue (blah) (blah) (continue $continue)))?
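
As a rough JavaScript analogy for that shape (an outer block that carries the break label, wrapped around a loop that carries the continue label), here is a sketch. The function `firstMultiple` and the labels `brk` and `cont` are invented for illustration and do not correspond to any proposed syntax.

```javascript
// Finds the first multiple of n in 1..max, or -1 if there is none.
function firstMultiple(n, max) {
  let i = 0;
  let found = -1;
  brk: {
    cont: while (true) {
      i += 1;
      if (i > max) break brk;         // exit through the outer block
      if (i % n !== 0) continue cont; // repeat the loop
      found = i;
      break brk;
    }
  }
  return found;
}

console.log(firstMultiple(7, 20)); // 7
console.log(firstMultiple(7, 5));  // -1
```

Here the producer has to choose between the two operators at every exit, which is the question raised above.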

@lukewagner
Member

If this is just an issue of what goes in the text format, then it seems like it can wait until we start defining the text format in earnest. Seems premature to attempt to micro-optimize usability before there is an established context or set of design criteria.

@kripken
Member Author

kripken commented Nov 9, 2015

@lukewagner: I would be totally cool with that, if it is clear that the decision here does not constrain the text format (since break/branch in the text format is a major issue IMHO). That is precisely what I was asking in my last comment - what are your thoughts there?

@lukewagner
Member

Well, if we're postponing what we decide for the text format, then there isn't a decision "here".

@kripken
Member Author

kripken commented Nov 9, 2015

I meant: Are we all good with saying that we are ok with "branch" in the spec, and would be ok with "break" in the text format (should that be seen as the best option there regardless of the spec)?

In other words, I want to know we won't see this argument later on: "It's 'branch' in the spec, and it would be inconsistent to call it anything else in the text format, so because of our previous decisions it has to be 'branch' there too." (I see the appeal of such an argument myself, hence I am raising it; but as mentioned earlier, in the interests of compromise, I would be willing to overlook it.)

Or to put it another way: I want to know that we are not constraining the text format now, on the important matter of control flow. Totally cool to defer that discussion, as long as we are not making calls now that limit our options later.

@lukewagner
Member

If we defer now, then we're not limiting our options later. I'm happy to defer now, since we've generally deferred defining the text format.

Also, fwiw, I am increasingly in favor of the proposal made above of having a "low-level" text format which is just branches and labels (we could even drop the syntactic block structure since it is kinda implied by the placement of labels and thus unnecessary) and a "high-level" text format which has the high-level control flow operators and other sugar we might want for the View Source and source-to-source translation use cases.

@ghost

ghost commented Nov 9, 2015

@lukewagner Yes, I agree that the text format needs to be deferred, it would be a huge distraction, but is there any technical reason not to always use break to break out of an explicit block, and continue to repeat a loop, as asked above? I do recall some claimed benefit in the current design, that it supported a simpler algorithm than the relooper, and this might be a significant technical point to at least note in the rationale now. Also a change to flatten the control flow structure would be a huge change, and it would make a huge difference to view-source and consumers.

@lukewagner
Member

@JSStats The reason I'm aware of is so that tableswitch can simply list targets (w/o having to specify break vs. continue for each one). Discussion of this choice is probably out of scope for this issue and should get an issue of its own, though.

@ghost

ghost commented Nov 10, 2015

I don't think tableswitch makes much of a difference as the view-source decoder could just emit break/continue in switch cases. I can't see any issue here for a view-source decoder, it seems to be free to emit a branch or break/continue as it pleases. Even at the binary level, it might just mean a difference between having separate label stacks for the break and continue targets versus a combined stack. So if it's just a naming issue, not a structural issue, it seems fine to defer it.

@kripken
Member Author

kripken commented Nov 10, 2015

@lukewagner: great, glad to hear you don't think this limits our options later, and that you're starting to like the "split the text representation" proposal from before. I'll open a pull request with some of this, that we can discuss further.

@kripken
Member Author

kripken commented Nov 10, 2015

Opened #457 to continue discussion of the "split the text representation" proposal.

@kripken
Member Author

kripken commented Nov 19, 2015

Discussion in that pull (splitting the text format) has not managed to proceed. Meanwhile, I've been convinced by @ncbray and others that a "higher"-level text format is a harder problem than I had considered, since the decisions already made limit our text format options more than I appreciated. That makes me almost want to give up on trying to do this "right" from my perspective. On the other hand, it also means it really matters that we at least try to get right the easier things that we can, which includes break/branch.

Have those opposed to "break" given some more thought to the compromise proposals given here?

If there is no change, then I have two concrete suggestions for how to move forward towards a resolution on break/branch (possibly leaving bigger text format issues for later). First, I've noticed that in practice, several of us say "branch" and "break" interchangeably, when speaking on this repo and the spec repo and irl. And, the actual name of the thing we are talking about is "br". This suggests that perhaps a simple solution is then to have the design (and future official docs) say something like "br: Move control flow to another location, with a structural limitation that it must be relative to an enclosing block. Looking at things as an AST, a br is equivalent to a labeled break in high-level languages like Java and JavaScript, i.e. it breaks out of the relevant block; or, looking at things as a low-level assembly, you can see the block's label as being at the end of the block, and the br is a branch to that location. The two perspectives are functionally equivalent.".

I think this could be a good resolution. It would combine the intuitions behind both interpretations of br, which I think work better together than either does alone. Thoughts?
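To make the two readings concrete, here is a minimal sketch in TypeScript rather than in wasm itself. The function names (`astView`, `asmView`), the trace strings, and the program-counter encoding are all assumptions made up for illustration; nothing here is proposed syntax. The first function reads br as a labeled break out of an enclosing block, and the second reads it as a branch to a label placed at the end of that block.

```typescript
// Hypothetical illustration only: names and trace strings are assumptions.

// AST view: `br` acts like a labeled break out of the enclosing block.
function astView(takeBr: boolean): string[] {
  const trace: string[] = [];
  outer: {
    trace.push("enter block");
    if (takeBr) {
      break outer; // plays the role of (br $outer)
    }
    trace.push("rest of block");
  }
  trace.push("after block");
  return trace;
}

// Assembly view: the label sits at the end of the block, and `br` is a
// forward branch to it. A program counter stands in for real jumps.
function asmView(takeBr: boolean): string[] {
  const trace: string[] = [];
  const END_OF_BLOCK = 3; // the label, positioned just past the block body
  let pc = 0;
  while (pc < 4) {
    switch (pc) {
      case 0: trace.push("enter block"); pc = 1; break;
      case 1: pc = takeBr ? END_OF_BLOCK : 2; break; // br END_OF_BLOCK
      case 2: trace.push("rest of block"); pc = 3; break;
      case 3: trace.push("after block"); pc = 4; break;
    }
  }
  return trace;
}
```

Both functions record the same trace for the same input, which is the sense in which the two perspectives are functionally equivalent.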

If that (or the previous compromises) is not convincing for people here, then I think we can't avoid starting to think about the text format. And a good way to start there is to get some wider feedback, as suggested before, since view source is going to affect a lot of people. I am thinking about doing a small blogpost, something along the lines of "We are starting to think about the WebAssembly text format, which is important since it is what will be shown when you do 'view source' in a browser. There isn't a clear direction for what the format should look like at this early stage, and it would be helpful to get people's feedback and suggestions, to help point us in the right way. In particular, here are a few very broad options, what are your thoughts?" And I was thinking to give three sketches of possible syntaxes:

  1. S-expression (described as what we have now, and as a very direct and simple mapping)
  2. High-level with breaks (described as higher-level and aiming to look familiar to a wide audience)
  3. Low-level with branch and labels at the end (described as lower-level and aiming to look like an assembly language)

Currently I don't know which I think is the best out of those options. I lean towards 2, but I've heard arguments that sway me towards 1. I see the benefit of 3, but it feels more suitable as a .s format that the LLVM backend is creating anyhow. Regardless, if feedback shows the community has a preference on 1, 2, or 3 (or something else), I'd be ok with whichever - I just want to know we are heading in a direction that is acceptable to people. And I'd move on to prototyping in more detail (assuming it's 2 or 3 or something else, since 1 actually doesn't need further prototyping).

Let me know if anyone wants to work together on the wording of such a blogpost.

@lukewagner
Member

I don't think we'll have a very productive conversation if we start with an open-ended discussion without identifying a more specific set of use cases and constraints we're trying to optimize. There are many more use cases to consider than just the reverse-engineering-without-source-maps use case, and we haven't really started on that yet.

@ghost

ghost commented Nov 19, 2015

Looks like a good resolution for now, so add: 'Definitions: br - br(eak) from a block or br(anch) to the end of a block, depending on your perspective.'

I would like to see the text format be a plugin, so I can have the familiar (but unpopular) s-exp! I think option 2 is the realistic one, namely to use familiar high-level structures.

I think option 3 would be unfortunate. Personally I think calling this 'assembly' and making it look like assembler code would be a mistake - and that is coming from someone who learnt to program in machine code. The operations being primitive and close-to-the-metal seems orthogonal to the structure to me. When I read machine/assembler code the typical process is to work out the structure, to isolate code, remove the copying from registers to stacks etc - it's a lot of unnecessary mental work and a distraction and I hope the text format can avoid this which should make wasm much more accessible and productive.

@dherman

dherman commented Nov 19, 2015

@kripken I'd strongly recommend against that approach. I'll be happy to give you some tips on ways to approach this process. I've been through a lot of language design processes. :)

@ghost

ghost commented Nov 19, 2015

@lukewagner 'reverse-engineering' is a very loaded term, and such claims might disadvantage web users' rights. It is all source code to me. It might not all be the preferred source format of the code author, but it's still source code. The binary format is just a source encoding, not machine code and not even a virtual machine code, and printing it in some arbitrary text format is not disassembly or reverse engineering, just a mechanical translation.

@lukewagner
Member

Sure, we can call it the "View Source without source maps" use case instead if we want to be careful about legal triggers. It's definitely a very important use case, but not the only one.

@ghost

ghost commented Jan 6, 2016

Withdrawing objections. It seems practical to just add a flag in an optional source code meta data section to change the presentation of br to break or branch or continue even per instance.
