Annex B reform next steps #1595

Open
2 of 22 tasks
littledan opened this issue Jun 19, 2019 · 27 comments

Comments

@littledan
Member

littledan commented Jun 19, 2019

In the June 2019 TC39 meeting, @erights raised the topic of Annex B reform. We reached consensus on two high-level points, but there remain many details to work out, which I'd like to elaborate on in this thread:

  • Many Annex B things could be made normative
  • The remaining Annex B items would be placed inline, with markup indicating that they are normative optional

In some more detail:

Making some Annex B things normative

The lens proposed by @erights was that we make normative everything which is "perfectly safe from a non-locality, causality perspective".

Of particular concern are grammar issues, where having multiple divergent grammars (both for HTML comments and RegExp grammars) leads to security and implementer confusion issues. Mark also proposed that other parts of the specification which are simply ugly, but not harmful from an SES/ocap perspective, be considered normative.

We didn't discuss what this means in detail, in particular, which things will go into the main spec. Based on the notes and my reading of the specification, I'd say that includes

Is any of the above in error?

Making Annex B inline

The general strategy for inline Annex B would be to follow what we've done with Intl for legacy constructor semantics. These are also "normative optional", but listed interspersed with other specification text. The idea is that this phrasing makes the text more readable, while preserving its optionality outside of web browsers. This should help avoid situations where people have read part of the specification, not realizing that another part modified it, resulting in confusion and non-Annex-B-compatible implementations when the intention was to be web-compatible.

See an example in the WeakRefs proposal of adding some inline Annex B text (PR), and similar text in Intl (PR).

@bterlson raised concerns about the accessibility of the Intl specification's normative optional text. I'm not an expert in this area; if someone has an idea for better CSS, or HTML generated from ecmarkup, it'd be great to have your help.

Items which @erights's presentation proposed to leave as normative optional, which I'd suggest should be inline normative optional:

Future proposals which could be inline normative optional, per @erights's suggestions:

Next steps

  • Discuss the above plan in this thread, so we can iterate on it as needed (discussion is not blocking drafting PRs, but blocks landing them)
  • Make a label for PRs towards this effort
  • Write a PR for each of the bullet points above (fine if you decide to group or split these out differently)
    • Several people can collaborate here. Check off the checklist items above once it's done (I believe all delegates should be able to edit this comment).
  • Iterate on the styling/HTML of the normative optional section (if needed)
  • In my opinion, "deprecation" language is not necessary, but if we decide on that as part of making some of these "normative", then we'd add a checkbox for this too.
  • Bring the package of PRs to a TC39 meeting, asking for consensus
@jmdyck
Collaborator

jmdyck commented Jun 20, 2019

(Not sure if this is up for debate, but here goes anyway.)

See an example [...] of adding some inline Annex B text [...] in Intl (PR).

The markup is a little hard to discern between all the comments, so I'm going to work through a simple example. Say we have a 2-step normative algorithm:

<emu-alg>
  1. Normative step before.
  1. Normative step after.
</emu-alg>

and we want to inline a single normative-optional step into the middle. It looks like the markup as proposed in that PR would be:

<emu-alg>
  1. Normative step before.
</emu-alg>
<emu-normative-optional><span class="normative-optional">Normative Optional</span><div class="normative-optional-contents">
<emu-alg>
  2. Normative-optional step.
</emu-alg>
</div></emu-normative-optional>
<emu-alg>
  1. Normative step after.
</emu-alg>

Have I got that right?

So the HTML structure goes from

- emu-alg
   |
   |- (ecmarkdown text)

to

|- emu-alg
|  |
|  |- (ecmarkdown text)
|
|- emu-normative-optional
|  |
|  |- span
|  |- div
|     |
|     |- emu-alg
|        |
|        |- (ecmarkdown text)
|
|- emu-alg
   |
   |- (ecmarkdown text)

I.e., the insertion of a normative-optional step completely disrupts the ecmarkdown text, splitting it into 3 chunks that aren't complete algorithms and aren't even at the same 'level' of the markup tree. I don't know about anyone else, but this would certainly complicate the way that I process the spec.


Here's a radically different suggestion:

<emu-alg>
  1. Normative step before.
  1. If Something-Normative-Optional, then
    1. Normative-optional step.
  1. Normative step after.
</emu-alg>

and then have the rendering process detect "Something-Normative-Optional" and inject whatever HTML markup is necessary to achieve the desired appearance.

(I think this would also answer @bterlson's accessibility concerns, since everything you need to know appears in the algorithm text, so we're not requiring the reader to be able to discern the styling.)
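To make the "detect and inject" idea concrete, here is a minimal sketch of what such a rendering pass might look like. It is not how ecmarkup works; `renderSteps`, the `normative-optional` class, and the flat `<li>` output are all assumptions made for illustration, and the marker string is just the strawman from above. HTML escaping and list nesting are left out.

```javascript
// Strawman marker text taken from the suggestion above.
const MARKER = "Something-Normative-Optional";

// Hypothetical renderer: takes ecmarkdown-style step lines (indentation
// expresses nesting) and returns one <li> string per step. The marked step
// and every step nested beneath it get a "normative-optional" class.
function renderSteps(lines) {
  let optionalIndent = null; // indent of the step that opened an optional block
  return lines.map((line) => {
    const indent = line.length - line.trimStart().length;
    const text = line.trim().replace(/^\d+\.\s*/, "");
    if (optionalIndent !== null && indent <= optionalIndent) {
      optionalIndent = null; // we have left the optional block
    }
    if (optionalIndent === null && text.includes(MARKER)) {
      optionalIndent = indent; // this step opens an optional block
    }
    const cls = optionalIndent !== null ? ' class="normative-optional"' : "";
    return `<li${cls}>${text}</li>`;
  });
}

const rendered = renderSteps([
  "  1. Normative step before.",
  "  1. If Something-Normative-Optional, then",
  "    1. Normative-optional step.",
  "  1. Normative step after.",
]);
console.log(rendered.join("\n"));
```

The source stays a single, complete algorithm; only the rendered output carries the extra markup.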


Of course, Something-Normative-Optional is just a strawman placeholder. One interesting possibility would be to have a name for each "unit" of optionality, and then reference that, something like:

<emu-alg>
  1. Normative step before.
  1. If HostImplementsOptional("legacy_constructor_semantics"), then
    1. Normative-optional step.
  1. Normative step after.
</emu-alg>

Other benefits of naming each unit of optionality:

  • When a unit involves multiple discontiguous insertions, this alerts the reader that they constitute a whole: an implementation must have either all of them or none.
  • It provides a compact standard way for implementations to say which options they implement.
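As a rough illustration of the "named unit" idea, the sketch below models a host that declares which optional units it implements, and a toy algorithm with two discontiguous insertions guarded by the same unit name. `makeHost`, `implementsOptional`, and `runAlgorithm` are hypothetical names invented here; only the `"legacy_constructor_semantics"` string and the HostImplementsOptional strawman come from the example above.

```javascript
// Hypothetical host object: records which optional units it implements.
function makeHost(implementedUnits) {
  const units = new Set(implementedUnits);
  return {
    // Mirrors the strawman HostImplementsOptional(name).
    implementsOptional(name) {
      return units.has(name);
    },
    // A compact way for an implementation to report its choices.
    listUnits() {
      return [...units].sort();
    },
  };
}

// Toy algorithm with two separate insertions belonging to one unit.
// Because both are guarded by the same name, a host gets both or neither.
function runAlgorithm(host) {
  const unit = "legacy_constructor_semantics";
  const trace = ["before"];
  if (host.implementsOptional(unit)) trace.push("optional step A");
  trace.push("middle");
  if (host.implementsOptional(unit)) trace.push("optional step B");
  trace.push("after");
  return trace;
}

const minimalHost = makeHost([]);
const webLikeHost = makeHost(["legacy_constructor_semantics"]);
console.log(runAlgorithm(minimalHost).join(" -> "));
console.log(runAlgorithm(webLikeHost).join(" -> "));
```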

@littledan
Member Author

@jmdyck All this is definitely up for discussion, thanks for your feedback. That idea could be good. I am wondering, how would you represent that an entire property or section is normative optional, as in the RegExp.prototype.compile or document.all cases?

@jmdyck
Collaborator

jmdyck commented Jun 20, 2019

how would you represent that an entire property or section is normative optional, as in the RegExp.prototype.compile or document.all cases?

That I'm not so concerned about, but an attribute on the <emu-clause> seems fine, e.g. something like <emu-clause ... optional-unit="document.all">.

@littledan
Member Author

How about <emu-clause ... normative-optional> ? I'm not sure if I want to break things up into multiple optional units.

@jmdyck
Collaborator

jmdyck commented Jun 20, 2019

I'm not sure if I want to break things up into multiple optional units.

I thought it already was broken up into units. (I.e., each current B.x.y clause constitutes a unit that a non-browser implementation can choose to implement.) Are you saying that if a non-browser chooses to implement any of Annex B then it must implement all of it? (The spec doesn't seem to be clear on this point.)

@littledan
Member Author

Some people have claimed that; I guess this is a point where there's disagreement between different people who read and edit the specification. I don't work on an environment which doesn't implement Annex B, so I can't really provide input as to what's needed. But I'd prefer to keep things somehow the same as before with respect to the optionality.

@erights

erights commented Jun 20, 2019 via email

@littledan
Member Author

For what it's worth, I didn't understand that we were asking the committee for consensus on whether it's à la carte or all-or-nothing. Anyway, I have no particular objection to à la carte, especially if the size of the "menu" is getting drastically smaller (two items?).

@erights

erights commented Jun 20, 2019

drastically smaller (two items?)

Two now. More soon. That's why getting this resolved was so timely.

You get the items exactly right above:

Now:

  • RegExp.prototype.compile. I think you're right that the move to accessors may make this exception no longer needed.
  • document.all

Later, some of them in proposals in progress:

  • WeakRef.prototype.constructor
  • Function.prototype.caller, Function.prototype.callee, Function.prototype.arguments
  • Error.prototype.stack
  • RegExp constructor legacy properties

@ljharb
Member

ljharb commented Jun 20, 2019

compile still allows mutating otherwise immutable slots; I’d hope we can keep that normative optional.
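For readers unfamiliar with the method, a small sketch of the mutation in question. It assumes a host that provides `RegExp.prototype.compile` (web browsers and Node.js do); the `demoCompile` wrapper is a name made up here and returns `null` where the method is absent.

```javascript
// Hypothetical helper: shows compile() re-initializing an existing RegExp
// object in place. Returns null if the host does not provide the method.
function demoCompile() {
  if (typeof RegExp.prototype.compile !== "function") return null;
  const re = /a/;
  const before = { source: re.source, flags: re.flags };
  re.compile("b", "g"); // same object, new pattern and flags
  const after = { source: re.source, flags: re.flags };
  return { before, after };
}

const compileDemo = demoCompile();
console.log(compileDemo);
```

Without `compile`, the pattern and flags of a RegExp object cannot be changed after construction, which is why keeping the method optional matters to environments that rely on that immutability.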

@erights

erights commented Jun 20, 2019

@ljharb That was the question that needed answering. Thanks.

@littledan
Member Author

littledan commented Jun 20, 2019

@ljharb Is your goal here optionality for non-web environments, or is the underlying goal to make it formally deprecated? Any guarantee of immutability-by-default for the RegExp seems somehow weak when many environments will have it be mutable.

@syg
Contributor

syg commented Jun 20, 2019

Should we shift up the other annexes or give Annex B an intentionally left blank tombstone? 🙃

@leobalter
Member

Should we shift up the other annexes or give Annex B an intentionally left blank tombstone? 🙃

@syg maybe just reuse Annex B to describe the normative optional features?

@syg
Contributor

syg commented Jun 20, 2019

maybe just reuse Annex B to describe the normative optional features?

Summarizing the inlined items sounds good to me!

@ljharb
Member

ljharb commented Jun 20, 2019

@littledan the current status is that it's optional for non-web environments; at least I want to maintain that. It would be super great to remove it from the web entirely, of course, but that seems a separate effort from this issue.

@bakkot
Contributor

bakkot commented Jun 24, 2019

@erights Are the slides from your talk at this past meeting in Berlin publicly available?

@erights

erights commented Jun 25, 2019

attached:

annex-b.pdf

@chicoxyzzy
Member

Possibly off-topic: There is a Stage 0 proposal named "Annex B — HTML Attribute Event Handlers" in the Stage 0 proposals list. I don't know the status of that proposal or what exactly it is about (it doesn't have its own repo or gist, and I can't find any related discussions in the notes repo), but maybe it is important to mention it here. Sorry if it's not.

@ljharb
Member

ljharb commented Jul 7, 2019

cc @allenwb ^ should that item still be on the active proposals list?

@allenwb
Member

allenwb commented Jul 8, 2019

@ljharb

The motivation was the need to specify, from an ES perspective, the semantics of source code used as the value of an event handler HTML attribute. E.g.,

   <body onload="alert(this)" onclick="alert(this)">

This item was added to the strawman proposal list probably in early 2014. I believe this was before the HTML spec. had such a detailed specification of the processing of such attributes.

The current HTML specification probably eliminates the need to handle this as an Annex B item, but I think it illustrates that there are still specification layering issues regarding a clear semantics that supports host environments that want to provide a mechanism that takes JS source code and uses it as the body of a synthesized function definition. HTML event handler attributes do this, so does CJS when defining its modules. I believe other hosts also do similar things. The HTML spec. does a fair amount of low level ES spec. hackery to define its behavior, some of which would not be directly applicable to other hosts. It seems to me that it would be desirable to have a more generalized Host* ES interface that allows various hosts to define such functions without doing fragile spec. hacking.
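As a user-level analogy only, the pattern of taking source text and using it as the body of a synthesized function can be sketched with the `Function` constructor. `synthesizeHandler` is a hypothetical name, and the HTML specification's actual algorithm does considerably more than this.

```javascript
// Hypothetical helper: treat `source` as the body of a new function that
// takes a single `event` parameter, roughly as a host might for a handler.
function synthesizeHandler(source) {
  return new Function("event", source);
}

const handler = synthesizeHandler('return this.id + ":" + event.type;');
// The host chooses the `this` value and the arguments when invoking it.
const handlerResult = handler.call({ id: "body" }, { type: "load" });
console.log(handlerResult);
```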

So, maybe not an Annex B issue anymore, but probably something that should be ticketed as a spec. layering issue.

@ljharb
Member

ljharb commented Jul 8, 2019

Sounds like we should remove the proposal, but someone who's involved with HTML should file that layering issue to pursue that goal. Thanks for the history!

@jmdyck
Collaborator

jmdyck commented Jul 23, 2019

Is there more that needs to be decided here, or can people start submitting PRs? I'd gladly submit one that merges the Annex B grammar modifications into the main body.

@erights

erights commented Jul 24, 2019

I'd gladly submit one that merges the Annex B grammar modifications into the main body.

Hi @jmdyck , thanks for the offer! Please proceed. I'm sure that we'll still run into controversy, but this is a good way forward. I am hopeful.

jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jul 31, 2019
(Part of Annex B reform, see PR tc39#1595.)
@jmdyck
Collaborator

jmdyck commented Jul 31, 2019

Draft PR #1651 available for early review.

jmdyck added a commit to jmdyck/ecma262 that referenced this issue Aug 1, 2019
(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Aug 1, 2019
(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jun 10, 2021
(Part of Annex B reform, see PR tc39#1595.)

B.1.2 makes 2 changes to the EscapeSequence production:
(1) It adds the rhs `NonOctalDecimalEscapeSequence`.
(2) It replaces the rhs:
        `0` [lookahead &lt;! DecimalDigit]
    with:
        LegacyOctalEscapeSequence
    where the latter nonterminal generates `0` among lots of other things.

We want to continue to disallow such syntax in strict mode,
but the mechanism to do so must change.
Formerly, the spec would say that in such contexts,
it's forbidden to extend the syntax in this way.
But since (with this PR), this is no longer an extension,
we instead use early error rules to say that in such contexts,
occurrences of the 'new' parts of the syntax are Syntax Errors.

For change 1, making it a Syntax Error is fairly straightforward.

But for change 2, we can't simply say that
LegacyOctalEscapeSequence is a Syntax Error in strict mode,
because strict mode still has to allow the restricted syntax.

Instead, we say that if we're in strict mode code,
an instance of LegacyOctalEscapeSequence is a Syntax Error
*unless* it's an instance of the restricted syntax.
To express the latter condition,
we use the cover grammar machinery.
(It could be done in other ways, but I think this is clearest.)
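The observable behavior that the commit message above aims to preserve can be checked from script. The following is a minimal sketch, assuming a host where the `Function` constructor is available; `parses` is a hypothetical helper, not part of any API.

```javascript
// Hypothetical helper: reports whether `body` parses as a function body.
// The Function constructor parses its argument eagerly, so early errors
// surface as a SyntaxError without the code being run.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const strictPrefix = '"use strict"; ';
const escapeResults = {
  // "\1" is a legacy octal escape: accepted in sloppy code...
  sloppyOctal: parses('return "\\1";'),
  // ...but an early error in strict mode code.
  strictOctal: parses(strictPrefix + 'return "\\1";'),
  // "\0" not followed by a digit is the restricted syntax, allowed in strict mode.
  strictNul: parses(strictPrefix + 'return "\\0";'),
};
console.log(escapeResults);
```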
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jun 15, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jun 23, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jul 11, 2021
(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jul 11, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jul 18, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jul 24, 2021
(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jul 24, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Aug 12, 2021
(Part of Annex B reform, see PR tc39#1595.)

B.1.2 makes 2 changes to the EscapeSequence production:
(1) It adds the rhs `NonOctalDecimalEscapeSequence`.
(2) It replaces the rhs:
        `0` [lookahead &lt;! DecimalDigit]
    with:
        LegacyOctalEscapeSequence
    where the latter nonterminal generates `0` among lots of other things.

Change 1 is straightforward, but change 2 is tricky.
In the EscapeSequence production, we can't simply replace
the `0` alternative with LegacyOctalEscapeSequence (as B.1.2 does),
because the `0` alternative must be treated differently
from everything else that LegacyOctalEscapeSequence derives.
(The `0` alternative is allowed in contexts where
everything else that LegacyOctalEscapeSequence derives is forbidden.)
So instead, we redefine LegacyOctalEscapeSequence to exclude the `0` alternative.
Specifically, the 'overlap' comes from:

    LegacyOctalEscapeSequence ::
        OctalDigit [lookahead &notin; OctalDigit]

so we replace that with:

    LegacyOctalEscapeSequence ::
        `0` [lookahead &isin; {`8`, `9`}]
        NonZeroOctalDigit [lookahead &notin; OctalDigit]

(See Issue tc39#1975 for more details.)
Resolves tc39#1975.
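To illustrate the split described in the commit message above, here is a rough classifier for the characters following a backslash in a string literal. It is a sketch only: `classifyEscape` and its result labels are invented here, and it covers just the digit cases discussed, lumping every other escape under "other".

```javascript
const OCTAL_DIGITS = "01234567";
const DECIMAL_DIGITS = "0123456789";

// Hypothetical classifier. `rest` is the source text immediately after the
// backslash. Returns one of: "nul", "legacy-octal", "non-octal-decimal", "other".
function classifyEscape(rest) {
  const first = rest[0];
  const second = rest[1]; // undefined at end of input
  if (first === undefined) return "other";
  // `0` [lookahead not in DecimalDigit]: the restricted syntax, allowed everywhere.
  const nextIsDecimal = second !== undefined && DECIMAL_DIGITS.includes(second);
  if (first === "0" && !nextIsDecimal) return "nul";
  // NonOctalDecimalEscapeSequence: `8` or `9`.
  if (first === "8" || first === "9") return "non-octal-decimal";
  // Everything else starting with an octal digit, including `0` followed by
  // `8` or `9`, falls under the redefined LegacyOctalEscapeSequence.
  if (OCTAL_DIGITS.includes(first)) return "legacy-octal";
  return "other";
}

for (const sample of ["0", "0x", "08", "07", "1", "8", "n"]) {
  console.log(JSON.stringify(sample), classifyEscape(sample));
}
```

With the `0` case excluded from "legacy-octal", a strict mode early error can target that class as a whole without catching the restricted `\0` syntax.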
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Aug 17, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Sep 14, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Sep 24, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
jmdyck added a commit to jmdyck/ecma262 that referenced this issue Sep 29, 2021
(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR tc39#1595.)
mathiasbynens pushed a commit to mathiasbynens/ecma262 that referenced this issue Oct 18, 2021
(Part of Annex B reform, see PR tc39#1595.)