B.1.2 (LegacyOctalEscapeSequences) could be clearer #1975

jorendorff · 2020-04-27T14:03:35Z

Currently, with Annex B.1.2:

- There are two lexical grammars, one with the extension for octal escapes and one without. Implementations that include B.1.2 must support both grammars, and even switch between them in the course of parsing a single script.
- Implementing the spec directly seems logically impossible. For example, `function f() { "\033"; "use strict"; }` does not parse: it fails to match the lexical grammar, even though the octal escape is scanned before we know we're in strict mode code.
- `'\0'` is an octal escape in nonstrict code, but not in strict mode code (where it is also legal, and has the same value).
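A quick way to observe the behavior described above is sketched here. It assumes an engine that implements Annex B.1.2 (Node.js, for instance) and an environment where `new Function` is permitted; the helper name `parses` is made up for this illustration.

```javascript
// Returns true if `body` parses as a function body, false on SyntaxError.
// `new Function` treats its body as non-strict code unless the body itself
// begins with a "use strict" directive.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const results = {
  // Legacy octal escape in non-strict code: accepted.
  sloppyOctal: parses('"\\033";'),
  // Same escape after the directive: rejected.
  strictOctal: parses('"use strict"; "\\033";'),
  // Escape scanned BEFORE the directive is seen: still rejected.
  octalBeforeDirective: parses('"\\033"; "use strict";'),
  // \0 not followed by a decimal digit: fine in strict mode code.
  strictNul: parses('"use strict"; "\\0";'),
};
console.log(results);
```

The third case is the one that makes a literal reading of "two lexical grammars" awkward: the scanner has to accept the escape first and reject it retroactively.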

This poses no serious implementation difficulties, but the spec is a real puzzle. I think the same things could be said more clearly:

1. Syntax that's illegal in strict mode is usually allowed in the grammar, then prohibited using an Early Error rule (see the spec for `delete` expressions, `with` statements, three rules about keywords, and indirectly in assigning to `arguments` or `eval`). B.1.2 should use this specification technique.
2. Instead of specifying with prose that the grammar extension must exist in some contexts and not others, B.1.2 should unconditionally extend the grammar with `EscapeSequence :: LegacyOctalEscapeSequence`.
3. B.1.2 should not remove the production `EscapeSequence :: 0 [lookahead ∉ DecimalDigit]` from the grammar. To avoid ambiguity, the production

LegacyOctalEscapeSequence :: OctalDigit [lookahead ∉ OctalDigit]

can be changed to

LegacyOctalEscapeSequence :: NonZeroOctalDigit [lookahead ∉ OctalDigit]

NonZeroOctalDigit :: OctalDigit but not 0

Parsing \0 as "legacy" in any context is misleading.
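To make the "one grammar plus an Early Error" idea concrete, here is a minimal sketch of how a scanner might follow it. It is an illustration under stated assumptions, not how any particular engine or the spec text is written: `cookStringBody` and `checkStringLiteral` are hypothetical names, only octal escapes are handled, and every other escape is simplified to "keep the escaped character".

```javascript
// Hypothetical sketch: cook the escapes in a string literal body. Legacy
// octal escapes are always accepted; we only record that one was seen.
function cookStringBody(raw) {
  let value = "";
  let hasLegacyOctal = false;
  let i = 0;
  while (i < raw.length) {
    if (raw[i] !== "\\") { value += raw[i]; i++; continue; }
    const next = raw[i + 1];
    if (next === undefined) throw new SyntaxError("unterminated escape");
    if (next >= "0" && next <= "7") {
      // Take up to three octal digits while the code stays <= 0o377.
      let j = i + 1;
      let code = 0;
      while (j < raw.length && j < i + 4 && raw[j] >= "0" && raw[j] <= "7") {
        const n = code * 8 + Number(raw[j]);
        if (n > 255) break;
        code = n;
        j++;
      }
      const after = raw[j];
      // Plain \0 (not followed by a decimal digit) is never "legacy".
      const isPlainNul =
        j === i + 2 && next === "0" && !(after >= "0" && after <= "9");
      if (!isPlainNul) hasLegacyOctal = true;
      value += String.fromCharCode(code);
      i = j;
      continue;
    }
    // Simplification: other escapes (\n, \x41, \8, ...) are not modeled here.
    value += next;
    i += 2;
  }
  return { value, hasLegacyOctal };
}

// The strict-mode restriction is applied afterwards, like an Early Error.
function checkStringLiteral(raw, strict) {
  const { value, hasLegacyOctal } = cookStringBody(raw);
  if (strict && hasLegacyOctal) {
    throw new SyntaxError("legacy octal escape in strict mode code");
  }
  return value;
}
```

With this shape, the scanner never needs to know whether it is in strict mode code; the flag can be checked once the directive prologue has been processed.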

I know you have better things to do, but I don't know how to notice something like this and not file the bug. Cheers. :)


jmdyck · 2020-04-27T20:18:55Z

See PR #1867's ~~second~~ third commit. I believe my approach agrees with bullets 1 + 2 of your proposed solution, but I did something different for bullet 3. I don't remember if I considered your approach, but it looks better than what I came up with. (It's possible I thought that LegacyOctalEscapeSequence should derive the same forms before as after, but now I don't see why that should be required.)

jmdyck · 2021-05-26T04:27:33Z

@jorendorff, looking at this again, I see a small problem.

In the current spec, the first RHS for LegacyOctalEscapeSequence is

    OctalDigit [lookahead <! OctalDigit]

Switching to a compact notation, we can think of this as:

    [0-7] ~ [^0-7]

(where '~' means "if followed by") which we can rewrite as:

    (0 | [1-7]) ~ ([^0-9] | [89])

i.e., all of these combinations:

    0     ~ [^0-9]    (a)
    0     ~ [89]      (b)
    [1-7] ~ [^0-9]    (c)
    [1-7] ~ [89]      (d)

Now, if B.1.2 doesn't remove

    EscapeSequence :: 0 [lookahead <! DecimalDigit]

then that production takes care of (a), so LegacyOctalEscapeSequence must be defined to accept (b) + (c) + (d).

Your proposed

    NonZeroOctalDigit [lookahead <! OctalDigit]

only handles

    [1-7] ~ [^0-7]

i.e., (c) + (d), and so misses (b).

(E.g., "\08" and "\09" are currently valid according to B.1.2 but not in your rewrite.)

So I think we'd need to add

    `0` [lookahead in {`8`, `9`}]

as another RHS for LegacyOctalEscapeSequence. Do you agree?
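The case analysis above can be checked mechanically. The sketch below models each single-digit production as a predicate over (digit, next character) and lists the combinations that the current B.1.2 rule accepts but a candidate grammar misses. All names are invented for this illustration, and `"x"` stands in for any non-digit lookahead.

```javascript
const isOctal = (c) => c >= "0" && c <= "7";
const isDecimal = (c) => c >= "0" && c <= "9";
const isNonZeroOctal = (c) => c >= "1" && c <= "7";

// Does the production match digit `d` when the next character is `next`?
// OctalDigit [lookahead <! OctalDigit]  (current B.1.2)
const currentLegacy = (d, next) => isOctal(d) && !isOctal(next);
// 0 [lookahead <! DecimalDigit]  (main-body EscapeSequence alternative)
const plainZero = (d, next) => d === "0" && !isDecimal(next);
// NonZeroOctalDigit [lookahead <! OctalDigit]  (proposed)
const proposedLegacy = (d, next) => isNonZeroOctal(d) && !isOctal(next);
// 0 [lookahead in {8, 9}]  (suggested extra RHS)
const extraZero = (d, next) => d === "0" && (next === "8" || next === "9");

const digits = [..."01234567"];
const lookaheads = [..."0123456789", "x"];

// Escapes accepted by the current rule but not by `accepts`.
function missed(accepts) {
  const out = [];
  for (const d of digits) {
    for (const next of lookaheads) {
      if (currentLegacy(d, next) && !accepts(d, next)) out.push("\\" + d + next);
    }
  }
  return out;
}

const missedByProposal = missed(
  (d, n) => plainZero(d, n) || proposedLegacy(d, n));
const missedWithExtraRhs = missed(
  (d, n) => plainZero(d, n) || proposedLegacy(d, n) || extraZero(d, n));

console.log(missedByProposal);   // the two escapes of case (b): \08 and \09
console.log(missedWithExtraRhs); // empty: nothing is lost
```

The extra RHS and the plain `0` alternative never match the same input, since one requires a lookahead of `8` or `9` and the other forbids any decimal digit.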

(Part of Annex B reform, see PR tc39#1595.)

B.1.2 makes 2 changes to the EscapeSequence production:

(1) It adds the rhs `NonOctalDecimalEscapeSequence`.

(2) It replaces the rhs:

    `0` [lookahead <! DecimalDigit]

with:

    LegacyOctalEscapeSequence

where the latter nonterminal generates `0` among lots of other things.

Change 1 is straightforward, but change 2 is tricky.

In the EscapeSequence production, we can't simply replace the `0` alternative with LegacyOctalEscapeSequence (as B.1.2 does), because the `0` alternative must be treated differently from everything else that LegacyOctalEscapeSequence derives. (The `0` alternative is allowed in contexts where everything else that LegacyOctalEscapeSequence derives is forbidden.)

So instead, we redefine LegacyOctalEscapeSequence to exclude the `0` alternative. Specifically, the 'overlap' comes from:

    LegacyOctalEscapeSequence :: OctalDigit [lookahead ∉ OctalDigit]

so we replace that with:

    LegacyOctalEscapeSequence ::
        `0` [lookahead ∈ {`8`, `9`}]
        NonZeroOctalDigit [lookahead ∉ OctalDigit]

(See Issue tc39#1975 for more details.)

Resolves tc39#1975.

jorendorff referenced this issue in mozilla-spidermonkey/jsparagus Apr 27, 2020

js-quirks: Mention annex B.1.1 (legacy octal integer literals).

f60f293

michaelficarra added the editorial change label Apr 27, 2020

jmdyck mentioned this issue May 21, 2021

Normative: Make B.1.{1,2} normative #1867

Merged

ljharb closed this as completed in f79dfd2 Aug 15, 2021
