Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B.1.2 (LegacyOctalEscapeSequences) could be clearer #1975

Closed
jorendorff opened this issue Apr 27, 2020 · 2 comments
Closed

B.1.2 (LegacyOctalEscapeSequences) could be clearer #1975

jorendorff opened this issue Apr 27, 2020 · 2 comments

Comments

@jorendorff
Copy link
Contributor

jorendorff commented Apr 27, 2020

Currently, with Annex B.1.2:

  • There are two lexical grammars, one with the extension for octal escapes and one without. Implementations that include B.1.2 must support both grammars, and even switch between them in the course of parsing a single script.

    Implementing the spec directly seems logically impossible. For example, function f() { "\033"; "use strict"; } does not parse—it fails to match the lexical grammar, even though the octal escape is scanned before we know we're in strict mode code.

  • '\0' is an octal escape in nonstrict code, but not in strict mode code (where it is also legal, and has the same value).

This poses no serious implementation difficulties, but the spec is a real puzzle. I think the same things could be said more clearly:

  • Syntax that's illegal in strict mode is usually allowed in the grammar, then prohibited using an Early Error rule (see the spec for delete expressions, with statements, three rules about keywords, and indirectly in assigning to arguments or eval). B.1.2 should use this specification technique.

  • Instead of specifying with prose that the grammar extension must exist in some contexts and not others, B.1.2 should unconditionally extend the grammar with "EscapeSequence :: LegacyOctalEscapeSequence".

  • B.1.2 should not remove the production "EscapeSequence :: 0 [lookahead ∉ DecimalDigit]" from the grammar. To avoid ambiguity, the production

    LegacyOctalEscapeSequence :: OctalDigit [lookahead ∉ OctalDigit]

    can be changed to

    LegacyOctalEscapeSequence :: NonZeroOctalDigit [lookahead ∉ OctalDigit]

    NonZeroOctalDigit :: OctalDigit but not 0

    Parsing \0 as "legacy" in any context is misleading.

I know you have better things to do, but I don't know how to notice something like this and not file the bug. Cheers. :)

@jmdyck
Copy link
Collaborator

jmdyck commented Apr 27, 2020

See PR #1867's second third commit. I believe my approach agrees with bullets 1 + 2 of your proposed solution, but I did something different for bullet 3. I don't remember if I considered your approach, but it looks better than what I came up with. (It's possible I thought that LegacyOctalEscapeSequence should derive the same forms before as after, but now I don't see why that should be required.)

@jmdyck
Copy link
Collaborator

jmdyck commented May 26, 2021

@jorendorff, looking at this again, I see a small problem.

In the current spec, the first RHS for LegacyOctalEscapeSequence is

    OctalDigit [lookahead <! OctalDigit]

Switching to a compact notation, we can think of this as:

    [0-7] ~ [^0-7]

(where '~' means "if followed by") which we can rewrite as:

    (0 | [1-7]) ~ ([^0-9] | [89])

i.e., all of these combinations:

    0     ~ [^0-9]    (a)
    0     ~ [89]      (b)
    [1-7] ~ [^0-9]    (c)
    [1-7] ~ [89]      (d)

Now, if B.1.2 doesn't remove

    EscapeSequence :: 0 [lookahead <! DecimalDigit]

then that production takes care of (a), so LegacyOctalEscapeSequence must be defined to accept (b) + (c) + (d).

Your proposed

    NonZeroOctalDigit [lookahead <! OctalDigit]

only handles

    [1-7] [^0-7]

i.e., (c) + (d), and so misses (b).

(E.g., "\08" and "\09" are currently valid according to B.1.2 but not in your rewrite.)

So I think we'd need to add

    `0` [lookahead in {`8`, `9`}]

as another RHS for LegacyOctalEscapeSequence. Do you agree?

@ljharb ljharb closed this as completed in f79dfd2 Aug 15, 2021
mathiasbynens pushed a commit to mathiasbynens/ecma262 that referenced this issue Oct 18, 2021
(Part of Annex B reform, see PR tc39#1595.)

B.1.2 makes 2 changes to the EscapeSequence production:
(1) It adds the rhs `NonOctalDecimalEscapeSequence`.
(2) It replaces the rhs:
        `0` [lookahead &lt;! DecimalDigit]
    with:
        LegacyOctalEscapeSequence
    where the latter nonterminal generates `0` among lots of other things.

Change 1 is straightforward, but change 2 is tricky.
In the EscapeSequence production, we can't simply replace
the `0` alternative with LegacyOctalEscapeSequence (as B.1.2 does),
because the `0` alternative must be treated differently
from everything else that LegacyOctalEscapeSequence derives.
(The `0` alternative is allowed in contexts where
everything else that LegacyOctalEscapeSequence derives is forbidden.)
So instead, we redefine LegacyOctalEscapeSequence to exclude the `0` alternative.
Specifically, the 'overlap' comes from:

    LegacyOctalEscapeSequence ::
        OctalDigit [lookahead &notin; OctalDigit]

so we replace that with:

    LegacyOctalEscapeSequence ::
        `0` [lookahead &isin; {`8`, `9`}]
        NonZeroOctalDigit [lookahead &notin; OctalDigit]

(See Issue tc39#1975 for more details.)
Resolves tc39#1975.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants