-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
B.1.2 (LegacyOctalEscapeSequences) could be clearer #1975
Comments
See PR #1867's |
@jorendorff, looking at this again, I see a small problem. In the current spec, the first RHS for LegacyOctalEscapeSequence is
Switching to a compact notation, we can think of this as:
(where '~' means "if followed by") which we can rewrite as:
i.e., all of these combinations:
Now, if B.1.2 doesn't remove
then that production takes care of (a), so LegacyOctalEscapeSequence must be defined to accept (b) + (c) + (d). Your proposed
only handles
i.e., (c) + (d), and so misses (b). (E.g., "\08" and "\09" are currently valid according to B.1.2 but not in your rewrite.) So I think we'd need to add
as another RHS for LegacyOctalEscapeSequence. Do you agree? |
(Part of Annex B reform, see PR tc39#1595.) B.1.2 makes 2 changes to the EscapeSequence production: (1) It adds the rhs `NonOctalDecimalEscapeSequence`. (2) It replaces the rhs: `0` [lookahead <! DecimalDigit] with: LegacyOctalEscapeSequence where the latter nonterminal generates `0` among lots of other things. Change 1 is straightforward, but change 2 is tricky. In the EscapeSequence production, we can't simply replace the `0` alternative with LegacyOctalEscapeSequence (as B.1.2 does), because the `0` alternative must be treated differently from everything else that LegacyOctalEscapeSequence derives. (The `0` alternative is allowed in contexts where everything else that LegacyOctalEscapeSequence derives is forbidden.) So instead, we redefine LegacyOctalEscapeSequence to exclude the `0` alternative. Specifically, the 'overlap' comes from: LegacyOctalEscapeSequence :: OctalDigit [lookahead ∉ OctalDigit] so we replace that with: LegacyOctalEscapeSequence :: `0` [lookahead ∈ {`8`, `9`}] NonZeroOctalDigit [lookahead ∉ OctalDigit] (See Issue tc39#1975 for more details.) Resolves tc39#1975.
Currently, with Annex B.1.2:
There are two lexical grammars, one with the extension for octal escapes and one without. Implementations that include B.1.2 must support both grammars, and even switch between them in the course of parsing a single script.
Implementing the spec directly seems logically impossible. For example,
function f() { "\033"; "use strict"; }
does not parse—it fails to match the lexical grammar, even though the octal escape is scanned before we know we're in strict mode code.'\0'
is an octal escape in nonstrict code, but not in strict mode code (where it is also legal, and has the same value).This poses no serious implementation difficulties, but the spec is a real puzzle. I think the same things could be said more clearly:
Syntax that's illegal in strict mode is usually allowed in the grammar, then prohibited using an Early Error rule (see the spec for
delete
expressions,with
statements, three rules about keywords, and indirectly in assigning toarguments
oreval
). B.1.2 should use this specification technique.Instead of specifying with prose that the grammar extension must exist in some contexts and not others, B.1.2 should unconditionally extend the grammar with "EscapeSequence :: LegacyOctalEscapeSequence".
B.1.2 should not remove the production "EscapeSequence ::
0
[lookahead ∉ DecimalDigit]" from the grammar. To avoid ambiguity, the productioncan be changed to
Parsing
\0
as "legacy" in any context is misleading.I know you have better things to do, but I don't know how to notice something like this and not file the bug. Cheers. :)
The text was updated successfully, but these errors were encountered: