Normative: Make B.1.2 "String Literals" normative.
(Part of Annex B reform, see PR #1595.)

B.1.2 makes 2 changes to the EscapeSequence production:
(1) It adds the rhs `NonOctalDecimalEscapeSequence`.
(2) It replaces the rhs:
        `0` [lookahead <! DecimalDigit]
    with:
        LegacyOctalEscapeSequence
    where the latter nonterminal generates `0` among lots of other things.
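
As a rough illustration of the two new right-hand sides, the sketch below classifies the digits that follow a backslash. The helper name `matchDigitEscape` and its return shape are hypothetical and not part of the spec; it only models the four LegacyOctalEscapeSequence alternatives and NonOctalDecimalEscapeSequence, and it computes the SV with `parseInt(..., 8)` as a shortcut for the "8 times the MV" rules.

```javascript
// Hypothetical helper, not spec text: model how the digits after a backslash
// match the productions this commit adds to EscapeSequence.
const isOctalDigit = (ch) => ch !== undefined && ch >= "0" && ch <= "7";

function matchDigitEscape(text) {
  const [a, b, c] = text;

  // NonOctalDecimalEscapeSequence :: one of `8` `9`
  if (a === "8" || a === "9") {
    return { production: "NonOctalDecimalEscapeSequence", length: 1, sv: a };
  }
  if (!isOctalDigit(a)) return null; // not a digit escape at all

  // LegacyOctalEscapeSequence: take as many octal digits as the grammar allows.
  let length = 1;                                 // OctalDigit [lookahead not OctalDigit]
  if (isOctalDigit(b)) {
    length = 2;                                   // ZeroToThree / FourToSeven OctalDigit
    if (a <= "3" && isOctalDigit(c)) length = 3;  // ZeroToThree OctalDigit OctalDigit
  }

  // The SV is the code unit whose value is the octal value of the matched digits.
  const sv = String.fromCharCode(parseInt(text.slice(0, length), 8));
  return { production: "LegacyOctalEscapeSequence", length, sv };
}
```

Under this model, `matchDigitEscape("101")` consumes three digits and yields "A" (octal 101 is 65), while `matchDigitEscape("477")` consumes only two, because a FourToSeven digit can start at most a two-digit escape.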

We want to continue to disallow such syntax in strict mode and templates,
but the mechanism for doing so must change.
Formerly, the spec would say that in such contexts,
it's forbidden to extend the syntax in this way.
But since (with this PR) this syntax is no longer an extension,
we instead use early error rules to say that in such contexts,
occurrences of the 'new' parts of the syntax are Syntax Errors.

For change 1, making it a Syntax Error is fairly straightforward.

But for change 2, we can't simply say that
LegacyOctalEscapeSequence is a Syntax Error in strict mode,
because strict mode still has to allow the restricted syntax
(`0` not followed by a DecimalDigit).

Instead, we say that if we're in strict mode code,
an instance of LegacyOctalEscapeSequence is a Syntax Error
*unless* it's an instance of the restricted syntax.
To express the latter condition,
we use the cover grammar machinery.
(It could be done in other ways, but I think this is clearest.)
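
The two early error rules can be sketched as a predicate. This is a minimal model under stated assumptions, not the spec's cover-grammar machinery: the function names, the `production` strings, and the `{ strict, inTemplate }` options object are all hypothetical, and "covering a StrictZeroEscapeSequence" is approximated by checking that the matched text is `0` and the next character is not a decimal digit.

```javascript
// Hypothetical model of the early error rules; all names are illustrative only.
const isDecimalDigit = (ch) => ch !== undefined && ch >= "0" && ch <= "9";

// `matched` is the source text matched by LegacyOctalEscapeSequence;
// `next` is the character that follows it (undefined at end of input).
// Approximates: StrictZeroEscapeSequence :: `0` [lookahead not DecimalDigit]
function coversStrictZero(matched, next) {
  return matched === "0" && !isDecimalDigit(next);
}

function isEarlyError(production, matched, next, { strict = false, inTemplate = false } = {}) {
  const restricted = strict || inTemplate;
  if (production === "NonOctalDecimalEscapeSequence") {
    // Change 1: always an error in strict mode code or inside a TemplateCharacter.
    return restricted;
  }
  if (production === "LegacyOctalEscapeSequence") {
    // Change 2: an error in those contexts *unless* it is the restricted syntax.
    return restricted && !coversStrictZero(matched, next);
  }
  return false; // other escape productions are unaffected by these rules
}
```

For example, with `{ strict: true }` the escape `\0` followed by `a` is accepted, whereas `\0` followed by `8` is reported as an error: the legacy production matches the `0` (since `8` is not an OctalDigit), but the lookahead restriction of StrictZeroEscapeSequence rejects it.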
jmdyck committed Oct 17, 2020
1 parent e21f0b4 commit 6186f0d
Showing 1 changed file with 115 additions and 93 deletions.
208 changes: 115 additions & 93 deletions spec.html
@@ -11322,12 +11322,11 @@ <h2>Syntax</h2>

EscapeSequence ::
CharacterEscapeSequence
`0` [lookahead &lt;! DecimalDigit]
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
</emu-grammar>
<p>A conforming implementation, when processing strict mode code, must not extend the syntax of |EscapeSequence| to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.</p>
<emu-grammar type="definition">

CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
@@ -11344,6 +11343,21 @@ <h2>Syntax</h2>
`x`
`u`

LegacyOctalEscapeSequence ::
OctalDigit [lookahead &lt;! OctalDigit]
ZeroToThree OctalDigit [lookahead &lt;! OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit

ZeroToThree :: one of
`0` `1` `2` `3`

FourToSeven :: one of
`4` `5` `6` `7`

NonOctalDecimalEscapeSequence :: one of
`8` `9`

HexEscapeSequence ::
`x` HexDigit HexDigit

@@ -11359,6 +11373,36 @@ <h2>Syntax</h2>
<p>&lt;LF&gt; and &lt;CR&gt; cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n` or `\\u000A`.</p>
</emu-note>

<h2>Supplemental Syntax</h2>
<p>When processing an instance of the production <emu-grammar>LegacyOctalEscapeSequence :: OctalDigit</emu-grammar> the following production is used to refine the interpretation of |LegacyOctalEscapeSequence|.</p>
<emu-grammar type="definition">
StrictZeroEscapeSequence ::
`0` [lookahead &lt;! DecimalDigit]
</emu-grammar>

<emu-clause id="sec-string-literals-early-errors">
<h1>Static Semantics: Early Errors</h1>
<emu-grammar>
EscapeSequence :: LegacyOctalEscapeSequence
</emu-grammar>
<ul>
<li>It is a Syntax Error if |EscapeSequence| is not covering a |StrictZeroEscapeSequence| and either the source code matching this production is strict mode code or |EscapeSequence| is contained within a |TemplateCharacter|.</li>
</ul>
<emu-grammar>
EscapeSequence :: NonOctalDecimalEscapeSequence
</emu-grammar>
<ul>
<li>It is a Syntax Error if the source code matching this production is strict mode code or |EscapeSequence| is contained within a |TemplateCharacter|.</li>
</ul>
<emu-note>In non-strict code, this syntax is allowed, but deprecated.</emu-note>
<emu-note>
<p>It is possible for string literals to precede a Use Strict Directive that places the enclosing code in <emu-xref href="#sec-strict-mode-code">strict mode</emu-xref>, and implementations must take care to enforce the above rules for such literals. For example, the following source text contains a Syntax Error:</p>
<pre><code class="javascript">
function invalid() { "\7"; "use strict"; }
</code></pre>
</emu-note>
</emu-clause>

<emu-clause id="sec-string-literals-static-semantics-stringvalue">
<h1>Static Semantics: StringValue</h1>
<emu-see-also-para op="StringValue"></emu-see-also-para>
@@ -11372,7 +11416,7 @@ <h1>Static Semantics: StringValue</h1>
</emu-alg>
</emu-clause>

<emu-clause id="sec-static-semantics-sv">
<emu-clause id="sec-static-semantics-sv" oldids="sec-additional-syntax-string-literals-static-semantics">
<h1>Static Semantics: SV</h1>
<p>A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of code unit values contributed by the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a mathematical value (MV), as described below or in <emu-xref href="#sec-literals-numeric-literals"></emu-xref>.</p>
<ul>
@@ -11418,9 +11462,6 @@ <h1>Static Semantics: SV</h1>
<li>
The SV of <emu-grammar>SingleStringCharacter :: LineContinuation</emu-grammar> is the empty code unit sequence.
</li>
<li>
The SV of <emu-grammar>EscapeSequence :: `0`</emu-grammar> is the code unit 0x0000 (NULL).
</li>
<li>
The SV of <emu-grammar>CharacterEscapeSequence :: SingleEscapeCharacter</emu-grammar> is the code unit whose value is determined by the |SingleEscapeCharacter| according to <emu-xref href="#table-string-single-character-escape-sequences"></emu-xref>.
</li>
@@ -11575,6 +11616,24 @@ <h1>Static Semantics: SV</h1>
<li>
The SV of <emu-grammar>NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator</emu-grammar> is the result of performing CodePointToUTF16CodeUnits on the code point value of |SourceCharacter|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: OctalDigit</emu-grammar> is the code unit whose value is the MV of |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit</emu-grammar> is the code unit whose value is (8 times the MV of |ZeroToThree|) plus the MV of |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: FourToSeven OctalDigit</emu-grammar> is the code unit whose value is (8 times the MV of |FourToSeven|) plus the MV of |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit</emu-grammar> is the code unit whose value is (64 (that is, 8<sup>2</sup>) times the MV of |ZeroToThree|) plus (8 times the MV of the first |OctalDigit|) plus the MV of the second |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `8`</emu-grammar> is the code unit 0x0038 (DIGIT EIGHT).
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `9`</emu-grammar> is the code unit 0x0039 (DIGIT NINE).
</li>
<li>
The SV of <emu-grammar>HexEscapeSequence :: `x` HexDigit HexDigit</emu-grammar> is the code unit whose value is (16 times the MV of the first |HexDigit|) plus the MV of the second |HexDigit|.
</li>
@@ -11586,6 +11645,36 @@ <h1>Static Semantics: SV</h1>
</li>
</ul>
</emu-clause>

<emu-clause id="sec-string-literals-static-semantics-mv">
<h1>Static Semantics: MV</h1>
<ul>
<li>
The MV of <emu-grammar>ZeroToThree :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `1`</emu-grammar> is 1.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `2`</emu-grammar> is 2.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `3`</emu-grammar> is 3.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `4`</emu-grammar> is 4.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `5`</emu-grammar> is 5.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `6`</emu-grammar> is 6.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `7`</emu-grammar> is 7.
</li>
</ul>
</emu-clause>
</emu-clause>

<emu-clause id="sec-literals-regular-expression-literals">
@@ -11723,10 +11812,12 @@ <h2>Syntax</h2>
CodePoint ::
HexDigits[~Sep] [> but only if MV of |HexDigits| &le; 0x10FFFF]
</emu-grammar>
<p>A conforming implementation must not use the extended definition of |EscapeSequence| described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref> when parsing a |TemplateCharacter|.</p>
<emu-note>
<p>|TemplateSubstitutionTail| is used by the |InputElementTemplateTail| alternative lexical goal.</p>
</emu-note>
<emu-note>
<p>Instances of the production <emu-grammar>TemplateCharacter :: `\` EscapeSequence</emu-grammar> are restricted by early error rules in <emu-xref href="#sec-string-literals-early-errors"></emu-xref>.</p>
</emu-note>

<emu-clause id="sec-static-semantics-tv-and-trv">
<h1>Static Semantics: TV and TRV</h1>
@@ -11781,7 +11872,7 @@ <h1>Static Semantics: TV and TRV</h1>
The TRV of <emu-grammar>TemplateCharacter :: `\` NotEscapeSequence</emu-grammar> is the sequence consisting of the code unit 0x005C (REVERSE SOLIDUS) followed by the code units of TRV of |NotEscapeSequence|.
</li>
<li>
The TRV of <emu-grammar>EscapeSequence :: `0`</emu-grammar> is the code unit 0x0030 (DIGIT ZERO).
The TRV of <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> is the code unit 0x0030 (DIGIT ZERO).
</li>
<li>
The TRV of <emu-grammar>NotEscapeSequence :: `0` DecimalDigit</emu-grammar> is the sequence consisting of the code unit 0x0030 (DIGIT ZERO) followed by the code units of the TRV of |DecimalDigit|.
@@ -24723,9 +24814,6 @@ <h1>Forbidden Extensions</h1>
<li>
The Syntactic Grammar must not be extended in any manner that allows the token `:` to immediately follow source text that matches the |BindingIdentifier| nonterminal symbol.
</li>
<li>
|TemplateCharacter| must not be extended to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as defined in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.
</li>
<li>
When processing strict mode code, the extensions defined in <emu-xref href="#sec-labelled-function-declarations"></emu-xref>, <emu-xref href="#sec-block-level-function-declarations-web-legacy-compatibility-semantics"></emu-xref>, <emu-xref href="#sec-functiondeclarations-in-ifstatement-statement-clauses"></emu-xref>, and <emu-xref href="#sec-initializers-in-forin-statement-heads"></emu-xref> must not be supported.
</li>
@@ -41568,9 +41656,16 @@ <h1>Lexical Grammar</h1>
<emu-prodref name="SingleEscapeCharacter"></emu-prodref>
<emu-prodref name="NonEscapeCharacter"></emu-prodref>
<emu-prodref name="EscapeCharacter"></emu-prodref>
<emu-prodref name="LegacyOctalEscapeSequence"></emu-prodref>
<emu-prodref name="ZeroToThree"></emu-prodref>
<emu-prodref name="FourToSeven"></emu-prodref>
<emu-prodref name="NonOctalDecimalEscapeSequence"></emu-prodref>
<emu-prodref name="HexEscapeSequence"></emu-prodref>
<emu-prodref name="UnicodeEscapeSequence"></emu-prodref>
<emu-prodref name="Hex4Digits"></emu-prodref>
<p>When processing an instance of the production <emu-prodref name="LegacyOctalEscapeSequence"></emu-prodref> the following production is used to refine the interpretation of |LegacyOctalEscapeSequence|.</p>
<emu-prodref name="StrictZeroEscapeSequence"></emu-prodref>
<p>&nbsp;</p>
<emu-prodref name="RegularExpressionLiteral"></emu-prodref>
<emu-prodref name="RegularExpressionBody"></emu-prodref>
<emu-prodref name="RegularExpressionChars"></emu-prodref>
@@ -41921,86 +42016,13 @@ <h1>Numeric Literals</h1>

<emu-annex id="sec-additional-syntax-string-literals">
<h1>String Literals</h1>
<p>The syntax and semantics of <emu-xref href="#sec-literals-string-literals"></emu-xref> is extended as follows except that this extension is not allowed for strict mode code:</p>
<h2>Syntax</h2>
<emu-grammar type="definition">
EscapeSequence ::
CharacterEscapeSequence
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence

LegacyOctalEscapeSequence ::
OctalDigit [lookahead &lt;! OctalDigit]
ZeroToThree OctalDigit [lookahead &lt;! OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit

ZeroToThree :: one of
`0` `1` `2` `3`

FourToSeven :: one of
`4` `5` `6` `7`
<p>The following syntax from <emu-xref href="#sec-literals-string-literals"></emu-xref>, and its associated semantics, used to be normative optional:</p>
<emu-grammar>
EscapeSequence :: LegacyOctalEscapeSequence

NonOctalDecimalEscapeSequence :: one of
`8` `9`
EscapeSequence :: NonOctalDecimalEscapeSequence
</emu-grammar>
<p>This definition of |EscapeSequence| is not used in strict mode or when parsing |TemplateCharacter|.</p>
<emu-note>
<p>It is possible for string literals to precede a Use Strict Directive that places the enclosing code in <emu-xref href="#sec-strict-mode-code">strict mode</emu-xref>, and implementations must take care to not use this extended definition of |EscapeSequence| with such literals. For example, attempting to parse the following source text must fail:</p>
<pre><code class="javascript">
function invalid() { "\7"; "use strict"; }
</code></pre>
</emu-note>

<emu-annex id="sec-additional-syntax-string-literals-static-semantics">
<h1>Static Semantics</h1>
<ul>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: OctalDigit</emu-grammar> is the code unit whose value is the MV of |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit</emu-grammar> is the code unit whose value is (8 times the MV of |ZeroToThree|) plus the MV of |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: FourToSeven OctalDigit</emu-grammar> is the code unit whose value is (8 times the MV of |FourToSeven|) plus the MV of |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit</emu-grammar> is the code unit whose value is (64 (that is, 8<sup>2</sup>) times the MV of |ZeroToThree|) plus (8 times the MV of the first |OctalDigit|) plus the MV of the second |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `8`</emu-grammar> is the code unit 0x0038 (DIGIT EIGHT).
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `9`</emu-grammar> is the code unit 0x0039 (DIGIT NINE).
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `1`</emu-grammar> is 1.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `2`</emu-grammar> is 2.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `3`</emu-grammar> is 3.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `4`</emu-grammar> is 4.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `5`</emu-grammar> is 5.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `6`</emu-grammar> is 6.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `7`</emu-grammar> is 7.
</li>
</ul>
</emu-annex>
<p>and the productions for |LegacyOctalEscapeSequence|, |ZeroToThree|, and |FourToSeven|.</p>
</emu-annex>

<emu-annex id="sec-html-like-comments">
@@ -42207,7 +42229,7 @@ <h1>Static Semantics: CharacterValue</h1>
</emu-alg>
<emu-grammar>CharacterEscape :: LegacyOctalEscapeSequence</emu-grammar>
<emu-alg>
1. Evaluate the SV of |LegacyOctalEscapeSequence| (see <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>) to obtain a code unit _cu_.
1. Evaluate the SV of |LegacyOctalEscapeSequence| (see <emu-xref href="#sec-static-semantics-sv"></emu-xref>) to obtain a code unit _cu_.
1. Return the numeric value of _cu_.
</emu-alg>
</emu-annex>
@@ -43190,7 +43212,7 @@ <h1>The Strict Mode of ECMAScript</h1>
A conforming implementation, when processing strict mode code, must disallow instances of the productions <emu-grammar>NumericLiteral :: LegacyOctalIntegerLiteral</emu-grammar> and <emu-grammar>DecimalIntegerLiteral :: NonOctalDecimalIntegerLiteral</emu-grammar>.
</li>
<li>
A conforming implementation, when processing strict mode code, may not extend the syntax of |EscapeSequence| to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.
A conforming implementation, when processing strict mode code, must disallow instances of the production <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> that do not cover a |StrictZeroEscapeSequence|, and instances of the production <emu-grammar>EscapeSequence :: NonOctalDecimalEscapeSequence</emu-grammar>.
</li>
<li>
Assignment to an undeclared identifier or otherwise unresolvable reference does not create a property in the global object. When a simple assignment occurs within strict mode code, its |LeftHandSideExpression| must not evaluate to an unresolvable Reference. If it does a *ReferenceError* exception is thrown (<emu-xref href="#sec-putvalue"></emu-xref>). The |LeftHandSideExpression| also may not be a reference to a data property with the attribute value { [[Writable]]: *false* }, to an accessor property with the attribute value { [[Set]]: *undefined* }, nor to a non-existent property of an object whose [[Extensible]] internal slot has the value *false*. In these cases a `TypeError` exception is thrown (<emu-xref href="#sec-assignment-operators"></emu-xref>).