tc39 · jmdyck · Aug 12, 2024 · Jun 27, 2021 · Aug 13, 2024 · michaelficarra
diff --git a/spec.html b/spec.html
@@ -819,8 +819,24 @@ <h1>[empty]</h1>
 
       <emu-clause id="sec-lookahead-restrictions">
         <h1>Lookahead Restrictions</h1>
-        <p>If the phrase “[lookahead = _seq_]” appears in the right-hand side of a production, it indicates that the production may only be used if the token sequence _seq_ is a prefix of the immediately following input token sequence. Similarly, “[lookahead ∈ _set_]”, where _set_ is a finite non-empty set of token sequences, indicates that the production may only be used if some element of _set_ is a prefix of the immediately following token sequence. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all token sequences to which that nonterminal could expand. It is considered an editorial error if the nonterminal could expand to infinitely many distinct token sequences.</p>
-        <p>These conditions may be negated. “[lookahead ≠ _seq_]” indicates that the containing production may only be used if _seq_ is <em>not</em> a prefix of the immediately following input token sequence, and “[lookahead ∉ _set_]” indicates that the production may only be used if <em>no</em> element of _set_ is a prefix of the immediately following token sequence.</p>
+        <p>When a phrase of the form “[lookahead …]” appears in the right-hand side of a production, it indicates that the production may only be used if the “lookahead” (the items that immediately follow the corresponding point in the input) satisfies a specified constraint. If the production is in the syntactic grammar, the items of the lookahead are input elements (mainly tokens); otherwise, they are code points.</p>
+        <p>The forms of lookahead restriction, along with the constraint that each imposes on the lookahead, are as follows:</p>
+        <ul>
+          <li>“[lookahead = _seq_]”: _seq_ matches a prefix of the lookahead</li>
+          <li>“[lookahead ≠ _seq_]”: _seq_ does <em>not</em> match any prefix of the lookahead</li>
+          <li>“[lookahead ∈ _set_]”: some element of _set_ matches a prefix of the lookahead</li>
+          <li>“[lookahead ∉ _set_]”: <em>no</em> element of _set_ matches any prefix of the lookahead</li>
+        </ul>
+        <p>In the above:</p>
+        <ul>
+          <li>_seq_ is a sequence of terminal symbols from the production's grammar; and</li>
+          <li>_set_ is either:
+            <ul>
+              <li>an explicit non-empty set of terminal sequences. In the syntactic grammar, such a sequence can also include a “[no LineTerminator here]” phrase.</li>
+              <li>a non-empty sequence of symbols from the production's grammar, including exactly one nonterminal. This sequence represents the set of all terminal sequences to which that sequence could expand. In the syntactic grammar, it is considered an editorial error if the nonterminal could expand to infinitely many distinct terminal sequences. In other grammars, it is considered an editorial error if the nonterminal's expansion is not a regular set (i.e., if it cannot be described by a regular expression over code points).</li>
+            </ul>
+          </li>
+        </ul>
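The four forms reduce to prefix tests on the lookahead. The following is a minimal Python sketch, not part of the specification. It assumes lookahead items are one-character strings (code points) and that a _set_ is a finite list of sequences. The helper names (`matches_prefix`, `lookahead_eq`, and so on) are hypothetical.

```python
from typing import Iterable, Sequence

def matches_prefix(seq: Sequence[str], lookahead: Sequence[str]) -> bool:
    """Return True if seq matches a prefix of the lookahead."""
    # A sequence longer than the remaining input cannot be a prefix of it,
    # because the slice comes back shorter than seq.
    return list(lookahead[: len(seq)]) == list(seq)

def lookahead_eq(seq: Sequence[str], lookahead: Sequence[str]) -> bool:
    # [lookahead = seq]
    return matches_prefix(seq, lookahead)

def lookahead_ne(seq: Sequence[str], lookahead: Sequence[str]) -> bool:
    # [lookahead ≠ seq]
    return not matches_prefix(seq, lookahead)

def lookahead_in(set_: Iterable[Sequence[str]], lookahead: Sequence[str]) -> bool:
    # [lookahead ∈ set]: some element of set matches a prefix
    return any(matches_prefix(s, lookahead) for s in set_)

def lookahead_not_in(set_: Iterable[Sequence[str]], lookahead: Sequence[str]) -> bool:
    # [lookahead ∉ set]: no element of set matches a prefix
    return not lookahead_in(set_, lookahead)

# A nonterminal used as a set stands for its finite set of expansions;
# here DecimalDigit is assumed to expand to the ten single digits.
DECIMAL_DIGIT = list("0123456789")
```

For example, `lookahead_in(DECIMAL_DIGIT, "7x")` and `lookahead_not_in(DECIMAL_DIGIT, "x7")` both hold.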
         <p>As an example, given the definitions:</p>
         <emu-grammar type="definition" example>
           DecimalDigit :: one of
@@ -837,7 +853,7 @@ <h1>Lookahead Restrictions</h1>
             DecimalDigit [lookahead &notin; DecimalDigit]
         </emu-grammar>
         <p>matches either the letter `n` followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.</p>
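The behaviour described in the preceding sentence can be checked with a small recognizer. This is a hypothetical Python sketch. It assumes the candidate text is known in full and that `following` holds the input after it. Following the prose, it encodes the restriction after `n` as "the first digit is not odd".

```python
DIGITS = set("0123456789")
ODD_DIGITS = set("13579")

def example_matches(text: str, following: str) -> bool:
    """Does the example nonterminal derive exactly `text`, given `following`?"""
    # Alternative 1: `n`, then one or more digits whose first digit is even
    # (the lookahead restriction rules out an odd first digit).
    if len(text) >= 2 and text[0] == "n" and all(c in DIGITS for c in text[1:]):
        return text[1] not in ODD_DIGITS
    # Alternative 2: a single DecimalDigit [lookahead ∉ DecimalDigit].
    if len(text) == 1 and text in DIGITS:
        return following[:1] not in DIGITS  # "" (end of input) is not a digit
    return False
```

For instance, `"7"` matches when followed by `"x"` or by end of input, but not when followed by `"2"`.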
-        <p>Note that when these phrases are used in the syntactic grammar, it may not be possible to unambiguously identify the immediately following token sequence because determining later tokens requires knowing which lexical goal symbol to use at later positions. As such, when these are used in the syntactic grammar, it is considered an editorial error for a token sequence _seq_ to appear in a lookahead restriction (including as part of a set of sequences) if the choices of lexical goal symbols to use could change whether or not _seq_ would be a prefix of the resulting token sequence.</p>
+        <p>Note that when these phrases are used in the syntactic grammar, it may not be possible to unambiguously identify the tokens in the lookahead because determining later tokens requires knowing which lexical goal symbol to use at later positions. As such, when these are used in the syntactic grammar, it is considered an editorial error for a token sequence _seq_ to appear in a lookahead restriction (including as part of a set of sequences) if the choices of lexical goal symbols to use could change whether or not _seq_ would be a prefix of the resulting token sequence.</p>
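To see why the choice of lexical goal matters, the following hypothetical Python sketch uses a deliberately crude tokenizer, not the ECMAScript lexical grammar. Its two goals, `"div"` and `"regexp"`, loosely stand in for InputElementDiv and InputElementRegExp. The same source text yields different token sequences, so whether `/` is a prefix of the lookahead depends on the goal chosen. That is the situation the note treats as an editorial error.

```python
import re

def tokenize(src: str, goal: str) -> list[str]:
    """Split src into toy tokens; goal is 'div' or 'regexp' (hypothetical)."""
    if goal == "regexp":
        # Crude stand-in for a regular expression literal: /body/flags
        m = re.match(r"/[^/\n]+/[a-z]*", src)
        if m:
            # Assume the div goal applies to whatever follows the literal.
            return [m.group(0)] + tokenize(src[m.end():], "div")
    # Otherwise: identifiers, or any single non-space character.
    return re.findall(r"[A-Za-z_$][\w$]*|\S", src)

def is_prefix(seq: list[str], tokens: list[str]) -> bool:
    return tokens[: len(seq)] == seq

SOURCE = "/a/g"
div_tokens = tokenize(SOURCE, "div")        # ['/', 'a', '/', 'g']
regexp_tokens = tokenize(SOURCE, "regexp")  # ['/a/g']
```

Here `is_prefix(["/"], div_tokens)` is true but `is_prefix(["/"], regexp_tokens)` is false.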
       </emu-clause>
 
       <emu-clause id="sec-no-lineterminator-here">
@@ -50304,7 +50320,7 @@ <h2>Syntax</h2>
 
     <emu-annex id="sec-regular-expressions-patterns">
       <h1>Regular Expressions Patterns</h1>
-      <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.</p>
+      <p>The syntax of <emu-xref href="#sec-patterns"></emu-xref> is modified and extended as follows.</p>
       <p>This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.</p>
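With the ordering rule removed, the two `\` alternatives of `ExtendedAtom` in the grammar below are kept apart only by their lookahead restrictions. As a rough, hypothetical Python sketch (the function names are invented for illustration), each restriction is a simple test on the text after the backslash. Whether `AtomEscape` actually parses is a separate check that this sketch does not model.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)

def atom_escape_permitted(after_backslash: str) -> bool:
    # `\` [lookahead ∉ { `b`, `B` }] AtomEscape
    return after_backslash[:1] not in {"b", "B"}

def lone_backslash_permitted(after_backslash: str) -> bool:
    # `\` [lookahead = `c`] [lookahead ≠ `c` AsciiLetter]
    # The slice [1:2] is "" at end of input, which is not a letter.
    return after_backslash[:1] == "c" and after_backslash[1:2] not in ASCII_LETTERS
```

For example, for the text after the backslash in `\c1`, the lone-backslash alternative is permitted because `c` is followed by `1` rather than an ASCII letter. For `\cA` it is not permitted.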
       <h2>Syntax</h2>
       <emu-grammar type="definition">
@@ -50334,13 +50350,13 @@ <h2>Syntax</h2>
 
         ExtendedAtom[NamedCaptureGroups] ::
           `.`
-          `\` AtomEscape[~UnicodeMode, ?NamedCaptureGroups]
-          `\` [lookahead == `c`]
+          `\` [lookahead &notin; { `b`, `B` }] AtomEscape[~UnicodeMode, ?NamedCaptureGroups]
+          `\` [lookahead == `c`] [lookahead != `c` AsciiLetter]
           CharacterClass[~UnicodeMode, ~UnicodeSetsMode]
           `(` GroupSpecifier[~UnicodeMode]? Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
           `(?:` Disjunction[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] `)`
           InvalidBracedQuantifier
-          ExtendedPatternCharacter
+          [lookahead &notin; InvalidBracedQuantifier] ExtendedPatternCharacter
 
         InvalidBracedQuantifier ::
           `{` DecimalDigits[~Sep] `}`
@@ -50352,40 +50368,47 @@ <h2>Syntax</h2>
 
         AtomEscape[UnicodeMode, NamedCaptureGroups] ::
           [+UnicodeMode] DecimalEscape
-          [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+          [~UnicodeMode] ConstrainedDecimalEscape
           CharacterClassEscape[?UnicodeMode]
-          CharacterEscape[?UnicodeMode, ?NamedCaptureGroups]
+          [+UnicodeMode] CharacterEscape[?UnicodeMode, ?NamedCaptureGroups]
+          [~UnicodeMode] [lookahead &notin; ConstrainedDecimalEscape] CharacterEscape[?UnicodeMode, ?NamedCaptureGroups]
           [+NamedCaptureGroups] `k` GroupName[?UnicodeMode]
 
+        ConstrainedDecimalEscape ::
+          DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is &le; CountLeftCapturingParensWithin(the |Pattern| containing |DecimalEscape|)]
+
         CharacterEscape[UnicodeMode, NamedCaptureGroups] ::
           ControlEscape
           `c` AsciiLetter
           `0` [lookahead &notin; DecimalDigit]
           HexEscapeSequence
           RegExpUnicodeEscapeSequence[?UnicodeMode]
           [~UnicodeMode] LegacyOctalEscapeSequence
-          IdentityEscape[?UnicodeMode, ?NamedCaptureGroups]
+          [lookahead &notin; HexEscapeSequence] [lookahead &notin; RegExpUnicodeEscapeSequence] IdentityEscape[?UnicodeMode, ?NamedCaptureGroups]
 
         IdentityEscape[UnicodeMode, NamedCaptureGroups] ::
           [+UnicodeMode] SyntaxCharacter
           [+UnicodeMode] `/`
           [~UnicodeMode] SourceCharacterIdentityEscape[?NamedCaptureGroups]
 
         SourceCharacterIdentityEscape[NamedCaptureGroups] ::
-          [~NamedCaptureGroups] SourceCharacter but not `c`
-          [+NamedCaptureGroups] SourceCharacter but not one of `c` or `k`
+          [~NamedCaptureGroups] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W`
+          [+NamedCaptureGroups] SourceCharacter but not one of `0` `1` `2` `3` `4` `5` `6` `7` `c` `f` `n` `r` `t` `v` `d` `s` `w` `D` `S` `W` `k`
+          `or`
+          [~NamedCaptureGroups] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c`
+          [+NamedCaptureGroups] SourceCharacter but not one of OctalDigit or ControlEscape or CharacterClassEscape or `c` or `k`
 
         ClassAtomNoDash[UnicodeMode, NamedCaptureGroups] ::
           SourceCharacter but not one of `\` or `]` or `-`
           `\` ClassEscape[?UnicodeMode, ?NamedCaptureGroups]
-          `\` [lookahead == `c`]
+          `\` [lookahead == `c`] [lookahead != `c` ClassControlLetter] [lookahead != `c` AsciiLetter]
 
         ClassEscape[UnicodeMode, NamedCaptureGroups] ::
           `b`
           [+UnicodeMode] `-`
           [~UnicodeMode] `c` ClassControlLetter
           CharacterClassEscape[?UnicodeMode]
-          CharacterEscape[?UnicodeMode, ?NamedCaptureGroups]
+          [lookahead != `b`] CharacterEscape[?UnicodeMode, ?NamedCaptureGroups]
 
         ClassControlLetter ::
           DecimalDigit