From 0e25cc26f8a8444b3aa60819483a62f76a7042a2 Mon Sep 17 00:00:00 2001
From: Mathias Bynens
Date: Sun, 15 Aug 2021 13:52:59 -0700
Subject: [PATCH] Editorial: Clarify RegExp grammar parameter U → UnicodeMode (#2411)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This makes the parameter more easily searchable while clarifying its meaning.
---
 spec.html | 254 +++++++++++++++++++++++++++---------------------------
 1 file changed, 127 insertions(+), 127 deletions(-)

diff --git a/spec.html b/spec.html
index 7c6fbee6d9..8a72011b56 100644
--- a/spec.html
+++ b/spec.html
@@ -27789,7 +27789,7 @@

Forbidden Extensions

The behaviour of built-in methods which are specified in ECMA-402, such as those named `toLocaleString`, must not be extended except as specified in ECMA-402.
  • - The RegExp pattern grammars in and must not be extended to recognize any of the source characters A-Z or a-z as |IdentityEscape[+U]| when the [U] grammar parameter is present.
    + The RegExp pattern grammars in and must not be extended to recognize any of the source characters A-Z or a-z as |IdentityEscape[+UnicodeMode]| when the [UnicodeMode] grammar parameter is present.
  • The Syntactic Grammar must not be extended in any manner that allows the token `:` to immediately follow source text that matches the |BindingIdentifier| nonterminal symbol.
    @@ -33850,31 +33850,31 @@

    Patterns

    The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.

    Syntax

    - Pattern[U, N] ::
    -   Disjunction[?U, ?N]
    + Pattern[UnicodeMode, N] ::
    +   Disjunction[?UnicodeMode, ?N]

    - Disjunction[U, N] ::
    -   Alternative[?U, ?N]
    -   Alternative[?U, ?N] `|` Disjunction[?U, ?N]
    + Disjunction[UnicodeMode, N] ::
    +   Alternative[?UnicodeMode, ?N]
    +   Alternative[?UnicodeMode, ?N] `|` Disjunction[?UnicodeMode, ?N]

    - Alternative[U, N] ::
    + Alternative[UnicodeMode, N] ::
        [empty]
    -   Alternative[?U, ?N] Term[?U, ?N]
    +   Alternative[?UnicodeMode, ?N] Term[?UnicodeMode, ?N]

    - Term[U, N] ::
    -   Assertion[?U, ?N]
    -   Atom[?U, ?N]
    -   Atom[?U, ?N] Quantifier
    + Term[UnicodeMode, N] ::
    +   Assertion[?UnicodeMode, ?N]
    +   Atom[?UnicodeMode, ?N]
    +   Atom[?UnicodeMode, ?N] Quantifier

    - Assertion[U, N] ::
    + Assertion[UnicodeMode, N] ::
        `^`
        `$`
        `\` `b`
        `\` `B`
    -   `(` `?` `=` Disjunction[?U, ?N] `)`
    -   `(` `?` `!` Disjunction[?U, ?N] `)`
    -   `(` `?` `<=` Disjunction[?U, ?N] `)`
    -   `(` `?` `<!` Disjunction[?U, ?N] `)`
    +   `(` `?` `=` Disjunction[?UnicodeMode, ?N] `)`
    +   `(` `?` `!` Disjunction[?UnicodeMode, ?N] `)`
    +   `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)`
    +   `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)`

      Quantifier ::
        QuantifierPrefix
    @@ -33888,13 +33888,13 @@

    Syntax

        `{` DecimalDigits[~Sep] `,` `}`
        `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}`

    - Atom[U, N] ::
    + Atom[UnicodeMode, N] ::
        PatternCharacter
        `.`
    -   `\` AtomEscape[?U, ?N]
    -   CharacterClass[?U]
    -   `(` GroupSpecifier[?U] Disjunction[?U, ?N] `)`
    -   `(` `?` `:` Disjunction[?U, ?N] `)`
    +   `\` AtomEscape[?UnicodeMode, ?N]
    +   CharacterClass[?UnicodeMode]
    +   `(` GroupSpecifier[?UnicodeMode] Disjunction[?UnicodeMode, ?N] `)`
    +   `(` `?` `:` Disjunction[?UnicodeMode, ?N] `)`

      SyntaxCharacter :: one of
        `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|`
    @@ -33902,19 +33902,19 @@

    Syntax

      PatternCharacter ::
        SourceCharacter but not SyntaxCharacter

    - AtomEscape[U, N] ::
    + AtomEscape[UnicodeMode, N] ::
        DecimalEscape
    -   CharacterClassEscape[?U]
    -   CharacterEscape[?U]
    -   [+N] `k` GroupName[?U]
    +   CharacterClassEscape[?UnicodeMode]
    +   CharacterEscape[?UnicodeMode]
    +   [+N] `k` GroupName[?UnicodeMode]

    - CharacterEscape[U] ::
    + CharacterEscape[UnicodeMode] ::
        ControlEscape
        `c` ControlLetter
        `0` [lookahead ∉ DecimalDigit]
        HexEscapeSequence
    -   RegExpUnicodeEscapeSequence[?U]
    -   IdentityEscape[?U]
    +   RegExpUnicodeEscapeSequence[?UnicodeMode]
    +   IdentityEscape[?UnicodeMode]

      ControlEscape :: one of
        `f` `n` `r` `t` `v`
    @@ -33923,39 +33923,39 @@

    Syntax

        `a` `b` `c` `d` `e` `f` `g` `h` `i` `j` `k` `l` `m` `n` `o` `p` `q` `r` `s` `t` `u` `v` `w` `x` `y` `z`
        `A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z`

    - GroupSpecifier[U] ::
    + GroupSpecifier[UnicodeMode] ::
        [empty]
    -   `?` GroupName[?U]
    +   `?` GroupName[?UnicodeMode]

    - GroupName[U] ::
    -   `<` RegExpIdentifierName[?U] `>`
    + GroupName[UnicodeMode] ::
    +   `<` RegExpIdentifierName[?UnicodeMode] `>`

    - RegExpIdentifierName[U] ::
    -   RegExpIdentifierStart[?U]
    -   RegExpIdentifierName[?U] RegExpIdentifierPart[?U]
    + RegExpIdentifierName[UnicodeMode] ::
    +   RegExpIdentifierStart[?UnicodeMode]
    +   RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode]

    - RegExpIdentifierStart[U] ::
    + RegExpIdentifierStart[UnicodeMode] ::
        UnicodeIDStart
        `$`
        `_`
    -   `\` RegExpUnicodeEscapeSequence[+U]
    -   [~U] UnicodeLeadSurrogate UnicodeTrailSurrogate
    +   `\` RegExpUnicodeEscapeSequence[+UnicodeMode]
    +   [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate

    - RegExpIdentifierPart[U] ::
    + RegExpIdentifierPart[UnicodeMode] ::
        UnicodeIDContinue
        `$`
    -   `\` RegExpUnicodeEscapeSequence[+U]
    -   [~U] UnicodeLeadSurrogate UnicodeTrailSurrogate
    +   `\` RegExpUnicodeEscapeSequence[+UnicodeMode]
    +   [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
        <ZWNJ>
        <ZWJ>

    - RegExpUnicodeEscapeSequence[U] ::
    -   [+U] `u` HexLeadSurrogate `\u` HexTrailSurrogate
    -   [+U] `u` HexLeadSurrogate
    -   [+U] `u` HexTrailSurrogate
    -   [+U] `u` HexNonSurrogate
    -   [~U] `u` Hex4Digits
    -   [+U] `u{` CodePoint `}`
    + RegExpUnicodeEscapeSequence[UnicodeMode] ::
    +   [+UnicodeMode] `u` HexLeadSurrogate `\u` HexTrailSurrogate
    +   [+UnicodeMode] `u` HexLeadSurrogate
    +   [+UnicodeMode] `u` HexTrailSurrogate
    +   [+UnicodeMode] `u` HexNonSurrogate
    +   [~UnicodeMode] `u` Hex4Digits
    +   [+UnicodeMode] `u{` CodePoint `}`

      UnicodeLeadSurrogate ::
        > any Unicode code point in the inclusive range 0xD800 to 0xDBFF
    @@ -33974,23 +33974,23 @@

    Syntax

      HexNonSurrogate ::
        Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF]

    - IdentityEscape[U] ::
    -   [+U] SyntaxCharacter
    -   [+U] `/`
    -   [~U] SourceCharacter but not UnicodeIDContinue
    + IdentityEscape[UnicodeMode] ::
    +   [+UnicodeMode] SyntaxCharacter
    +   [+UnicodeMode] `/`
    +   [~UnicodeMode] SourceCharacter but not UnicodeIDContinue

      DecimalEscape ::
        NonZeroDigit DecimalDigits[~Sep]? [lookahead ∉ DecimalDigit]

    - CharacterClassEscape[U] ::
    + CharacterClassEscape[UnicodeMode] ::
        `d`
        `D`
        `s`
        `S`
        `w`
        `W`
    -   [+U] `p{` UnicodePropertyValueExpression `}`
    -   [+U] `P{` UnicodePropertyValueExpression `}`
    +   [+UnicodeMode] `p{` UnicodePropertyValueExpression `}`
    +   [+UnicodeMode] `P{` UnicodePropertyValueExpression `}`

      UnicodePropertyValueExpression ::
        UnicodePropertyName `=` UnicodePropertyValue
    @@ -34019,37 +34019,37 @@

    Syntax

        ControlLetter
        `_`

    - CharacterClass[U] ::
    -   `[` [lookahead != `^`] ClassRanges[?U] `]`
    -   `[` `^` ClassRanges[?U] `]`
    + CharacterClass[UnicodeMode] ::
    +   `[` [lookahead != `^`] ClassRanges[?UnicodeMode] `]`
    +   `[` `^` ClassRanges[?UnicodeMode] `]`

    - ClassRanges[U] ::
    + ClassRanges[UnicodeMode] ::
        [empty]
    -   NonemptyClassRanges[?U]
    +   NonemptyClassRanges[?UnicodeMode]

    - NonemptyClassRanges[U] ::
    -   ClassAtom[?U]
    -   ClassAtom[?U] NonemptyClassRangesNoDash[?U]
    -   ClassAtom[?U] `-` ClassAtom[?U] ClassRanges[?U]
    + NonemptyClassRanges[UnicodeMode] ::
    +   ClassAtom[?UnicodeMode]
    +   ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
    +   ClassAtom[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode]

    - NonemptyClassRangesNoDash[U] ::
    -   ClassAtom[?U]
    -   ClassAtomNoDash[?U] NonemptyClassRangesNoDash[?U]
    -   ClassAtomNoDash[?U] `-` ClassAtom[?U] ClassRanges[?U]
    + NonemptyClassRangesNoDash[UnicodeMode] ::
    +   ClassAtom[?UnicodeMode]
    +   ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
    +   ClassAtomNoDash[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode]

    - ClassAtom[U] ::
    + ClassAtom[UnicodeMode] ::
        `-`
    -   ClassAtomNoDash[?U]
    +   ClassAtomNoDash[?UnicodeMode]

    - ClassAtomNoDash[U] ::
    + ClassAtomNoDash[UnicodeMode] ::
        SourceCharacter but not one of `\` or `]` or `-`
    -   `\` ClassEscape[?U]
    +   `\` ClassEscape[?UnicodeMode]

    - ClassEscape[U] ::
    + ClassEscape[UnicodeMode] ::
        `b`
    -   [+U] `-`
    -   CharacterClassEscape[?U]
    -   CharacterEscape[?U]
    +   [+UnicodeMode] `-`
    +   CharacterClassEscape[?UnicodeMode]
    +   CharacterEscape[?UnicodeMode]
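The [UnicodeMode] parameter is what the `u` flag selects when a pattern is parsed, so the productions guarded by [+UnicodeMode] and [~UnicodeMode] above can be observed from script. The following is a minimal sketch, assuming a JavaScript engine that also implements the Annex B pattern extensions (as Node.js and web browsers do); the helper `accepts` is a hypothetical name introduced only for illustration.

```typescript
// Hypothetical helper: true if `source` parses as a RegExp pattern with the
// given flags. Passing "u" makes the engine parse with [+UnicodeMode].
function accepts(source: string, flags: string = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// IdentityEscape[+UnicodeMode] admits only SyntaxCharacter and `/`.
console.log(accepts("\\a"), accepts("\\a", "u")); // true false

// ClassEscape[+UnicodeMode] adds `\-`, which is valid only inside a class.
console.log(accepts("[\\-]", "u"), accepts("\\-", "u")); // true false

// CharacterClassEscape[+UnicodeMode] adds `\p{...}`; U+00C9 is "É".
console.log(new RegExp("^\\p{Lu}$", "u").test("\u00C9")); // true

// RegExpUnicodeEscapeSequence[+UnicodeMode] adds `\u{...}`. Without `u`,
// Annex B reads `\u` as a literal "u" followed by the quantifier `{3}`.
console.log(new RegExp("^\\u{61}$", "u").test("a")); // true
console.log(new RegExp("^\\u{3}$").test("uuu")); // true
```

In an engine without Annex B, the non-`u` results for `\a` and `\u{3}` would instead be Syntax Errors, because IdentityEscape[~UnicodeMode] excludes UnicodeIDContinue characters such as `a` and `u`.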
    @@ -35413,11 +35413,11 @@

      1. If _u_ is *true*, then
    -   1. Let _parseResult_ be ParseText(_patternText_, |Pattern[+U, +N]|).
    +   1. Let _parseResult_ be ParseText(_patternText_, |Pattern[+UnicodeMode, +N]|).
      1. Else,
    -   1. Let _parseResult_ be ParseText(_patternText_, |Pattern[~U, ~N]|).
    +   1. Let _parseResult_ be ParseText(_patternText_, |Pattern[~UnicodeMode, ~N]|).
        1. If _parseResult_ is a Parse Node and _parseResult_ contains a |GroupName|, then
    -     1. Set _parseResult_ to ParseText(_patternText_, |Pattern[~U, +N]|).
    +     1. Set _parseResult_ to ParseText(_patternText_, |Pattern[~UnicodeMode, +N]|).
      1. Return _parseResult_.
    @@ -35447,7 +35447,7 @@

    - 1. Let _S_ be a String in the form of a |Pattern[~U]| (|Pattern[+U]| if _F_ contains *"u"*) equivalent to _P_ interpreted as UTF-16 encoded Unicode code points (), in which certain code points are escaped as described below. _S_ may or may not be identical to _P_; however, the Abstract Closure that would result from evaluating _S_ as a |Pattern[~U]| (|Pattern[+U]| if _F_ contains *"u"*) must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for _P_ and _F_ must produce identical results.
    + 1. Let _S_ be a String in the form of a |Pattern[~UnicodeMode]| (|Pattern[+UnicodeMode]| if _F_ contains *"u"*) equivalent to _P_ interpreted as UTF-16 encoded Unicode code points (), in which certain code points are escaped as described below. _S_ may or may not be identical to _P_; however, the Abstract Closure that would result from evaluating _S_ as a |Pattern[~UnicodeMode]| (|Pattern[+UnicodeMode]| if _F_ contains *"u"*) must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for _P_ and _F_ must produce identical results.
      1. The code points `/` or any |LineTerminator| occurring in the pattern shall be escaped in _S_ as necessary to ensure that the string-concatenation of *"/"*, _S_, *"/"*, and _F_ can be parsed (in an appropriate lexical context) as a |RegularExpressionLiteral| that behaves identically to the constructed regular expression. For example, if _P_ is *"/"*, then _S_ could be *"\\/"* or *"\\u002F"*, among other possibilities, but not *"/"*, because `///` followed by _F_ would be parsed as a |SingleLineComment| rather than a |RegularExpressionLiteral|. If _P_ is the empty String, this specification can be met by letting _S_ be *"(?:)"*.
      1. Return _S_.
    @@ -46176,40 +46176,40 @@

    Syntax

    Regular Expressions Patterns

    The syntax of is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.

    -

    This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [U] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [U] parameter present on the goal symbol.

    +

    This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.

    Syntax

    - Term[U, N] ::
    -   [+U] Assertion[+U, ?N]
    -   [+U] Atom[+U, ?N] Quantifier
    -   [+U] Atom[+U, ?N]
    -   [~U] QuantifiableAssertion[?N] Quantifier
    -   [~U] Assertion[~U, ?N]
    -   [~U] ExtendedAtom[?N] Quantifier
    -   [~U] ExtendedAtom[?N]
    -
    - Assertion[U, N] ::
    + Term[UnicodeMode, N] ::
    +   [+UnicodeMode] Assertion[+UnicodeMode, ?N]
    +   [+UnicodeMode] Atom[+UnicodeMode, ?N] Quantifier
    +   [+UnicodeMode] Atom[+UnicodeMode, ?N]
    +   [~UnicodeMode] QuantifiableAssertion[?N] Quantifier
    +   [~UnicodeMode] Assertion[~UnicodeMode, ?N]
    +   [~UnicodeMode] ExtendedAtom[?N] Quantifier
    +   [~UnicodeMode] ExtendedAtom[?N]
    +
    + Assertion[UnicodeMode, N] ::
        `^`
        `$`
        `\` `b`
        `\` `B`
    -   [+U] `(` `?` `=` Disjunction[+U, ?N] `)`
    -   [+U] `(` `?` `!` Disjunction[+U, ?N] `)`
    -   [~U] QuantifiableAssertion[?N]
    -   `(` `?` `<=` Disjunction[?U, ?N] `)`
    -   `(` `?` `<!` Disjunction[?U, ?N] `)`
    +   [+UnicodeMode] `(` `?` `=` Disjunction[+UnicodeMode, ?N] `)`
    +   [+UnicodeMode] `(` `?` `!` Disjunction[+UnicodeMode, ?N] `)`
    +   [~UnicodeMode] QuantifiableAssertion[?N]
    +   `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)`
    +   `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)`

      QuantifiableAssertion[N] ::
    -   `(` `?` `=` Disjunction[~U, ?N] `)`
    -   `(` `?` `!` Disjunction[~U, ?N] `)`
    +   `(` `?` `=` Disjunction[~UnicodeMode, ?N] `)`
    +   `(` `?` `!` Disjunction[~UnicodeMode, ?N] `)`

      ExtendedAtom[N] ::
        `.`
    -   `\` AtomEscape[~U, ?N]
    +   `\` AtomEscape[~UnicodeMode, ?N]
        `\` [lookahead == `c`]
    -   CharacterClass[~U]
    -   `(` Disjunction[~U, ?N] `)`
    -   `(` `?` `:` Disjunction[~U, ?N] `)`
    +   CharacterClass[~UnicodeMode]
    +   `(` Disjunction[~UnicodeMode, ?N] `)`
    +   `(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
        InvalidBracedQuantifier
        ExtendedPatternCharacter
    @@ -46221,49 +46221,49 @@

    Syntax

      ExtendedPatternCharacter ::
        SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|`

    - AtomEscape[U, N] ::
    -   [+U] DecimalEscape
    -   [~U] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_]
    -   CharacterClassEscape[?U]
    -   CharacterEscape[?U, ?N]
    -   [+N] `k` GroupName[?U]
    + AtomEscape[UnicodeMode, N] ::
    +   [+UnicodeMode] DecimalEscape
    +   [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_]
    +   CharacterClassEscape[?UnicodeMode]
    +   CharacterEscape[?UnicodeMode, ?N]
    +   [+N] `k` GroupName[?UnicodeMode]

    - CharacterEscape[U, N] ::
    + CharacterEscape[UnicodeMode, N] ::
        ControlEscape
        `c` ControlLetter
        `0` [lookahead ∉ DecimalDigit]
        HexEscapeSequence
    -   RegExpUnicodeEscapeSequence[?U]
    -   [~U] LegacyOctalEscapeSequence
    -   IdentityEscape[?U, ?N]
    +   RegExpUnicodeEscapeSequence[?UnicodeMode]
    +   [~UnicodeMode] LegacyOctalEscapeSequence
    +   IdentityEscape[?UnicodeMode, ?N]

    - IdentityEscape[U, N] ::
    -   [+U] SyntaxCharacter
    -   [+U] `/`
    -   [~U] SourceCharacterIdentityEscape[?N]
    + IdentityEscape[UnicodeMode, N] ::
    +   [+UnicodeMode] SyntaxCharacter
    +   [+UnicodeMode] `/`
    +   [~UnicodeMode] SourceCharacterIdentityEscape[?N]

      SourceCharacterIdentityEscape[N] ::
        [~N] SourceCharacter but not `c`
        [+N] SourceCharacter but not one of `c` or `k`

    - ClassAtomNoDash[U, N] ::
    + ClassAtomNoDash[UnicodeMode, N] ::
        SourceCharacter but not one of `\` or `]` or `-`
    -   `\` ClassEscape[?U, ?N]
    +   `\` ClassEscape[?UnicodeMode, ?N]
        `\` [lookahead == `c`]

    - ClassEscape[U, N] ::
    + ClassEscape[UnicodeMode, N] ::
        `b`
    -   [+U] `-`
    -   [~U] `c` ClassControlLetter
    -   CharacterClassEscape[?U]
    -   CharacterEscape[?U, ?N]
    +   [+UnicodeMode] `-`
    +   [~UnicodeMode] `c` ClassControlLetter
    +   CharacterClassEscape[?UnicodeMode]
    +   CharacterEscape[?UnicodeMode, ?N]

      ClassControlLetter ::
        DecimalDigit
        `_`
    -

    When the same left hand sides occurs with both [+U] and [\~U] guards it is to control the disambiguation priority.

    +

    When the same left-hand sides occurs with both [+UnicodeMode] and [\~UnicodeMode] guards it is to control the disambiguation priority.
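The Annex B productions above only take effect under [~UnicodeMode], and the N parameter interacts with them through the reparse that happens when a pattern contains a |GroupName|. The sketch below probes a few of them, assuming an engine that implements Annex B (for example, Node.js or a web browser); `accepts` is a hypothetical helper used only for illustration.

```typescript
// Hypothetical helper: true if `source` parses as a RegExp pattern.
function accepts(source: string, flags: string = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Term[~UnicodeMode] allows QuantifiableAssertion Quantifier.
console.log(accepts("(?=a)*"), accepts("(?=a)*", "u")); // true false

// ExtendedPatternCharacter admits a lone `]` outside Unicode mode.
console.log(accepts("]"), accepts("]", "u")); // true false

// With no capturing groups, `\1` fails the NcapturingParens check and is read
// as [~UnicodeMode] LegacyOctalEscapeSequence, i.e. U+0001.
console.log(new RegExp("^\\1$").test("\u0001"), accepts("\\1", "u")); // true false

// SourceCharacterIdentityEscape[~N] lets `\k` match a literal "k"; a pattern
// containing a GroupName is reparsed with [+N], where `\k` must name a group.
console.log(new RegExp("^\\k$").test("k"), accepts("\\k(?<a>x)")); // true false
```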

    @@ -46279,7 +46279,7 @@

    Static Semantics: Early Errors

    NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges
    • - It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [U] parameter.
      + It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [UnicodeMode] parameter.
    • It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|.
    @@ -46288,7 +46288,7 @@

      Static Semantics: Early Errors

      NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges
      • - It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [U] parameter.
        + It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [UnicodeMode] parameter.
      • It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|.
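Engines that implement Annex B read the first rule as "(either atom is a character class) and the production has [UnicodeMode]". The same class can therefore be accepted or rejected depending only on the `u` flag, while the second rule applies in both modes. A sketch, assuming an Annex B engine such as Node.js; `accepts` is a hypothetical helper.

```typescript
// Hypothetical helper: true if `source` parses as a RegExp pattern.
function accepts(source: string, flags: string = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Without `u`, a class escape such as `\d` may appear as a range endpoint;
// the class then matches digits, a literal "-", and "z".
const legacy = new RegExp("^[\\d-z]$");
console.log(["5", "-", "z", "m"].map((s) => legacy.test(s)).join(",")); // true,true,true,false

// With `u`, the first early error rejects the same class.
console.log(accepts("[\\d-z]", "u")); // false

// Reversed ranges are rejected in both modes by the second early error.
console.log(accepts("[z-a]"), accepts("[z-a]", "u")); // false false
```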