diff --git a/spec.html b/spec.html
index ea3cb4f174..c578e778f8 100644
--- a/spec.html
+++ b/spec.html
@@ -520,8 +520,8 @@

Context-Free Grammars

The Lexical and RegExp Grammars

A lexical grammar for ECMAScript is given in clause . This grammar has as its terminal symbols Unicode code points that conform to the rules for |SourceCharacter| defined in . It defines a set of productions, starting from the goal symbol |InputElementDiv|, |InputElementTemplateTail|, |InputElementRegExp|, or |InputElementRegExpOrTemplateTail|, that describe how sequences of such code points are translated into a sequence of input elements.

-

Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the stream of input elements and guide the process of automatic semicolon insertion (). Simple white space and single-line comments are discarded and do not appear in the stream of input elements for the syntactic grammar. A |MultiLineComment| (that is, a comment of the form `/*`…`*/` regardless of whether it spans more than one line) is likewise simply discarded if it contains no line terminator; but if a |MultiLineComment| contains one or more line terminators, then it is replaced by a single line terminator, which becomes part of the stream of input elements for the syntactic grammar.

-

A RegExp grammar for ECMAScript is given in . This grammar also has as its terminal symbols the code points as defined by |SourceCharacter|. It defines a set of productions, starting from the goal symbol |Pattern|, that describe how sequences of code points are translated into regular expression patterns.

+

Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the stream of input elements and guide the process of automatic semicolon insertion (). Simple white space and single-line comments are discarded and do not appear in the stream of input elements for the syntactic grammar. A |MultiLineComment| (that is, a comment of the form `/*`…`*/` that spans more than one line) is replaced by a single line terminator, which becomes part of the stream of input elements for the syntactic grammar.
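A minimal sketch of how this rule can be observed from script code, assuming a host that exposes the `Function` constructor (the names `withNewline` and `sameLine` are illustrative only). A block comment that contains a line terminator stands in for a line terminator, so automatic semicolon insertion applies after `return`; a block comment confined to one line is discarded.

```javascript
// The comment contains "\n", so it counts as a line terminator and
// `return` is terminated by automatic semicolon insertion.
const withNewline = new Function("return /* line one\n line two */ 1;");

// The comment stays on one line, so it is discarded like white space.
const sameLine = new Function("return /* one line */ 1;");

console.log(withNewline()); // undefined
console.log(sameLine());    // 1
```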

+

A RegExp grammar for ECMAScript is given in . This grammar also has as its terminal symbols the code points as defined by |SourceCharacter|. It defines a set of productions, starting from the goal symbol |Pattern|, that describe how sequences of code points are translated into regular expression patterns.

Productions of the lexical and RegExp grammars are distinguished by having two colons “::” as separating punctuation. The lexical and RegExp grammars share some productions.

@@ -16018,7 +16018,7 @@

Syntax

Line Terminators

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (). A line terminator cannot occur within any token except a |StringLiteral|, |Template|, or |TemplateSubstitutionTail|. <LF> and <CR> line terminators cannot occur within a |StringLiteral| token except as part of a |LineContinuation|.

-

A line terminator can occur within a |MultiLineComment| but cannot occur within a |SingleLineComment|.

+

A line terminator must occur within a |MultiLineComment| but cannot occur within a |SingleLineDelimitedComment| or a |SingleLineComment|.

Line terminators are included in the set of white space code points that are matched by the `\\s` class in regular expressions.

The ECMAScript line terminator code points are listed in .

@@ -16104,15 +16104,21 @@

Syntax

Comments

Comments can be either single or multi-line. Multi-line comments cannot nest.

Because a single-line comment can contain any Unicode code point except a |LineTerminator| code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the `//` marker to the end of the line. However, the |LineTerminator| at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see ).
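The point about automatic semicolon insertion can be checked with a short sketch, assuming a host with the `Function` constructor (`withComment` and `withoutComment` are illustrative names). The line terminator that ends the single-line comment is still seen by the syntactic grammar, so adding or removing the comment does not change the result.

```javascript
// Both bodies have a line terminator between `return` and `1`;
// the single-line comment does not swallow it.
const withComment = new Function("return // trailing comment\n1;");
const withoutComment = new Function("return \n1;");

console.log(withComment());    // undefined
console.log(withoutComment()); // undefined
```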

-

Comments behave like white space and are discarded except that, if a |MultiLineComment| contains a line terminator code point, then the entire comment is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.

+

Comments behave like white space and are discarded except that a |MultiLineComment| or a |SingleLineHTMLCloseComment| is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.

Syntax

 Comment ::
   MultiLineComment
   SingleLineComment
+  SingleLineHTMLOpenComment
+  SingleLineHTMLCloseComment
+  SingleLineDelimitedComment

 MultiLineComment ::
-  `/*` MultiLineCommentChars? `*/`
+  `/*` FirstCommentLine? LineTerminator MultiLineCommentChars? `*/` HTMLCloseComment?
+
+FirstCommentLine ::
+  SingleLineDelimitedCommentChars

 MultiLineCommentChars ::
   MultiLineNotAsteriskChar MultiLineCommentChars?
@@ -16131,13 +16137,59 @@

Syntax

 SingleLineComment ::
   `//` SingleLineCommentChars?

+SingleLineHTMLOpenComment ::
+  `<!--` SingleLineCommentChars?
+
+SingleLineHTMLCloseComment ::
+  LineTerminatorSequence HTMLCloseComment
+
+HTMLCloseComment ::
+  WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars?
+
+SingleLineDelimitedCommentSequence ::
+  SingleLineDelimitedComment WhiteSpaceSequence? SingleLineDelimitedCommentSequence?
+
+WhiteSpaceSequence ::
+  WhiteSpace WhiteSpaceSequence?
+
 SingleLineCommentChars ::
   SingleLineCommentChar SingleLineCommentChars?

 SingleLineCommentChar ::
   SourceCharacter but not LineTerminator
+
+SingleLineDelimitedComment ::
+  `/*` SingleLineDelimitedCommentChars? `*/`
+
+SingleLineDelimitedCommentChars ::
+  SingleLineNotAsteriskChar SingleLineDelimitedCommentChars?
+  `*` SingleLinePostAsteriskCommentChars?
+
+SingleLineNotAsteriskChar ::
+  SourceCharacter but not one of `*` or LineTerminator
+
+SingleLinePostAsteriskCommentChars ::
+  SingleLineNotForwardSlashOrAsteriskChar SingleLineDelimitedCommentChars?
+  `*` SingleLinePostAsteriskCommentChars?
+
+SingleLineNotForwardSlashOrAsteriskChar ::
+  SourceCharacter but not one of `/` or `*` or LineTerminator
-

A number of productions in this section are given alternative definitions in section

+ + +

Static Semantics: Early Errors

+ + SingleLineHTMLOpenComment :: + `<!--` SingleLineCommentChars? + + HTMLCloseComment :: + WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars? + +
    +
  • It is a Syntax Error if a |Module| contains the source code matching this production.
  • +
+ In a |Script|, this syntax is allowed, but deprecated. +
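A hedged sketch of how a host's support for the HTML-like comment productions in non-module code might be probed. It assumes that `Function` bodies are parsed with the same lexical rules as a |Script|; the helper name `supportsHtmlLikeComments` is hypothetical. The |Module| early error is not exercised here.

```javascript
// Returns true if the host accepts `<!--` and a line-leading `-->`
// as single-line comments in non-module code, false if it rejects them.
function supportsHtmlLikeComments() {
  try {
    const f = new Function("<!-- open comment\nreturn 1;\n--> close comment");
    return f() === 1;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(supportsHtmlLikeComments());
```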
@@ -17059,8 +17111,8 @@

Regular Expression Literals

A regular expression literal is an input element that is converted to a RegExp object (see ) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as `===` to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by `new RegExp` or calling the RegExp constructor as a function (see ).
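As a small illustration of the identity rule above (variable and function names are illustrative only): two literals with identical contents are distinct objects, and so are the results of evaluating the same literal twice.

```javascript
const a = /abc/g;
const b = /abc/g;
console.log(a === b);               // false
console.log(a.source === b.source); // true

// Each evaluation of the same literal also yields a fresh object.
function make() {
  return /abc/g;
}
console.log(make() === make());     // false
```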

-

The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the |RegularExpressionBody| and the |RegularExpressionFlags| are subsequently parsed again using the more stringent ECMAScript Regular Expression grammar ().

-

An implementation may extend the ECMAScript Regular Expression grammar defined in , but it must not extend the |RegularExpressionBody| and |RegularExpressionFlags| productions defined below or the productions used by these productions.

+

The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the |RegularExpressionBody| and the |RegularExpressionFlags| are subsequently parsed again using the more stringent ECMAScript Regular Expression grammar ().

+

An implementation may extend the ECMAScript Regular Expression grammar defined in , but it must not extend the |RegularExpressionBody| and |RegularExpressionFlags| productions defined below or the productions used by these productions.

Syntax

RegularExpressionLiteral :: @@ -28284,7 +28336,7 @@

Forbidden Extensions

The behaviour of built-in methods which are specified in ECMA-402, such as those named `toLocaleString`, must not be extended except as specified in ECMA-402.
  • - The RegExp pattern grammars in and must not be extended to recognize any of the source characters A-Z or a-z as |IdentityEscape[+UnicodeMode]| when the [UnicodeMode] grammar parameter is present. + The RegExp pattern grammars in must not be extended to recognize any of the source characters A-Z or a-z as |IdentityEscape| when the [UnicodeMode] grammar parameter is present.
  • The Syntactic Grammar must not be extended in any manner that allows the token `:` to immediately follow source text that matches the |BindingIdentifier| nonterminal symbol. @@ -28298,9 +28350,6 @@

    Forbidden Extensions

  • When processing strict mode code, the extensions defined in , , , and must not be supported.
  • -
  • - When parsing for the |Module| goal symbol, the lexical grammar extensions defined in must not be supported. -
  • |ImportCall| must not be extended. @@ -34191,10 +34240,11 @@

    RegExp (Regular Expression) Objects

    The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.

    - -

    Patterns

    -

    The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.

    -

    Syntax

    + +

    Syntax for Patterns

    +

    The `RegExp` constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.

    +

    Some of these productions (indicated by “::!”) are ambiguous; the ambiguity is resolved by the ordering of alternatives. When parsing with such a production, each alternative is considered only if no previous alternative matches.
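The ordering rule for “::!” productions resembles ordered choice in parsing expression grammars. The following is only an illustrative sketch under that assumption, not the specification's parsing algorithm; `orderedChoice`, `seq`, and `lit` are hypothetical helpers, and the toy `term` recognizes only the atom `a` with an optional quantifier.

```javascript
// A recognizer is a function (input, pos) => new position, or null on failure.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : null;

const seq = (...parts) => (input, pos) => {
  let p = pos;
  for (const part of parts) {
    p = part(input, p);
    if (p === null) return null;
  }
  return p;
};

// Ordered choice: a later alternative is tried only if all earlier ones fail.
const orderedChoice = (...alternatives) => (input, pos) => {
  for (const alternative of alternatives) {
    const end = alternative(input, pos);
    if (end !== null) return end;
  }
  return null;
};

const atom = lit("a");
const quantifier = orderedChoice(lit("*"), lit("+"), lit("?"));

// "Atom Quantifier" is listed before "Atom", so the quantifier is consumed.
const term = orderedChoice(seq(atom, quantifier), atom);
// With the alternatives swapped, the bare atom wins and `*` is left over.
const swapped = orderedChoice(atom, seq(atom, quantifier));

console.log(term("a*", 0));    // 2
console.log(term("ab", 0));    // 1
console.log(swapped("a*", 0)); // 1
```

Listing the longer alternative first is what lets a single production cover both the quantified and the unquantified form without a separate disambiguation rule.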

    +

    Patterns

    Pattern[UnicodeMode, N] :: Disjunction[?UnicodeMode, ?N] @@ -34207,21 +34257,25 @@

    Syntax

   [empty]
   Alternative[?UnicodeMode, ?N] Term[?UnicodeMode, ?N]

-Term[UnicodeMode, N] ::
+Term[UnicodeMode, N] ::!
+  [~UnicodeMode] QuantifiableAssertion[~UnicodeMode, ?N] Quantifier
   Assertion[?UnicodeMode, ?N]
-  Atom[?UnicodeMode, ?N]
   Atom[?UnicodeMode, ?N] Quantifier
+  Atom[?UnicodeMode, ?N]

 Assertion[UnicodeMode, N] ::
   `^`
   `$`
   `\` `b`
   `\` `B`
-  `(` `?` `=` Disjunction[?UnicodeMode, ?N] `)`
-  `(` `?` `!` Disjunction[?UnicodeMode, ?N] `)`
+  QuantifiableAssertion[?UnicodeMode, ?N]
   `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)`
   `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)`

+QuantifiableAssertion[UnicodeMode, N] ::
+  `(` `?` `=` Disjunction[?UnicodeMode, ?N] `)`
+  `(` `?` `!` Disjunction[?UnicodeMode, ?N] `)`
+
 Quantifier ::
   QuantifierPrefix
   QuantifierPrefix `?`
@@ -34234,44 +34288,34 @@

    Syntax

    `{` DecimalDigits[~Sep] `,` `}` `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}` - Atom[UnicodeMode, N] :: - PatternCharacter + Atom[UnicodeMode, N] ::! `.` `\` AtomEscape[?UnicodeMode, ?N] - CharacterClass[?UnicodeMode] + [~UnicodeMode] `\` [lookahead == `c`] + CharacterClass[?UnicodeMode, ?N] `(` GroupSpecifier[?UnicodeMode] Disjunction[?UnicodeMode, ?N] `)` `(` `?` `:` Disjunction[?UnicodeMode, ?N] `)` + [~UnicodeMode] InvalidBracedQuantifier + PatternCharacter[?UnicodeMode] - SyntaxCharacter :: one of - `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|` - - PatternCharacter :: - SourceCharacter but not SyntaxCharacter - - AtomEscape[UnicodeMode, N] :: - DecimalEscape - CharacterClassEscape[?UnicodeMode] - CharacterEscape[?UnicodeMode] - [+N] `k` GroupName[?UnicodeMode] - - CharacterEscape[UnicodeMode] :: - ControlEscape - `c` ControlLetter - `0` [lookahead ∉ DecimalDigit] - HexEscapeSequence - RegExpUnicodeEscapeSequence[?UnicodeMode] - IdentityEscape[?UnicodeMode] + InvalidBracedQuantifier :: + `{` DecimalDigits[~Sep] `}` + `{` DecimalDigits[~Sep] `,` `}` + `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}` - ControlEscape :: one of - `f` `n` `r` `t` `v` + PatternCharacter[UnicodeMode] :: + [+UnicodeMode] SourceCharacter but not SyntaxCharacter + [~UnicodeMode] SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|` - ControlLetter :: one of - `a` `b` `c` `d` `e` `f` `g` `h` `i` `j` `k` `l` `m` `n` `o` `p` `q` `r` `s` `t` `u` `v` `w` `x` `y` `z` - `A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z` + SyntaxCharacter :: one of + `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|` +
    +

    Group Specifiers

    + GroupSpecifier[UnicodeMode] :: [empty] - `?` GroupName[?UnicodeMode] + [+UnicodeMode] `?` GroupName[?UnicodeMode] GroupName[UnicodeMode] :: `<` RegExpIdentifierName[?UnicodeMode] `>` @@ -34290,35 +34334,62 @@

    Syntax

    `\` RegExpUnicodeEscapeSequence[+UnicodeMode] [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate - RegExpUnicodeEscapeSequence[UnicodeMode] :: - [+UnicodeMode] `u` HexLeadSurrogate `\u` HexTrailSurrogate - [+UnicodeMode] `u` HexLeadSurrogate - [+UnicodeMode] `u` HexTrailSurrogate - [+UnicodeMode] `u` HexNonSurrogate - [~UnicodeMode] `u` Hex4Digits - [+UnicodeMode] `u{` CodePoint `}` - UnicodeLeadSurrogate :: > any Unicode code point in the inclusive range 0xD800 to 0xDBFF UnicodeTrailSurrogate :: > any Unicode code point in the inclusive range 0xDC00 to 0xDFFF
    -

    Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.

    + +

    Character Classes

    - HexLeadSurrogate :: - Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xD800 to 0xDBFF] + CharacterClass[UnicodeMode, N] :: + `[` [lookahead != `^`] ClassRanges[?UnicodeMode, ?N] `]` + `[` `^` ClassRanges[?UnicodeMode, ?N] `]` - HexTrailSurrogate :: - Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xDC00 to 0xDFFF] + ClassRanges[UnicodeMode, N] :: + [empty] + NonemptyClassRanges[?UnicodeMode, ?N] - HexNonSurrogate :: - Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF] + NonemptyClassRanges[UnicodeMode, N] :: + ClassAtom[?UnicodeMode, ?N] + ClassAtom[?UnicodeMode, ?N] NonemptyClassRangesNoDash[?UnicodeMode, ?N] + ClassAtom[?UnicodeMode, ?N] `-` ClassAtom[?UnicodeMode, ?N] ClassRanges[?UnicodeMode, ?N] - IdentityEscape[UnicodeMode] :: - [+UnicodeMode] SyntaxCharacter - [+UnicodeMode] `/` - [~UnicodeMode] SourceCharacter but not UnicodeIDContinue + NonemptyClassRangesNoDash[UnicodeMode, N] :: + ClassAtom[?UnicodeMode, ?N] + ClassAtomNoDash[?UnicodeMode, ?N] NonemptyClassRangesNoDash[?UnicodeMode, ?N] + ClassAtomNoDash[?UnicodeMode, ?N] `-` ClassAtom[?UnicodeMode, ?N] ClassRanges[?UnicodeMode, ?N] + + ClassAtom[UnicodeMode, N] :: + `-` + ClassAtomNoDash[?UnicodeMode, ?N] + + ClassAtomNoDash[UnicodeMode, N] ::! + SourceCharacter but not one of `\` or `]` or `-` + `\` ClassEscape[?UnicodeMode, ?N] + `\` [lookahead == `c`] + + +

    Escapes

    + + ClassEscape[UnicodeMode, N] ::! + `b` + [+UnicodeMode] `-` + [~UnicodeMode] `c` ClassControlLetter + CharacterClassEscape[?UnicodeMode] + CharacterEscape[?UnicodeMode, ?N] + + ClassControlLetter :: + DecimalDigit + `_` + + AtomEscape[UnicodeMode, N] ::! + [+UnicodeMode] DecimalEscape + [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_] + CharacterClassEscape[?UnicodeMode] + CharacterEscape[?UnicodeMode, ?N] + [+N] `k` GroupName[?UnicodeMode] DecimalEscape :: NonZeroDigit DecimalDigits[~Sep]? [lookahead ∉ DecimalDigit] @@ -34360,48 +34431,71 @@

    Syntax

    ControlLetter `_` - CharacterClass[UnicodeMode] :: - `[` [lookahead != `^`] ClassRanges[?UnicodeMode] `]` - `[` `^` ClassRanges[?UnicodeMode] `]` + CharacterEscape[UnicodeMode, N] ::! + ControlEscape + `c` ControlLetter + `0` [lookahead ∉ DecimalDigit] + HexEscapeSequence + RegExpUnicodeEscapeSequence[?UnicodeMode] + [~UnicodeMode] LegacyOctalEscapeSequence + IdentityEscape[?UnicodeMode, ?N] - ClassRanges[UnicodeMode] :: - [empty] - NonemptyClassRanges[?UnicodeMode] + ControlEscape :: one of + `f` `n` `r` `t` `v` - NonemptyClassRanges[UnicodeMode] :: - ClassAtom[?UnicodeMode] - ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode] - ClassAtom[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode] + ControlLetter :: one of + `a` `b` `c` `d` `e` `f` `g` `h` `i` `j` `k` `l` `m` `n` `o` `p` `q` `r` `s` `t` `u` `v` `w` `x` `y` `z` + `A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z` + + RegExpUnicodeEscapeSequence[UnicodeMode] :: + [+UnicodeMode] `u` HexLeadSurrogate `\u` HexTrailSurrogate + [+UnicodeMode] `u` HexLeadSurrogate + [+UnicodeMode] `u` HexTrailSurrogate + [+UnicodeMode] `u` HexNonSurrogate + [~UnicodeMode] `u` Hex4Digits + [+UnicodeMode] `u{` CodePoint `}` +
    +

    Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.

    + + HexLeadSurrogate :: + Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xD800 to 0xDBFF] - NonemptyClassRangesNoDash[UnicodeMode] :: - ClassAtom[?UnicodeMode] - ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode] - ClassAtomNoDash[?UnicodeMode] `-` ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode] + HexTrailSurrogate :: + Hex4Digits [> but only if the MV of |Hex4Digits| is in the inclusive range 0xDC00 to 0xDFFF] - ClassAtom[UnicodeMode] :: - `-` - ClassAtomNoDash[?UnicodeMode] + HexNonSurrogate :: + Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF] - ClassAtomNoDash[UnicodeMode] :: - SourceCharacter but not one of `\` or `]` or `-` - `\` ClassEscape[?UnicodeMode] + IdentityEscape[UnicodeMode, N] :: + [+UnicodeMode] SyntaxCharacter + [+UnicodeMode] `/` + [~UnicodeMode] SourceCharacterIdentityEscape[?N] - ClassEscape[UnicodeMode] :: - `b` - [+UnicodeMode] `-` - CharacterClassEscape[?UnicodeMode] - CharacterEscape[?UnicodeMode] + SourceCharacterIdentityEscape[N] :: + [~N] SourceCharacter but not `c` + [+N] SourceCharacter but not one of `c` or `k` - -

    A number of productions in this section are given alternative definitions in section .

    +

    Patterns that use the following productions are allowed, but deprecated:

    + + Term ::! QuantifiableAssertion Quantifier + + Atom ::! `\` [lookahead == `c`] + + ClassAtomNoDash ::! `\` [lookahead == `c`] + + ClassEscape ::! `c` ClassControlLetter + + CharacterEscape ::! LegacyOctalEscapeSequence +
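A sketch of how three of the deprecated forms listed above behave with and without the `u` flag. It assumes a host that implements these legacy productions, as web browsers and Node.js are generally understood to do; `compiles` is a hypothetical helper. In Unicode mode all three sources should be rejected, while a host with the legacy productions is expected to accept them otherwise.

```javascript
// Reports whether a pattern source compiles with the given flags.
function compiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const samples = [
  "(?=a)*", // Term ::! QuantifiableAssertion Quantifier
  "\\c",    // Atom ::! `\` [lookahead == `c`]
  "\\1",    // legacy octal escape when the pattern has no capturing group
];

for (const source of samples) {
  console.log(source, compiles(source), compiles(source, "u"));
}
```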
    +
    + + +

    Static Semantics for Patterns

    - +

    Static Semantics: Early Errors

    - -

    This section is amended in .

    -
    Pattern :: Disjunction
    • @@ -34417,58 +34511,64 @@

      Static Semantics: Early Errors

      It is a Syntax Error if the MV of the first |DecimalDigits| is larger than the MV of the second |DecimalDigits|.
    - AtomEscape :: `k` GroupName + Atom ::! InvalidBracedQuantifier
    • - It is a Syntax Error if the enclosing |Pattern| does not contain a |GroupSpecifier| with an enclosed |RegExpIdentifierName| whose CapturingGroupName equals the CapturingGroupName of the |RegExpIdentifierName| of this production's |GroupName|. + It is a Syntax Error if any source text matches this rule.
    - AtomEscape :: DecimalEscape + RegExpIdentifierStart :: `\` RegExpUnicodeEscapeSequence
    • - It is a Syntax Error if the CapturingGroupNumber of |DecimalEscape| is larger than _NcapturingParens_ (). + It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of some code point matched by the |IdentifierStartChar| lexical grammar production.
    - NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges + RegExpIdentifierPart :: `\` RegExpUnicodeEscapeSequence
    • - It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true*. -
    • -
    • - It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|. + It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of some code point matched by the |IdentifierPartChar| lexical grammar production.
    - NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges + RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
    • - It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true*. + It is a Syntax Error if RegExpIdentifierCodePoint of |RegExpIdentifierStart| is not matched by the |UnicodeIDStart| lexical grammar production.
    • +
    + RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate +
    • - It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|. + It is a Syntax Error if RegExpIdentifierCodePoint of |RegExpIdentifierPart| is not matched by the |UnicodeIDContinue| lexical grammar production.
    - RegExpIdentifierStart :: `\` RegExpUnicodeEscapeSequence + NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges
    • - It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of some code point matched by the |IdentifierStartChar| lexical grammar production. + It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [UnicodeMode] parameter. +
    • +
    • + It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|.
    - RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate + NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges
    • - It is a Syntax Error if RegExpIdentifierCodePoint of |RegExpIdentifierStart| is not matched by the |UnicodeIDStart| lexical grammar production. + It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [UnicodeMode] parameter. +
    • +
    • + It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|.
    - RegExpIdentifierPart :: `\` RegExpUnicodeEscapeSequence + AtomEscape ::! DecimalEscape
    • - It is a Syntax Error if the CharacterValue of |RegExpUnicodeEscapeSequence| is not the code point value of some code point matched by the |IdentifierPartChar| lexical grammar production. + It is a Syntax Error if the CapturingGroupNumber of |DecimalEscape| is larger than _NcapturingParens_ ().
    - RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate + AtomEscape ::! `k` GroupName
    • - It is a Syntax Error if RegExpIdentifierCodePoint of |RegExpIdentifierPart| is not matched by the |UnicodeIDContinue| lexical grammar production. + It is a Syntax Error if the enclosing |Pattern| does not contain a |GroupSpecifier| with an enclosed |RegExpIdentifierName| whose CapturingGroupName equals the CapturingGroupName of the |RegExpIdentifierName| of this production's |GroupName|.
    UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue @@ -34492,9 +34592,6 @@

    Static Semantics: Early Errors

    Static Semantics: CapturingGroupNumber

    - -

    This section is amended in .

    -
    DecimalEscape :: NonZeroDigit 1. Return the MV of |NonZeroDigit|. @@ -34507,40 +34604,36 @@

    Static Semantics: CapturingGroupNumber

    The definitions of “the MV of |NonZeroDigit|” and “the MV of |DecimalDigits|” are in .

    - +

    Static Semantics: IsCharacterClass

    - -

    This section is amended in .

    -
    ClassAtom :: `-` - ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-` + ClassAtomNoDash ::! SourceCharacter but not one of `\` or `]` or `-` + + ClassAtomNoDash ::! `\` [lookahead == `c`] - ClassEscape :: `b` + ClassEscape ::! `b` - ClassEscape :: `-` + ClassEscape ::! `-` - ClassEscape :: CharacterEscape + ClassEscape ::! CharacterEscape 1. Return *false*. - ClassEscape :: CharacterClassEscape + ClassEscape ::! CharacterClassEscape 1. Return *true*.
    - +

    Static Semantics: CharacterValue

    - -

    This section is amended in .

    -
    ClassAtom :: `-` @@ -34548,25 +34641,37 @@

    Static Semantics: CharacterValue

    1. Return the code point value of U+002D (HYPHEN-MINUS). - ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-` + ClassAtomNoDash ::! SourceCharacter but not one of `\` or `]` or `-` 1. Let _ch_ be the code point matched by |SourceCharacter|. 1. Return the code point value of _ch_. - ClassEscape :: `b` + ClassAtomNoDash ::! `\` [lookahead == `c`] + + + 1. Return the code point value of U+005C (REVERSE SOLIDUS). + + + ClassEscape ::! `b` 1. Return the code point value of U+0008 (BACKSPACE). - ClassEscape :: `-` + ClassEscape ::! `-` 1. Return the code point value of U+002D (HYPHEN-MINUS). - CharacterEscape :: ControlEscape + ClassEscape ::! `c` ClassControlLetter + + 1. Let _ch_ be the code point matched by |ClassControlLetter|. + 1. Let _i_ be _ch_'s code point value. + 1. Return the remainder of dividing _i_ by 32. + + CharacterEscape ::! ControlEscape 1. Return the code point value according to . @@ -34678,23 +34783,27 @@

    Static Semantics: CharacterValue

    - CharacterEscape :: `c` ControlLetter + CharacterEscape ::! `c` ControlLetter 1. Let _ch_ be the code point matched by |ControlLetter|. 1. Let _i_ be _ch_'s code point value. 1. Return the remainder of dividing _i_ by 32. - CharacterEscape :: `0` [lookahead ∉ DecimalDigit] + CharacterEscape ::! `0` [lookahead ∉ DecimalDigit] 1. Return the code point value of U+0000 (NULL).

    `\\0` represents the <NUL> character and cannot be followed by a decimal digit.
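The `c` |ControlLetter| and `0` rules above can be illustrated with a short sketch; `controlLetterValue` is a hypothetical helper that mirrors the "remainder of dividing by 32" step and is not part of any API.

```javascript
// Mirrors: let i be the letter's code point value; return i modulo 32.
function controlLetterValue(letter) {
  return letter.codePointAt(0) % 32;
}

console.log(controlLetterValue("J")); // 10, the code point of LINE FEED
console.log(controlLetterValue("j")); // 10, upper and lower case agree

console.log(/^\cJ$/.test("\n"));      // true
console.log(/^\0$/.test("\u0000"));   // true
```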

    - CharacterEscape :: HexEscapeSequence + CharacterEscape ::! HexEscapeSequence 1. Return the MV of |HexEscapeSequence|. + CharacterEscape ::! LegacyOctalEscapeSequence + + 1. Return the MV of |LegacyOctalEscapeSequence| (see ). + RegExpUnicodeEscapeSequence :: `u` HexLeadSurrogate `\u` HexTrailSurrogate 1. Let _lead_ be the CharacterValue of |HexLeadSurrogate|. @@ -34720,7 +34829,7 @@

    Static Semantics: CharacterValue

    1. Return the MV of |HexDigits|. - CharacterEscape :: IdentityEscape + CharacterEscape ::! IdentityEscape 1. Let _ch_ be the code point matched by |IdentityEscape|. 1. Return the code point value of _ch_. @@ -34806,11 +34915,8 @@

    Static Semantics: RegExpIdentifierCodePoint

    - -

    Pattern Semantics

    - -

    This section is amended in .

    -
    + +

    Runtime Semantics for Patterns

    A regular expression pattern is converted into an Abstract Closure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The Abstract Closure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.

    A |Pattern| is either a BMP pattern or a Unicode pattern depending upon whether or not its associated flags contain a `u`. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (). In either context, “character value” means the numeric value of the corresponding non-encoded code point.

    The syntax and semantics of |Pattern| are defined as if the source code for the |Pattern| were a List of |SourceCharacter| values where each |SourceCharacter| corresponds to a Unicode code point. If a BMP pattern contains a non-BMP |SourceCharacter|, the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.
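A brief sketch of the BMP versus Unicode distinction described above, using U+1F600 as an arbitrary non-BMP code point (the variable name is illustrative). Without the `u` flag a "character" is one 16-bit code unit; with it, a "character" is one code point.

```javascript
const grinning = "\u{1F600}"; // one code point, two UTF-16 code units

console.log(grinning.length);                    // 2
console.log(/^.$/.test(grinning));               // false: two characters in a BMP pattern
console.log(/^..$/.test(grinning));              // true
console.log(/^.$/u.test(grinning));              // true: one character in a Unicode pattern
console.log(/^\uD83D\uDE00$/u.test(grinning));   // true: the escaped pair forms one code point
```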

    @@ -34831,7 +34937,7 @@

    Notation

    _InputLength_ is the number of characters in _Input_.
  • - _NcapturingParens_ is the total number of left-capturing parentheses (i.e. the total number of Atom :: `(` GroupSpecifier Disjunction `)` Parse Nodes) in the pattern. A left-capturing parenthesis is any `(` pattern character that is matched by the `(` terminal of the Atom :: `(` GroupSpecifier Disjunction `)` production. + _NcapturingParens_ is the total number of left-capturing parentheses (i.e. the total number of Atom ::! `(` GroupSpecifier Disjunction `)` Parse Nodes) in the pattern. A left-capturing parenthesis is any `(` pattern character that is matched by the `(` terminal of the Atom ::! `(` GroupSpecifier Disjunction `)` production.
  • _DotAll_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"s"* and otherwise is *false*. @@ -34877,8 +34983,8 @@
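The value of _NcapturingParens_ from the list above can be observed from script code with a well-known trick, sketched here under stated assumptions: `countCapturingGroups` is a hypothetical helper, and it relies on the source being a complete, valid pattern so that appending an empty alternative is safe.

```javascript
// Appending "|" adds an empty alternative, so exec("") always matches.
// The match array holds the whole match plus one entry per capturing group.
function countCapturingGroups(source, flags = "") {
  return new RegExp(source + "|", flags).exec("").length - 1;
}

console.log(countCapturingGroups("(a)(?:b)(?<name>c)")); // 2
console.log(countCapturingGroups("(?=(x))\\("));         // 1
console.log(countCapturingGroups("[(]"));                // 0
```

Non-capturing groups, escaped parentheses, and parentheses inside a character class do not contribute, which matches the definition in terms of the `(` GroupSpecifier Disjunction `)` production.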

    Pattern

    1. Return a new Abstract Closure with parameters (_str_, _index_) that captures _m_ and performs the following steps when called: 1. Assert: Type(_str_) is String. 1. Assert: _index_ is a non-negative integer which is ≤ the length of _str_. - 1. If _Unicode_ is *true*, let _Input_ be ! StringToCodePoints(_str_). Otherwise, let _Input_ be a List whose elements are the code units that are the elements of _str_. _Input_ will be used throughout the algorithms in . Each element of _Input_ is considered to be a character. - 1. Let _InputLength_ be the number of characters contained in _Input_. This alias will be used throughout the algorithms in . + 1. If _Unicode_ is *true*, let _Input_ be ! StringToCodePoints(_str_). Otherwise, let _Input_ be a List whose elements are the code units that are the elements of _str_. _Input_ will be used throughout the algorithms in . Each element of _Input_ is considered to be a character. + 1. Let _InputLength_ be the number of characters contained in _Input_. This alias will be used throughout the algorithms in . 1. Let _listIndex_ be the index into _Input_ of the character that was obtained from element _index_ of _str_. 1. Let _c_ be a new Continuation with parameters (_y_) that captures nothing and performs the following steps when called: 1. Assert: _y_ is a State. @@ -34888,7 +34994,7 @@

    Pattern

    1. Return _m_(_x_, _c_). -

    A Pattern evaluates (“compiles”) to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in are designed so that compiling a pattern may throw a *SyntaxError* exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a String cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).

    +

    A Pattern evaluates (“compiles”) to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in are designed so that compiling a pattern may throw a *SyntaxError* exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a String cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).
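The compile-then-apply split described above can be sketched in continuation-passing style. This is a minimal illustration, not the specification's algorithm: the names `FAILURE`, `charMatcher`, `sequence`, and `compileLiteral` are hypothetical, only literal characters and the forward direction are modelled, and `ctx.input` stands in for the _Input_ alias.

```javascript
// Hypothetical model: a State is { endIndex, captures }; failure is null.
const FAILURE = null;

// Matcher for one literal character, forward direction only.
// ctx.input plays the role of the spec's _Input_ alias.
function charMatcher(ctx, ch) {
  return (x, c) => {
    const e = x.endIndex;
    if (e >= ctx.input.length || ctx.input[e] !== ch) return FAILURE;
    return c({ endIndex: e + 1, captures: x.captures });
  };
}

// Run m1, then continue into m2 with the State that m1 produced.
function sequence(m1, m2) {
  return (x, c) => m1(x, (y) => m2(y, c));
}

// "Compile" a literal into a closure of (str, index).
function compileLiteral(literal) {
  const ctx = { input: [] };
  const emptyMatcher = (x, c) => c(x);
  const m = literal
    .split("")
    .reduce((acc, ch) => sequence(acc, charMatcher(ctx, ch)), emptyMatcher);
  return (str, index) => {
    ctx.input = str.split(""); // code units, as when _Unicode_ is false
    const c = (y) => y; // final Continuation: succeed with the State it is given
    const x = { endIndex: index, captures: [] };
    return m(x, c);
  };
}

const matchAb = compileLiteral("ab");
console.log(matchAb("xab", 1)); // a State whose endIndex is 3
console.log(matchAb("xab", 0)); // null: no match at exactly offset 0
```

Applying the compiled closure never throws in this sketch; it returns either a State or the failure value, mirroring the guarantee stated in the note.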

    @@ -34964,29 +35070,30 @@

    Alternative

    Term

    With parameter _direction_.

    -

    The production Term :: Assertion evaluates as follows:

    +

    The production Term ::! Assertion evaluates as follows:

    1. Return the Matcher that is the result of evaluating |Assertion|.

    The resulting Matcher is independent of _direction_.

    -

    The production Term :: Atom evaluates as follows:

    +

    The production Term ::! Atom evaluates as follows:

    1. Return the Matcher that is the result of evaluating |Atom| with argument _direction_. -

    The production Term :: Atom Quantifier evaluates as follows:

    +

    The production Term ::! Atom Quantifier evaluates as follows:

      1. Evaluate |Atom| with argument _direction_ to obtain a Matcher _m_.
      1. Evaluate |Quantifier| to obtain the three results: a non-negative integer _min_, a non-negative integer (or +∞) _max_, and Boolean _greedy_.
      1. Assert: _min_ ≤ _max_.
    - 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Term|. This is the total number of Atom :: `(` GroupSpecifier Disjunction `)` Parse Nodes prior to or enclosing this |Term|.
    - 1. Let _parenCount_ be the number of left-capturing parentheses in |Atom|. This is the total number of Atom :: `(` GroupSpecifier Disjunction `)` Parse Nodes enclosed by |Atom|.
    + 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Term|. This is the total number of Atom ::! `(` GroupSpecifier Disjunction `)` Parse Nodes prior to or enclosing this |Term|.
    + 1. Let _parenCount_ be the number of left-capturing parentheses in |Atom|. This is the total number of Atom ::! `(` GroupSpecifier Disjunction `)` Parse Nodes enclosed by |Atom|.
      1. Return a new Matcher with parameters (_x_, _c_) that captures _m_, _min_, _max_, _greedy_, _parenIndex_, and _parenCount_ and performs the following steps when called:
        1. Assert: _x_ is a State.
        1. Assert: _c_ is a Continuation.
        1. Return ! RepeatMatcher(_m_, _min_, _max_, _greedy_, _x_, _c_, _parenIndex_, _parenCount_).
    +

    The production Term ::! QuantifiableAssertion Quantifier evaluates the same as the production Term ::! Atom Quantifier but with |QuantifiableAssertion| substituted for |Atom|.
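A quantified lookahead can be observed from script. The following is a small check, assuming an engine that implements the legacy (non-Unicode) productions, as current Node.js and browsers do; the variable names are illustrative only.

```javascript
// In a non-Unicode pattern, a lookahead may be followed by a quantifier.
const legacy = new RegExp("(?=a)*b");
const legacyMatches = legacy.test("b"); // zero iterations of (?=a), then "b"
console.log(legacyMatches);

// With the "u" flag the same source text is rejected when it is compiled.
let unicodeError = null;
try {
  new RegExp("(?=a)*b", "u");
} catch (err) {
  unicodeError = err;
}
console.log(unicodeError instanceof SyntaxError);
```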

    @@ -35114,9 +35221,14 @@

    Assertion

    1. If _a_ is *true* and _b_ is *true*, or if _a_ is *false* and _b_ is *false*, return _c_(_x_). 1. Return ~failure~. -

    The production Assertion :: `(` `?` `=` Disjunction `)` evaluates as follows:

    +

    The production Assertion :: QuantifiableAssertion evaluates as follows:

    - 1. Evaluate |Disjunction| with 1 as its _direction_ argument to obtain a Matcher _m_. + 1. Evaluate |QuantifiableAssertion| to obtain a Matcher _m_. + 1. Return _m_. + +

    The production Assertion :: `(` `?` `<=` Disjunction `)` evaluates as follows:

    + + 1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. @@ -35131,9 +35243,9 @@

    Assertion

    1. Let _z_ be the State (_xe_, _cap_). 1. Return _c_(_z_).
    -

    The production Assertion :: `(` `?` `!` Disjunction `)` evaluates as follows:

    +

    The production Assertion :: `(` `?` `<!` Disjunction `)` evaluates as follows:

    - 1. Evaluate |Disjunction| with 1 as its _direction_ argument to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. @@ -35144,9 +35256,9 @@

    Assertion

    1. If _r_ is not ~failure~, return ~failure~. 1. Return _c_(_x_).
    -

    The production Assertion :: `(` `?` `<=` Disjunction `)` evaluates as follows:

    +

    The production QuantifiableAssertion :: `(` `?` `=` Disjunction `)` evaluates as follows:

    - 1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with 1 as its _direction_ argument to obtain a Matcher _m_. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. @@ -35161,9 +35273,9 @@

    Assertion

    1. Let _z_ be the State (_xe_, _cap_). 1. Return _c_(_z_).
    -

    The production Assertion :: `(` `?` `<!` Disjunction `)` evaluates as follows:

    +

    The production QuantifiableAssertion :: `(` `?` `!` Disjunction `)` evaluates as follows:

    - 1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. + 1. Evaluate |Disjunction| with 1 as its _direction_ argument to obtain a Matcher _m_. 1. Return a new Matcher with parameters (_x_, _c_) that captures _m_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. @@ -35236,32 +35348,31 @@

    Quantifier

    Atom

    With parameter _direction_.

    -

    The production Atom :: PatternCharacter evaluates as follows:

    - - 1. Let _ch_ be the character matched by |PatternCharacter|. - 1. Let _A_ be a one-element CharSet containing the character _ch_. - 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). - -

    The production Atom :: `.` evaluates as follows:

    +

    The production Atom ::! `.` evaluates as follows:

    1. Let _A_ be the CharSet of all characters. 1. If _DotAll_ is not *true*, then 1. Remove from _A_ all characters corresponding to a code point on the right-hand side of the |LineTerminator| production. 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). -

    The production Atom :: `\` AtomEscape evaluates as follows:

    +

    The production Atom ::! `\` AtomEscape evaluates as follows:

    1. Return the Matcher that is the result of evaluating |AtomEscape| with argument _direction_. -

    The production Atom :: CharacterClass evaluates as follows:

    +

    The production Atom ::! `\` [lookahead == `c`] evaluates as follows:

    + + 1. Let _A_ be the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS). + 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). + +
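The effect of the `\` [lookahead == `c`] alternative can be checked from script. This is a sketch assuming an engine that implements the legacy productions: a `\c` that is not followed by a control letter matches a literal backslash, and the `c` is then matched as an ordinary pattern character.

```javascript
// Source text of the pattern is the two characters \c (no control letter).
const lone = new RegExp("\\c");
const hit = lone.exec("a\\c");
console.log(hit !== null && hit[0] === "\\c"); // matches backslash, then "c"
console.log(lone.test("c")); // no backslash in the input, so no match

// Followed by a control letter, \cA is an ordinary control escape (U+0001).
const control = new RegExp("\\cA");
console.log(control.test("\u0001"));

// In Unicode mode the lone \c is a SyntaxError.
let rejected = false;
try {
  new RegExp("\\c", "u");
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected);
```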

    The production Atom ::! CharacterClass evaluates as follows:

    1. Evaluate |CharacterClass| to obtain a CharSet _A_ and a Boolean _invert_. 1. Return ! CharacterSetMatcher(_A_, _invert_, _direction_). -

    The production Atom :: `(` GroupSpecifier Disjunction `)` evaluates as follows:

    +

    The production Atom ::! `(` GroupSpecifier Disjunction `)` evaluates as follows:

    1. Evaluate |Disjunction| with argument _direction_ to obtain a Matcher _m_. - 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Atom|. This is the total number of Atom :: `(` GroupSpecifier Disjunction `)` Parse Nodes prior to or enclosing this |Atom|. + 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Atom|. This is the total number of Atom ::! `(` GroupSpecifier Disjunction `)` Parse Nodes prior to or enclosing this |Atom|. 1. Return a new Matcher with parameters (_x_, _c_) that captures _direction_, _m_, and _parenIndex_ and performs the following steps when called: 1. Assert: _x_ is a State. 1. Assert: _c_ is a Continuation. @@ -35282,10 +35393,16 @@

    Atom

    1. Return _c_(_z_). 1. Return _m_(_x_, _d_).
    -

    The production Atom :: `(` `?` `:` Disjunction `)` evaluates as follows:

    +

    The production Atom ::! `(` `?` `:` Disjunction `)` evaluates as follows:

    1. Return the Matcher that is the result of evaluating |Disjunction| with argument _direction_. +

    The production Atom ::! PatternCharacter evaluates as follows:

    + + 1. Let _ch_ be the character matched by |PatternCharacter|. + 1. Let _A_ be a one-element CharSet containing the character _ch_. + 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). +

    @@ -35414,150 +35531,6 @@

    - -

    AtomEscape

    -

    With parameter _direction_.

    -

    The production AtomEscape :: DecimalEscape evaluates as follows:

    - - 1. Evaluate |DecimalEscape| to obtain an integer _n_. - 1. Assert: _n_ ≤ _NcapturingParens_. - 1. Return ! BackreferenceMatcher(_n_, _direction_). - -

    The production AtomEscape :: CharacterEscape evaluates as follows:

    - - 1. Evaluate |CharacterEscape| to obtain a character _ch_. - 1. Let _A_ be a one-element CharSet containing the character _ch_. - 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). - -

    The production AtomEscape :: CharacterClassEscape evaluates as follows:

    - - 1. Evaluate |CharacterClassEscape| to obtain a CharSet _A_. - 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). - - -

    An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_th set of capturing parentheses (). It is an error if the regular expression has fewer than _n_ capturing parentheses. If the regular expression has _n_ or more capturing parentheses but the _n_th one is *undefined* because it has not captured anything, then the backreference always succeeds.

    -
    -

    The production AtomEscape :: `k` GroupName evaluates as follows:

    - - 1. Search the enclosing |Pattern| for an instance of a |GroupSpecifier| containing a |RegExpIdentifierName| which has a CapturingGroupName equal to the CapturingGroupName of the |RegExpIdentifierName| contained in |GroupName|. - 1. Assert: A unique such |GroupSpecifier| is found. - 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of the located |GroupSpecifier|. This is the total number of Atom :: `(` GroupSpecifier Disjunction `)` Parse Nodes prior to or enclosing the located |GroupSpecifier|, including its immediately enclosing |Atom|. - 1. Return ! BackreferenceMatcher(_parenIndex_, _direction_). - - - -

    - BackreferenceMatcher ( - _n_: a positive integer, - _direction_: 1 or -1, - ) -

    -
    -
    - - 1. Assert: _n_ ≥ 1. - 1. Return a new Matcher with parameters (_x_, _c_) that captures _n_ and _direction_ and performs the following steps when called: - 1. Assert: _x_ is a State. - 1. Assert: _c_ is a Continuation. - 1. Let _cap_ be _x_'s _captures_ List. - 1. Let _s_ be _cap_[_n_]. - 1. If _s_ is *undefined*, return _c_(_x_). - 1. Let _e_ be _x_'s _endIndex_. - 1. Let _len_ be the number of elements in _s_. - 1. Let _f_ be _e_ + _direction_ × _len_. - 1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~. - 1. Let _g_ be min(_e_, _f_). - 1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_g_ + _i_]), return ~failure~. - 1. Let _y_ be the State (_f_, _cap_). - 1. Return _c_(_y_). - -
    -
    - - -

    CharacterEscape

    -

    The |CharacterEscape| productions evaluate as follows:

    - - CharacterEscape :: - ControlEscape - `c` ControlLetter - `0` [lookahead ∉ DecimalDigit] - HexEscapeSequence - RegExpUnicodeEscapeSequence - IdentityEscape - - - 1. Let _cv_ be the CharacterValue of this |CharacterEscape|. - 1. Return the character whose character value is _cv_. - -
    - - -

    DecimalEscape

    -

    The |DecimalEscape| productions evaluate as follows:

    - DecimalEscape :: NonZeroDigit DecimalDigits? - - 1. Return the CapturingGroupNumber of this |DecimalEscape|. - - -

    If `\\` is followed by a decimal number _n_ whose first digit is not `0`, then the escape sequence is considered to be a backreference. It is an error if _n_ is greater than the total number of left-capturing parentheses in the entire regular expression.

    -
    -
    - - -

    CharacterClassEscape

    -

    The production CharacterClassEscape :: `d` evaluates as follows:

    - - 1. Return the ten-element CharSet containing the characters `0` through `9` inclusive. - -

    The production CharacterClassEscape :: `D` evaluates as follows:

    - - 1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: `d` . - -

    The production CharacterClassEscape :: `s` evaluates as follows:

    - - 1. Return the CharSet containing all characters corresponding to a code point on the right-hand side of the |WhiteSpace| or |LineTerminator| productions. - -

    The production CharacterClassEscape :: `S` evaluates as follows:

    - - 1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: `s` . - -

    The production CharacterClassEscape :: `w` evaluates as follows:

    - - 1. Return _WordCharacters_. - -

    The production CharacterClassEscape :: `W` evaluates as follows:

    - - 1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: `w` . - -

    The production CharacterClassEscape :: `p{` UnicodePropertyValueExpression `}` evaluates as follows:

    - - 1. Return the CharSet containing all Unicode code points included in the CharSet returned by |UnicodePropertyValueExpression|. - -

    The production CharacterClassEscape :: `P{` UnicodePropertyValueExpression `}` evaluates as follows:

    - - 1. Return the CharSet containing all Unicode code points not included in the CharSet returned by |UnicodePropertyValueExpression|. - -

    The production UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue evaluates as follows:

    - - 1. Let _ps_ be SourceText of |UnicodePropertyName|. - 1. Let _p_ be ! UnicodeMatchProperty(_ps_). - 1. Assert: _p_ is a Unicode property name or property alias listed in the “Property name and aliases” column of . - 1. Let _vs_ be SourceText of |UnicodePropertyValue|. - 1. Let _v_ be ! UnicodeMatchPropertyValue(_p_, _vs_). - 1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value _v_. - -

    The production UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue evaluates as follows:

    - - 1. Let _s_ be SourceText of |LoneUnicodePropertyNameOrValue|. - 1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of , then - 1. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value _s_. - 1. Let _p_ be ! UnicodeMatchProperty(_s_). - 1. Assert: _p_ is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of . - 1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value “True”. - -
    -

    CharacterClass

    The production CharacterClass :: `[` ClassRanges `]` evaluates as follows:

    @@ -35601,10 +35574,28 @@

    NonemptyClassRanges

    1. Evaluate the first |ClassAtom| to obtain a CharSet _A_. 1. Evaluate the second |ClassAtom| to obtain a CharSet _B_. 1. Evaluate |ClassRanges| to obtain a CharSet _C_. - 1. Let _D_ be ! CharacterRange(_A_, _B_). + 1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_). 1. Return the union of _D_ and _C_. + +

    + CharacterRangeOrUnion ( + _A_: a CharSet, + _B_: a CharSet, + ) +

    +
    +
    +
    + 1. If _Unicode_ is *false*, then
    +   1. If _A_ does not contain exactly one character or _B_ does not contain exactly one character, then
    +     1. Let _C_ be the CharSet containing the single character `-` U+002D (HYPHEN-MINUS).
    +     1. Return the union of CharSets _A_, _B_ and _C_.
    + 1. Return ! CharacterRange(_A_, _B_).
    +
    +
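The union branch of CharacterRangeOrUnion is observable with a class such as `[\d-a]`, where one endpoint is a multi-character CharSet. The check below is a sketch assuming an engine that implements the legacy behaviour; the variable names are illustrative.

```javascript
// Non-Unicode mode: \d is not a single character, so the class is the
// union of the digits, "-", and "a" instead of a range.
const legacyClass = new RegExp("[\\d-a]");
const members = ["5", "-", "a"].every((ch) => legacyClass.test(ch));
console.log(members);
console.log(legacyClass.test("b")); // "b" is not in the union

// A class with single-character endpoints is still an ordinary range.
const range = new RegExp("[a-c]");
console.log(range.test("b"));

// Unicode mode rejects a character class escape as a range endpoint.
let rangeError = null;
try {
  new RegExp("[\\d-a]", "u");
} catch (err) {
  rangeError = err;
}
console.log(rangeError instanceof SyntaxError);
```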
    +

    CharacterRange ( @@ -35643,7 +35634,7 @@

    NonemptyClassRangesNoDash

    1. Evaluate |ClassAtomNoDash| to obtain a CharSet _A_. 1. Evaluate |ClassAtom| to obtain a CharSet _B_. 1. Evaluate |ClassRanges| to obtain a CharSet _C_. - 1. Let _D_ be ! CharacterRange(_A_, _B_). + 1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_). 1. Return the union of _D_ and _C_. @@ -35671,25 +35662,32 @@

    ClassAtom

    ClassAtomNoDash

    -

    The production ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-` evaluates as follows:

    +

    The production ClassAtomNoDash ::! SourceCharacter but not one of `\` or `]` or `-` evaluates as follows:

    1. Return the CharSet containing the character matched by |SourceCharacter|. -

    The production ClassAtomNoDash :: `\` ClassEscape evaluates as follows:

    +

    The production ClassAtomNoDash ::! `\` ClassEscape evaluates as follows:

    1. Return the CharSet that is the result of evaluating |ClassEscape|. +

    The production ClassAtomNoDash ::! `\` [lookahead == `c`] evaluates as follows:

    + + 1. Return the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS). + + This production can only be reached from the sequence `\c` within a character class where it is not followed by an acceptable control character.
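Inside a character class the same fallback applies. A minimal check, assuming an engine that implements the legacy productions: in `[\c]` the backslash contributes a literal REVERSE SOLIDUS and the `c` is then a separate class atom.

```javascript
// Source text of the class is [\c]: no acceptable control character follows.
const classFallback = new RegExp("[\\c]");
console.log(classFallback.test("\\")); // the backslash itself is a member
console.log(classFallback.test("c")); // "c" is parsed as its own ClassAtom
console.log(classFallback.test("a")); // anything else is not a member

// With a control letter, [\cA] contains only U+0001.
const classControl = new RegExp("[\\cA]");
console.log(classControl.test("\u0001"));
```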

    ClassEscape

    The |ClassEscape| productions evaluate as follows:

    - ClassEscape :: `b` + ClassEscape ::! `b` + + ClassEscape ::! `-` - ClassEscape :: `-` + ClassEscape ::! `c` ClassControlLetter - ClassEscape :: CharacterEscape + ClassEscape ::! CharacterEscape 1. Let _cv_ be the CharacterValue of this |ClassEscape|. @@ -35697,7 +35695,7 @@

    ClassEscape

    1. Return the CharSet containing the single character _c_.
    - ClassEscape :: CharacterClassEscape + ClassEscape ::! CharacterClassEscape 1. Return the CharSet that is the result of evaluating |CharacterClassEscape|. @@ -35706,6 +35704,151 @@

    ClassEscape

    A |ClassAtom| can use any of the escape sequences that are allowed in the rest of the regular expression except for `\\b`, `\\B`, and backreferences. Inside a |CharacterClass|, `\\b` means the backspace character, while `\\B` and backreferences raise errors.

    + + +

    AtomEscape

    +

    With parameter _direction_.

    +

    The production AtomEscape ::! DecimalEscape evaluates as follows:

    + + 1. Evaluate |DecimalEscape| to obtain an integer _n_. + 1. Assert: _n_ ≤ _NcapturingParens_. + 1. Return ! BackreferenceMatcher(_n_, _direction_). + +

    The production AtomEscape ::! CharacterClassEscape evaluates as follows:

    + + 1. Evaluate |CharacterClassEscape| to obtain a CharSet _A_. + 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). + + +

    An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_th set of capturing parentheses (). It is an error if the regular expression has fewer than _n_ capturing parentheses. If the regular expression has _n_ or more capturing parentheses but the _n_th one is *undefined* because it has not captured anything, then the backreference always succeeds.

    +
    +

    The production AtomEscape ::! CharacterEscape evaluates as follows:

    + + 1. Evaluate |CharacterEscape| to obtain a character _ch_. + 1. Let _A_ be a one-element CharSet containing the character _ch_. + 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). + +

    The production AtomEscape ::! `k` GroupName evaluates as follows:

    + + 1. Search the enclosing |Pattern| for an instance of a |GroupSpecifier| containing a |RegExpIdentifierName| which has a CapturingGroupName equal to the CapturingGroupName of the |RegExpIdentifierName| contained in |GroupName|. + 1. Assert: A unique such |GroupSpecifier| is found. + 1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of the located |GroupSpecifier|. This is the total number of Atom ::! `(` GroupSpecifier Disjunction `)` Parse Nodes prior to or enclosing the located |GroupSpecifier|, including its immediately enclosing |Atom|. + 1. Return ! BackreferenceMatcher(_parenIndex_, _direction_). + + + +

    + BackreferenceMatcher ( + _n_: a positive integer, + _direction_: 1 or -1, + ) +

    +
    +
    +
    + 1. Assert: _n_ ≥ 1.
    + 1. Return a new Matcher with parameters (_x_, _c_) that captures _n_ and _direction_ and performs the following steps when called:
    +   1. Assert: _x_ is a State.
    +   1. Assert: _c_ is a Continuation.
    +   1. Let _cap_ be _x_'s _captures_ List.
    +   1. Let _s_ be _cap_[_n_].
    +   1. If _s_ is *undefined*, return _c_(_x_).
    +   1. Let _e_ be _x_'s _endIndex_.
    +   1. Let _len_ be the number of elements in _s_.
    +   1. Let _f_ be _e_ + _direction_ × _len_.
    +   1. If _f_ < 0 or _f_ > _InputLength_, return ~failure~.
    +   1. Let _g_ be min(_e_, _f_).
    +   1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_g_ + _i_]), return ~failure~.
    +   1. Let _y_ be the State (_f_, _cap_).
    +   1. Return _c_(_y_).
    +
    +
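The index arithmetic in BackreferenceMatcher can be sketched as a plain function. This is an illustration under stated assumptions, not the specification's Matcher: `backreferenceStep` is a hypothetical name, Canonicalize is assumed to be the identity (no `i` flag), and the continuation is dropped so that the function simply returns the new end index or `null` for failure.

```javascript
// s is the captured List (or undefined); e is the current endIndex.
function backreferenceStep(input, e, s, direction) {
  if (s === undefined) return e; // an undefined capture always succeeds
  const len = s.length;
  const f = e + direction * len; // where matching would end
  if (f < 0 || f > input.length) return null;
  const g = Math.min(e, f); // leftmost index of the span being compared
  for (let i = 0; i < len; i++) {
    if (s[i] !== input[g + i]) return null;
  }
  return f;
}

const input = "abab".split("");
const forwardEnd = backreferenceStep(input, 2, ["a", "b"], 1);
const backwardEnd = backreferenceStep(input, 2, ["a", "b"], -1);
const outOfRange = backreferenceStep(input, 1, ["a", "b"], -1);
console.log(forwardEnd, backwardEnd, outOfRange); // 4 0 null
```

Taking the minimum of _e_ and _f_ lets the same comparison loop serve both directions: the captured characters are always compared left to right, regardless of which way the match is advancing.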
    +
    + + +

    DecimalEscape

    +

    The |DecimalEscape| productions evaluate as follows:

    + DecimalEscape :: NonZeroDigit DecimalDigits? + + 1. Return the CapturingGroupNumber of this |DecimalEscape|. + + +

    If `\\` is followed by a decimal number _n_ whose first digit is not `0`, then the escape sequence is considered to be a backreference. It is an error if _n_ is greater than the total number of left-capturing parentheses in the entire regular expression.
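The backreference rule in this note can be exercised as follows. This is a sketch assuming a current engine; the variable names are illustrative.

```javascript
// \1 refers to a group that has not yet captured, so it matches emptily.
const forwardRef = new RegExp("\\1(a)");
console.log(forwardRef.test("a"));

// In Unicode mode, \2 with only one capturing group is an error.
let decimalError = null;
try {
  new RegExp("\\2(a)", "u");
} catch (err) {
  decimalError = err;
}
console.log(decimalError instanceof SyntaxError);
```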

    +
    +
    + + +

    CharacterClassEscape

    +

    The production CharacterClassEscape :: `d` evaluates as follows:

    + + 1. Return the ten-element CharSet containing the characters `0` through `9` inclusive. + +

    The production CharacterClassEscape :: `D` evaluates as follows:

    + + 1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: `d` . + +

    The production CharacterClassEscape :: `s` evaluates as follows:

    + + 1. Return the CharSet containing all characters corresponding to a code point on the right-hand side of the |WhiteSpace| or |LineTerminator| productions. + +

    The production CharacterClassEscape :: `S` evaluates as follows:

    + + 1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: `s` . + +

    The production CharacterClassEscape :: `w` evaluates as follows:

    + + 1. Return _WordCharacters_. + +

    The production CharacterClassEscape :: `W` evaluates as follows:

    + + 1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: `w` . + +

    The production CharacterClassEscape :: `p{` UnicodePropertyValueExpression `}` evaluates as follows:

    + + 1. Return the CharSet containing all Unicode code points included in the CharSet returned by |UnicodePropertyValueExpression|. + +

    The production CharacterClassEscape :: `P{` UnicodePropertyValueExpression `}` evaluates as follows:

    + + 1. Return the CharSet containing all Unicode code points not included in the CharSet returned by |UnicodePropertyValueExpression|. + +

    The production UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue evaluates as follows:

    + + 1. Let _ps_ be SourceText of |UnicodePropertyName|. + 1. Let _p_ be ! UnicodeMatchProperty(_ps_). + 1. Assert: _p_ is a Unicode property name or property alias listed in the “Property name and aliases” column of . + 1. Let _vs_ be SourceText of |UnicodePropertyValue|. + 1. Let _v_ be ! UnicodeMatchPropertyValue(_p_, _vs_). + 1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value _v_. + +

    The production UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue evaluates as follows:

    + + 1. Let _s_ be SourceText of |LoneUnicodePropertyNameOrValue|. + 1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of , then + 1. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value _s_. + 1. Let _p_ be ! UnicodeMatchProperty(_s_). + 1. Assert: _p_ is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of . + 1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value “True”. + +
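The property-escape productions above correspond to patterns like the following. This is a small check assuming an engine with Unicode property escape support; the last case relies on the legacy behaviour in which `\p` without the `u` flag is an identity escape.

```javascript
// UnicodePropertyName = UnicodePropertyValue
const greek = /\p{Script=Greek}/u;
console.log(greek.test("α"));

// Lone value: resolved as a General_Category value (Lu = Uppercase_Letter).
const upper = /\p{Lu}/u;
const notUpper = /\P{Lu}/u;
console.log(upper.test("A"), notUpper.test("A"));

// Lone name: resolved as a binary property.
const ascii = /\p{ASCII}/u;
console.log(ascii.test("é"));

// Without the "u" flag, \p is just "p" and the braces are literal text.
const legacyP = new RegExp("\\p{Lu}");
console.log(legacyP.test("p{Lu}"), legacyP.test("A"));
```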
    + + +

    CharacterEscape

    +

    The |CharacterEscape| productions evaluate as follows:

    + + CharacterEscape ::! + ControlEscape + `c` ControlLetter + `0` [lookahead ∉ DecimalDigit] + HexEscapeSequence + RegExpUnicodeEscapeSequence + LegacyOctalEscapeSequence + IdentityEscape + + + 1. Let _cv_ be the CharacterValue of this |CharacterEscape|. + 1. Return the character whose character value is _cv_. + +
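The LegacyOctalEscapeSequence alternative added to |CharacterEscape| can be observed as follows. This is a sketch assuming an engine that implements the legacy productions; the variable names are illustrative.

```javascript
// No capturing groups exist, so \101 is not a backreference; it is the
// octal escape for code unit 0o101 (65), which is "A".
const octal = new RegExp("\\101");
console.log(octal.test("A"));

// \2 exceeds the single capturing group, so it falls back to U+0002.
const mixed = new RegExp("\\2(a)");
console.log(mixed.test("\u0002a"));

// \0 not followed by a decimal digit is the NUL character in either mode.
const nul = new RegExp("\\0", "u");
console.log(nul.test("\u0000"));

// Unicode mode has no legacy octal escapes.
let octalError = null;
try {
  new RegExp("\\101", "u");
} catch (err) {
  octalError = err;
}
console.log(octalError instanceof SyntaxError);
```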
    @@ -35795,7 +35938,7 @@

    1. Assert: _parseResult_ is a |Pattern| Parse Node. 1. Set _obj_.[[OriginalSource]] to _P_. 1. Set _obj_.[[OriginalFlags]] to _F_. - 1. Set _obj_.[[RegExpMatcher]] to the Abstract Closure that evaluates _parseResult_ by applying the semantics provided in using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters. + 1. Set _obj_.[[RegExpMatcher]] to the Abstract Closure that evaluates _parseResult_ by applying the semantics provided in using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters. 1. Perform ? Set(_obj_, *"lastIndex"*, *+0*𝔽, *true*). 1. Return _obj_. @@ -46088,13 +46231,24 @@

    Lexical Grammar

    + + + + + + + + + + + @@ -46450,29 +46604,29 @@

    Regular Expressions

    + - + - - - - + - -

    Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.

    -

     

    - - - - + + + + + + + + + @@ -46483,13 +46637,15 @@

    Regular Expressions

    - - - - - - - + + + + + + + + + @@ -46506,305 +46662,12 @@

    Additional Syntax

    HTML-like Comments

    -

    The syntax and semantics of is extended as follows except that this extension is not allowed when parsing source code using the goal symbol |Module|:

    -

    Syntax

    -
    - Comment ::
    -   MultiLineComment
    -   SingleLineComment
    -   SingleLineHTMLOpenComment
    -   SingleLineHTMLCloseComment
    -   SingleLineDelimitedComment
    -
    - MultiLineComment ::
    -   `/*` FirstCommentLine? LineTerminator MultiLineCommentChars? `*/` HTMLCloseComment?
    -
    - FirstCommentLine ::
    -   SingleLineDelimitedCommentChars
    -
    - SingleLineHTMLOpenComment ::
    -   `<!--` SingleLineCommentChars?
    -
    - SingleLineHTMLCloseComment ::
    -   LineTerminatorSequence HTMLCloseComment
    -
    - SingleLineDelimitedComment ::
    -   `/*` SingleLineDelimitedCommentChars? `*/`
    -
    - HTMLCloseComment ::
    -   WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars?
    -
    - SingleLineDelimitedCommentChars ::
    -   SingleLineNotAsteriskChar SingleLineDelimitedCommentChars?
    -   `*` SingleLinePostAsteriskCommentChars?
    -
    - SingleLineNotAsteriskChar ::
    -   SourceCharacter but not one of `*` or LineTerminator
    -
    - SingleLinePostAsteriskCommentChars ::
    -   SingleLineNotForwardSlashOrAsteriskChar SingleLineDelimitedCommentChars?
    -   `*` SingleLinePostAsteriskCommentChars?
    -
    - SingleLineNotForwardSlashOrAsteriskChar ::
    -   SourceCharacter but not one of `/` or `*` or LineTerminator
    -
    - WhiteSpaceSequence ::
    -   WhiteSpace WhiteSpaceSequence?
    -
    - SingleLineDelimitedCommentSequence ::
    -   SingleLineDelimitedComment WhiteSpaceSequence? SingleLineDelimitedCommentSequence?
    -

    Similar to a |MultiLineComment| that contains a line terminator code point, a |SingleLineHTMLCloseComment| is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.

    +

    The HTML-like comment syntax used to be normative optional outside |Module|s.

    Regular Expressions Patterns

    -

    The syntax of is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.

    -

    This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.

    -

    Syntax

    -
    - Term[UnicodeMode, N] ::
    -   [+UnicodeMode] Assertion[+UnicodeMode, ?N]
    -   [+UnicodeMode] Atom[+UnicodeMode, ?N] Quantifier
    -   [+UnicodeMode] Atom[+UnicodeMode, ?N]
    -   [~UnicodeMode] QuantifiableAssertion[?N] Quantifier
    -   [~UnicodeMode] Assertion[~UnicodeMode, ?N]
    -   [~UnicodeMode] ExtendedAtom[?N] Quantifier
    -   [~UnicodeMode] ExtendedAtom[?N]
    -
    - Assertion[UnicodeMode, N] ::
    -   `^`
    -   `$`
    -   `\` `b`
    -   `\` `B`
    -   [+UnicodeMode] `(` `?` `=` Disjunction[+UnicodeMode, ?N] `)`
    -   [+UnicodeMode] `(` `?` `!` Disjunction[+UnicodeMode, ?N] `)`
    -   [~UnicodeMode] QuantifiableAssertion[?N]
    -   `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)`
    -   `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)`
    -
    - QuantifiableAssertion[N] ::
    -   `(` `?` `=` Disjunction[~UnicodeMode, ?N] `)`
    -   `(` `?` `!` Disjunction[~UnicodeMode, ?N] `)`
    -
    - ExtendedAtom[N] ::
    -   `.`
    -   `\` AtomEscape[~UnicodeMode, ?N]
    -   `\` [lookahead == `c`]
    -   CharacterClass[~UnicodeMode]
    -   `(` Disjunction[~UnicodeMode, ?N] `)`
    -   `(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
    -   InvalidBracedQuantifier
    -   ExtendedPatternCharacter
    -
    - InvalidBracedQuantifier ::
    -   `{` DecimalDigits[~Sep] `}`
    -   `{` DecimalDigits[~Sep] `,` `}`
    -   `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}`
    -
    - ExtendedPatternCharacter ::
    -   SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|`
    -
    - AtomEscape[UnicodeMode, N] ::
    -   [+UnicodeMode] DecimalEscape
    -   [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_]
    -   CharacterClassEscape[?UnicodeMode]
    -   CharacterEscape[?UnicodeMode, ?N]
    -   [+N] `k` GroupName[?UnicodeMode]
    -
    - CharacterEscape[UnicodeMode, N] ::
    -   ControlEscape
    -   `c` ControlLetter
    -   `0` [lookahead ∉ DecimalDigit]
    -   HexEscapeSequence
    -   RegExpUnicodeEscapeSequence[?UnicodeMode]
    -   [~UnicodeMode] LegacyOctalEscapeSequence
    -   IdentityEscape[?UnicodeMode, ?N]
    -
    - IdentityEscape[UnicodeMode, N] ::
    -   [+UnicodeMode] SyntaxCharacter
    -   [+UnicodeMode] `/`
    -   [~UnicodeMode] SourceCharacterIdentityEscape[?N]
    -
    - SourceCharacterIdentityEscape[N] ::
    -   [~N] SourceCharacter but not `c`
    -   [+N] SourceCharacter but not one of `c` or `k`
    -
    - ClassAtomNoDash[UnicodeMode, N] ::
    -   SourceCharacter but not one of `\` or `]` or `-`
    -   `\` ClassEscape[?UnicodeMode, ?N]
    -   `\` [lookahead == `c`]
    -
    - ClassEscape[UnicodeMode, N] ::
    -   `b`
    -   [+UnicodeMode] `-`
    -   [~UnicodeMode] `c` ClassControlLetter
    -   CharacterClassEscape[?UnicodeMode]
    -   CharacterEscape[?UnicodeMode, ?N]
    -
    - ClassControlLetter ::
    -   DecimalDigit
    -   `_`
    -
    -

    When the same left-hand side occurs with both [+UnicodeMode] and [~UnicodeMode] guards, it is to control the disambiguation priority.
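As a rough illustration of the split between the two modes, the sketch below compiles a handful of pattern sources with and without the `u` flag. The helper name `compiles` and the sample list are hypothetical, and the results assume a host that implements these grammar extensions.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything other than a SyntaxError is unexpected here
  }
}

// Each source below relies on a [~UnicodeMode] alternative of the extended grammar:
// ExtendedPatternCharacter ("a{", "}", "]"), SourceCharacterIdentityEscape ("\8"),
// and LegacyOctalEscapeSequence ("\1" with no capturing groups).
const legacyOnly = ["a{", "}", "]", "\\8", "\\1"];

const results = legacyOnly.map((source) => ({
  source,
  legacy: compiles(source),       // parsed without the u flag
  unicode: compiles(source, "u"), // parsed with the u flag
}));

console.log(results);
```

On such a host every entry is expected to compile without the `u` flag and to be rejected with it, which matches the statement that the extensions do not change the syntax of Unicode patterns.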

    -
    - - -

    Static Semantics: Early Errors

    -

    The semantics of is extended as follows:

    ExtendedAtom :: InvalidBracedQuantifier

    • It is a Syntax Error if any source text matches this rule.
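A small sketch of how this early error can be observed from script code. The helper `isSyntaxError` is a hypothetical name, and the sample sources are chosen for illustration only.

```javascript
// Hypothetical helper: true if compiling the pattern source throws a SyntaxError.
function isSyntaxError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// A complete braced quantifier with nothing to quantify matches
// InvalidBracedQuantifier, so the early error applies.
const rejected = ["{1}", "{1,}", "{1,2}"].map((s) => isSyntaxError(s));

// Incomplete or non-numeric braces do not match InvalidBracedQuantifier;
// `{` and `}` are parsed as ExtendedPatternCharacter instead.
const accepted = ["{", "{1", "{a}"].map((s) => isSyntaxError(s));

console.log(rejected, accepted);
```

The production exists so that a well-formed quantifier with no preceding atom is reported as an error rather than silently matched as literal text.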

    Additionally, the rules for the following productions are modified with the addition of the highlighted text:

    NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges

    • It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [UnicodeMode] parameter.
    • It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|.

    NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges

    • It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [UnicodeMode] parameter.
    • It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|.
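The effect of the modified rules can be checked with a short sketch. The helper `isSyntaxError` is a hypothetical name, and `[\d-a]` and `[z-a]` are sample classes picked for illustration.

```javascript
// Hypothetical helper: true if compiling the pattern source throws a SyntaxError.
function isSyntaxError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// A character class escape such as \d used as a range endpoint is an error
// only when the pattern is parsed with the u flag.
const classEndpointLegacy = isSyntaxError("[\\d-a]");
const classEndpointUnicode = isSyntaxError("[\\d-a]", "u");

// An out-of-order range between two single characters is an error in both modes.
const reversedLegacy = isSyntaxError("[z-a]");
const reversedUnicode = isSyntaxError("[z-a]", "u");

console.log({ classEndpointLegacy, classEndpointUnicode, reversedLegacy, reversedUnicode });
```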
    - - -

    Static Semantics: IsCharacterClass

    -

    The semantics of is extended as follows:

    ClassAtomNoDash :: `\` [lookahead == `c`]

    1. Return *false*.
    - - -

    Static Semantics: CharacterValue

    -

    The semantics of is extended as follows:

    ClassAtomNoDash :: `\` [lookahead == `c`]

    1. Return the code point value of U+005C (REVERSE SOLIDUS).

    ClassEscape :: `c` ClassControlLetter

    1. Let _ch_ be the code point matched by |ClassControlLetter|.
    1. Let _i_ be _ch_'s code point value.
    1. Return the remainder of dividing _i_ by 32.

    CharacterEscape :: LegacyOctalEscapeSequence

    1. Return the MV of |LegacyOctalEscapeSequence| (see ).
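The arithmetic in these rules can be sketched as follows. The function names `classControlValue` and `legacyOctalValue` are hypothetical, and using `parseInt` with radix 8 as a stand-in for the MV of a |LegacyOctalEscapeSequence| is an assumption that holds only for input that is already a valid sequence of octal digits.

```javascript
// Sketch of CharacterValue for ClassEscape :: `c` ClassControlLetter.
function classControlValue(letter) {
  const i = letter.codePointAt(0); // code point value of the matched character
  return i % 32;                   // remainder of dividing i by 32
}

// Stand-in for the MV of a LegacyOctalEscapeSequence, given its digits (assumption).
function legacyOctalValue(digits) {
  return parseInt(digits, 8);
}

const fromDigit = classControlValue("1");      // 0x31 % 32
const fromUnderscore = classControlValue("_"); // 0x5F % 32
const fromOctal = legacyOctalValue("101");     // octal 101

// Compare the computed values with what the host's RegExp engine matches.
const engineAgrees =
  /[\c1]/.test(String.fromCodePoint(fromDigit)) &&
  /[\c_]/.test(String.fromCodePoint(fromUnderscore)) &&
  /^\101$/.test(String.fromCodePoint(fromOctal));

console.log(fromDigit, fromUnderscore, fromOctal, engineAgrees);
```

Under these assumptions `[\c1]` matches U+0011, `[\c_]` matches U+001F, and `\101` matches U+0041 (`A`).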
    - - -

    Pattern Semantics

    -

    The semantics of is extended as follows:

    -

    Within reference to “Atom :: `(` GroupSpecifier Disjunction `)` ” are to be interpreted as meaning “Atom :: `(` GroupSpecifier Disjunction `)` ” or “ExtendedAtom :: `(` Disjunction `)` ”.

    - -

    Term () includes the following additional evaluation rules:

    -

    The production Term :: QuantifiableAssertion Quantifier evaluates the same as the production Term :: Atom Quantifier but with |QuantifiableAssertion| substituted for |Atom|.

    -

    The production Term :: ExtendedAtom Quantifier evaluates the same as the production Term :: Atom Quantifier but with |ExtendedAtom| substituted for |Atom|.

    -

    The production Term :: ExtendedAtom evaluates the same as the production Term :: Atom but with |ExtendedAtom| substituted for |Atom|.

    - -

    Assertion () includes the following additional evaluation rule:

    -

    The production Assertion :: QuantifiableAssertion evaluates as follows:

    1. Evaluate |QuantifiableAssertion| to obtain a Matcher _m_.
    1. Return _m_.

    Assertion () evaluation rules for the Assertion :: `(` `?` `=` Disjunction `)` and Assertion :: `(` `?` `!` Disjunction `)` productions are also used for the |QuantifiableAssertion| productions, but with |QuantifiableAssertion| substituted for |Assertion|.
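A brief sketch of what |QuantifiableAssertion| permits. The helper `isSyntaxError` is a hypothetical name, and the patterns are samples chosen for illustration.

```javascript
// Hypothetical helper: true if compiling the pattern source throws a SyntaxError.
function isSyntaxError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// A lookahead may take a quantifier only when parsed without the u flag.
const lookaheadLegacy = isSyntaxError("(?=a)*b");
const lookaheadUnicode = isSyntaxError("(?=a)*b", "u");

// Lookbehinds are not QuantifiableAssertions, so a quantifier is rejected in both modes.
const lookbehindLegacy = isSyntaxError("(?<=a)*b");

// Zero iterations satisfy `*`, so the quantified lookahead can be skipped entirely.
const skipsLookahead = /(?=a)*b/.test("b");

console.log({ lookaheadLegacy, lookaheadUnicode, lookbehindLegacy, skipsLookahead });
```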

    - -

    Atom () evaluation rules for the |Atom| productions except for Atom :: PatternCharacter are also used for the |ExtendedAtom| productions, but with |ExtendedAtom| substituted for |Atom|. The following evaluation rules, with parameter _direction_, are also added:

    -

    The production ExtendedAtom :: `\` [lookahead == `c`] evaluates as follows:

    1. Let _A_ be the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS).
    1. Return ! CharacterSetMatcher(_A_, *false*, _direction_).

    The production ExtendedAtom :: ExtendedPatternCharacter evaluates as follows:

    1. Let _ch_ be the character represented by |ExtendedPatternCharacter|.
    1. Let _A_ be a one-element CharSet containing the character _ch_.
    1. Return ! CharacterSetMatcher(_A_, *false*, _direction_).
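The two rules above can be observed with a few sample patterns. This is a minimal sketch that assumes a host implementing these extensions; the constant names are hypothetical.

```javascript
// `\c` followed by a ControlLetter is an ordinary control escape: \cJ is U+000A.
const controlEscape = /^\cJ$/.test("\n");

// Otherwise the backslash is matched literally (ExtendedAtom :: `\` [lookahead == `c`])
// and the `c` that follows is parsed as the next atom.
const bareBackslashC = /^\c$/.test("\\c");
const backslashCDigit = /^\c1$/.test("\\c1");

// ExtendedPatternCharacter: `]` and `}` match themselves outside a character class.
const literalBracket = /^]$/.test("]");
const literalBrace = /^}$/.test("}");

console.log({ controlEscape, bareBackslashC, backslashCDigit, literalBracket, literalBrace });
```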

    CharacterEscape () includes the following additional evaluation rule:

    -

    The production CharacterEscape :: LegacyOctalEscapeSequence evaluates as follows:

    1. Let _cv_ be the CharacterValue of this |CharacterEscape|.
    1. Return the character whose character value is _cv_.

    NonemptyClassRanges () modifies the following evaluation rule:

    -

    The production NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges evaluates as follows:

    1. Evaluate the first |ClassAtom| to obtain a CharSet _A_.
    1. Evaluate the second |ClassAtom| to obtain a CharSet _B_.
    1. Evaluate |ClassRanges| to obtain a CharSet _C_.
    1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_).
    1. Return the union of _D_ and _C_.

    NonemptyClassRangesNoDash () modifies the following evaluation rule:

    -

    The production NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges evaluates as follows:

    1. Evaluate |ClassAtomNoDash| to obtain a CharSet _A_.
    1. Evaluate |ClassAtom| to obtain a CharSet _B_.
    1. Evaluate |ClassRanges| to obtain a CharSet _C_.
    1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_).
    1. Return the union of _D_ and _C_.

    ClassEscape () includes the following additional evaluation rule:

    -

    The production ClassEscape :: `c` ClassControlLetter evaluates as follows:

    1. Let _cv_ be the CharacterValue of this |ClassEscape|.
    1. Let _c_ be the character whose character value is _cv_.
    1. Return the CharSet containing the single character _c_.

    ClassAtomNoDash () includes the following additional evaluation rule:

    -

    The production ClassAtomNoDash :: `\` [lookahead == `c`] evaluates as follows:

    1. Return the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS).

    This production can only be reached from the sequence `\c` within a character class where it is not followed by an acceptable control character.

    CharacterRangeOrUnion ( _A_: a CharSet, _B_: a CharSet )

    -
    -
    1. If _Unicode_ is *false*, then
      1. If _A_ does not contain exactly one character or _B_ does not contain exactly one character, then
        1. Let _C_ be the CharSet containing the single character `-` U+002D (HYPHEN-MINUS).
        1. Return the union of CharSets _A_, _B_ and _C_.
    1. Return ! CharacterRange(_A_, _B_).
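The steps of CharacterRangeOrUnion can be modelled in script code. This is only a sketch under stated assumptions: a CharSet is represented as a `Set` of code points, _Unicode_ is passed as an explicit `unicode` argument, and `characterRange` is a simplified stand-in for the CharacterRange operation.

```javascript
// Simplified stand-in for CharacterRange (assumption): both CharSets must hold
// exactly one code point, and the first must not be larger than the second.
function characterRange(a, b) {
  if (a.size !== 1 || b.size !== 1) {
    throw new Error("CharacterRange requires single-character CharSets");
  }
  const [i] = a;
  const [j] = b;
  if (i > j) throw new Error("range out of order");
  const range = new Set();
  for (let cp = i; cp <= j; cp++) range.add(cp);
  return range;
}

// Model of CharacterRangeOrUnion; `unicode` stands for _Unicode_ (assumption).
function characterRangeOrUnion(a, b, unicode) {
  if (!unicode) {
    if (a.size !== 1 || b.size !== 1) {
      const c = new Set([0x2d]); // U+002D (HYPHEN-MINUS)
      return new Set([...a, ...b, ...c]);
    }
  }
  return characterRange(a, b);
}

const digits = new Set([..."0123456789"].map((ch) => ch.codePointAt(0)));
const letterA = new Set(["a".codePointAt(0)]);

const union = characterRangeOrUnion(digits, letterA, false);                  // models [\d-a]
const range = characterRangeOrUnion(new Set([0x61]), new Set([0x63]), false); // models [a-c]

console.log(union.size, [...range]);
```

In this model `[\d-a]` yields the ten digits plus `-` and `a`, which is consistent with `/^[\d-a]+$/` matching the string `"7-a"` on a host that implements this annex.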
    -
    +

    Some of the syntax and semantics of BMP patterns ([~UnicodeMode]) used to be normative optional.