diff --git a/spec.html b/spec.html
index d4db9bf419..de4e20c83f 100644
--- a/spec.html
+++ b/spec.html
@@ -29196,6 +29196,266 @@

Static Semantics: Early Errors

It is a Syntax Error if _NcapturingParens_ ≥ 2<sup>32</sup>-1.
+ QuantifierPrefix :: `{` DecimalDigits `,` DecimalDigits `}`
+
+ AtomEscape :: DecimalEscape
+
+ NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges
+
+ NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges
+
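Judging from the hunks later in this diff that turn runtime `throw` steps into assertions, the productions added above are the ones whose failure conditions become early errors. The following is a minimal sketch of how that is observable from script code, assuming a conforming engine such as Node.js. The helper name `isEarlySyntaxError` is hypothetical and not part of the spec. The `u` flag is used for the backreference and class-escape cases because the Annex B grammar relaxes them otherwise.

```javascript
// Hypothetical helper: reports whether constructing the pattern throws a
// SyntaxError, i.e. whether the pattern is rejected before any matching runs.
function isEarlySyntaxError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    if (e instanceof SyntaxError) return true;
    throw e; // anything else is unexpected and should surface
  }
}

// QuantifierPrefix :: `{` DecimalDigits `,` DecimalDigits `}` with max < min
console.log(isEarlySyntaxError("a{2,1}"));
// AtomEscape :: DecimalEscape referring past the last capturing group
console.log(isEarlySyntaxError("(a)\\2", "u"));
// NonemptyClassRanges with range endpoints out of order
console.log(isEarlySyntaxError("[z-a]"));
// NonemptyClassRanges with a character class as a range endpoint
console.log(isEarlySyntaxError("[\\d-a]", "u"));
// A well-formed pattern, for contrast
console.log(isEarlySyntaxError("(a)\\1[a-z]{1,2}", "u"));
```

The first four calls should report `true` and the last `false`; none of them ever calls `exec` or `test`, which is the point of moving the checks out of the evaluation semantics.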

Static Semantics: CapturingGroupNumber

+ DecimalEscape :: NonZeroDigit
+
+ 1. Return the MV of |NonZeroDigit|.
+
+ DecimalEscape :: NonZeroDigit DecimalDigits
+
+ 1. Let _n_ be the number of code points in |DecimalDigits|.
+ 1. Return (the MV of |NonZeroDigit| × 10<sup>_n_</sup>) plus the MV of |DecimalDigits|.
+

The definitions of “the MV of |NonZeroDigit|” and “the MV of |DecimalDigits|” are in .
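The CapturingGroupNumber steps can be sketched as follows. This is an illustration only: the function name `capturingGroupNumber` and the choice to pass the escape's digits as a string are assumptions made for the example, not anything the spec defines.

```javascript
// Hypothetical helper mirroring CapturingGroupNumber. `text` is the source
// text of a DecimalEscape without the backslash, e.g. "12" for `\12`.
function capturingGroupNumber(text) {
  if (!/^[1-9][0-9]*$/.test(text)) {
    throw new RangeError("not a DecimalEscape: " + text);
  }
  const first = text.charCodeAt(0) - 0x30; // MV of NonZeroDigit
  const rest = text.slice(1);              // DecimalDigits, possibly absent
  if (rest.length === 0) {
    return first;                          // DecimalEscape :: NonZeroDigit
  }
  const n = rest.length;                   // number of code points in DecimalDigits
  let mvRest = 0;                          // MV of DecimalDigits
  for (const d of rest) {
    mvRest = mvRest * 10 + (d.charCodeAt(0) - 0x30);
  }
  return first * 10 ** n + mvRest;         // (MV of NonZeroDigit × 10^n) + MV of DecimalDigits
}

console.log(capturingGroupNumber("7"));   // 7
console.log(capturingGroupNumber("12"));  // 12
console.log(capturingGroupNumber("105")); // 105
```

Because the result depends only on the source text, it can be computed statically, which is what lets the comparison against _NcapturingParens_ become an early error.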

+
+ + +

Static Semantics: IsCharacterClass

+
+ ClassAtom :: `-`
+ ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-`
+ ClassEscape :: `b`
+ ClassEscape :: `-`
+ ClassEscape :: CharacterEscape
+
+ 1. Return *false*.
+
+ ClassEscape :: CharacterClassEscape
+
+ 1. Return *true*.
+
+ + +

Static Semantics: CharacterValue

+
+ ClassAtom :: `-`
+
+ 1. Return the code point value of U+002D (HYPHEN-MINUS).
+
+ ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-`
+
+ 1. Let _ch_ be the code point matched by |SourceCharacter|.
+ 1. Return the code point value of _ch_.
+
+ ClassEscape :: `b`
+
+ 1. Return the code point value of U+0008 (BACKSPACE).
+
+ ClassEscape :: `-`
+
+ 1. Return the code point value of U+002D (HYPHEN-MINUS).
+
+ CharacterEscape :: ControlEscape
+
+ 1. Return the code point value according to .
+
+ <table>
+   <tr><th>ControlEscape</th><th>Code Point Value</th><th>Code Point</th><th>Unicode Name</th><th>Symbol</th></tr>
+   <tr><td>`t`</td><td>9</td><td>`U+0009`</td><td>CHARACTER TABULATION</td><td>&lt;HT&gt;</td></tr>
+   <tr><td>`n`</td><td>10</td><td>`U+000A`</td><td>LINE FEED (LF)</td><td>&lt;LF&gt;</td></tr>
+   <tr><td>`v`</td><td>11</td><td>`U+000B`</td><td>LINE TABULATION</td><td>&lt;VT&gt;</td></tr>
+   <tr><td>`f`</td><td>12</td><td>`U+000C`</td><td>FORM FEED (FF)</td><td>&lt;FF&gt;</td></tr>
+   <tr><td>`r`</td><td>13</td><td>`U+000D`</td><td>CARRIAGE RETURN (CR)</td><td>&lt;CR&gt;</td></tr>
+ </table>
+
+ CharacterEscape :: `c` ControlLetter
+
+ 1. Let _ch_ be the code point matched by |ControlLetter|.
+ 1. Let _i_ be _ch_'s code point value.
+ 1. Return the remainder of dividing _i_ by 32.
+
+ CharacterEscape :: `0` [lookahead <! DecimalDigit]
+
+ 1. Return the code point value of U+0000 (NULL).
+

`\\0` represents the <NUL> character and cannot be followed by a decimal digit.

+
+ CharacterEscape :: HexEscapeSequence
+
+ 1. Return the code point value of the SV of |HexEscapeSequence|.
+
+ RegExpUnicodeEscapeSequence :: `u` LeadSurrogate `\u` TrailSurrogate
+
+ 1. Let _lead_ be the CharacterValue of |LeadSurrogate|.
+ 1. Let _trail_ be the CharacterValue of |TrailSurrogate|.
+ 1. Let _cp_ be UTF16Decode(_lead_, _trail_).
+ 1. Return the code point value of _cp_.
+
+ RegExpUnicodeEscapeSequence :: `u` LeadSurrogate
+
+ 1. Return the CharacterValue of |LeadSurrogate|.
+
+ RegExpUnicodeEscapeSequence :: `u` TrailSurrogate
+
+ 1. Return the CharacterValue of |TrailSurrogate|.
+
+ RegExpUnicodeEscapeSequence :: `u` NonSurrogate
+
+ 1. Return the CharacterValue of |NonSurrogate|.
+
+ RegExpUnicodeEscapeSequence :: `u` Hex4Digits
+
+ 1. Return the MV of |Hex4Digits|.
+
+ RegExpUnicodeEscapeSequence :: `u{` CodePoint `}`
+
+ 1. Return the MV of |CodePoint|.
+
+ LeadSurrogate :: Hex4Digits
+ TrailSurrogate :: Hex4Digits
+ NonSurrogate :: Hex4Digits
+
+ 1. Return the MV of |Hex4Digits|.
+
+ CharacterEscape :: IdentityEscape
+
+ 1. Let _ch_ be the code point matched by |IdentityEscape|.
+ 1. Return the code point value of _ch_.
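Two of the CharacterValue rules above involve a little arithmetic, sketched here under stated assumptions. The helper names `controlLetterValue` and `utf16Decode` are hypothetical; the second writes out the usual UTF-16 surrogate-pair arithmetic as a stand-in for the spec's UTF16Decode operation. The last two lines cross-check the results against the engine's own handling of the same escapes.

```javascript
// CharacterEscape :: `c` ControlLetter
// Hypothetical helper: takes the control letter itself, e.g. "J" for `\cJ`.
function controlLetterValue(letter) {
  if (!/^[A-Za-z]$/.test(letter)) {
    throw new RangeError("not a ControlLetter: " + letter);
  }
  const i = letter.codePointAt(0); // code point value of the letter
  return i % 32;                   // remainder of dividing i by 32
}

// Stand-in for UTF16Decode(lead, trail) using standard surrogate arithmetic.
function utf16Decode(lead, trail) {
  if (lead < 0xd800 || lead > 0xdbff || trail < 0xdc00 || trail > 0xdfff) {
    throw new RangeError("not a surrogate pair");
  }
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

console.log(controlLetterValue("J"));                  // 10, i.e. U+000A
console.log(utf16Decode(0xd83d, 0xde00).toString(16)); // "1f600"

// Cross-check against the engine.
console.log(/^\cJ$/.test("\n"));
console.log(/^\uD83D\uDE00$/u.test(String.fromCodePoint(0x1f600)));
```

Upper-case and lower-case control letters give the same value because their code points differ by exactly 32.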
@@ -29347,7 +29607,7 @@

Term

  1. Evaluate |Atom| to obtain a Matcher _m_.
  1. Evaluate |Quantifier| to obtain the three results: an integer _min_, an integer (or ∞) _max_, and Boolean _greedy_.
- 1. If _max_ is finite and less than _min_, throw a *SyntaxError* exception.
+ 1. Assert: If _max_ is finite, then _max_ is not less than _min_.
  1. Let _parenIndex_ be the number of left-capturing parentheses in the entire regular expression that occur to the left of this |Term|. This is the total number of Atom :: `(` Disjunction `)` Parse Nodes prior to or enclosing this |Term|.
  1. Let _parenCount_ be the number of left-capturing parentheses in |Atom|. This is the total number of Atom :: `(` Disjunction `)` Parse Nodes enclosed by |Atom|.
  1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps when evaluated:
@@ -29904,7 +30164,7 @@

AtomEscape

The production AtomEscape :: DecimalEscape evaluates as follows:

  1. Evaluate |DecimalEscape| to obtain an integer _n_.
- 1. If _n_>_NcapturingParens_, throw a *SyntaxError* exception.
+ 1. Assert: _n_ ≤ _NcapturingParens_.
  1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps:
    1. Let _cap_ be _x_'s _captures_ List.
    1. Let _s_ be _cap_[_n_].
@@ -29936,198 +30196,34 @@

AtomEscape

CharacterEscape

-

The production CharacterEscape :: `0` evaluates as follows:

- - 1. Return the character U+0000 (NULL). - - -

`\\0` represents the <NUL> character and cannot be followed by a decimal digit.

-
-

The production CharacterEscape :: ControlEscape evaluates as follows:

-
- 1. Return the character according to .
-
- <table>
-   <tr><th>ControlEscape</th><th>Character Value</th><th>Code Point</th><th>Unicode Name</th><th>Symbol</th></tr>
-   <tr><td>`t`</td><td>9</td><td>`U+0009`</td><td>CHARACTER TABULATION</td><td>&lt;HT&gt;</td></tr>
-   <tr><td>`n`</td><td>10</td><td>`U+000A`</td><td>LINE FEED (LF)</td><td>&lt;LF&gt;</td></tr>
-   <tr><td>`v`</td><td>11</td><td>`U+000B`</td><td>LINE TABULATION</td><td>&lt;VT&gt;</td></tr>
-   <tr><td>`f`</td><td>12</td><td>`U+000C`</td><td>FORM FEED (FF)</td><td>&lt;FF&gt;</td></tr>
-   <tr><td>`r`</td><td>13</td><td>`U+000D`</td><td>CARRIAGE RETURN (CR)</td><td>&lt;CR&gt;</td></tr>
- </table>
-

The production CharacterEscape :: `c` ControlLetter evaluates as follows:

- - 1. Let _ch_ be the character matched by |ControlLetter|. - 1. Let _i_ be _ch_'s character value. - 1. Let _j_ be the remainder of dividing _i_ by 32. - 1. Return the character whose character value is _j_. - -

The production CharacterEscape :: HexEscapeSequence evaluates as follows:

- - 1. Return the character whose code is the SV of |HexEscapeSequence|. - -

The production CharacterEscape :: RegExpUnicodeEscapeSequence evaluates as follows:

- - 1. Return the result of evaluating |RegExpUnicodeEscapeSequence|. - -

The production CharacterEscape :: IdentityEscape evaluates as follows:

- - 1. Return the character matched by |IdentityEscape|. - -

The production RegExpUnicodeEscapeSequence :: `u` LeadSurrogate `\u` TrailSurrogate evaluates as follows:

- - 1. Let _lead_ be the result of evaluating |LeadSurrogate|. - 1. Let _trail_ be the result of evaluating |TrailSurrogate|. - 1. Let _cp_ be UTF16Decode(_lead_, _trail_). - 1. Return the character whose character value is _cp_. - -

The production RegExpUnicodeEscapeSequence :: `u` LeadSurrogate evaluates as follows:

- - 1. Return the character whose code is the result of evaluating |LeadSurrogate|. - -

The production RegExpUnicodeEscapeSequence :: `u` TrailSurrogate evaluates as follows:

- - 1. Return the character whose code is the result of evaluating |TrailSurrogate|. - -

The production RegExpUnicodeEscapeSequence :: `u` NonSurrogate evaluates as follows:

- - 1. Return the character whose code is the result of evaluating |NonSurrogate|. - -

The production RegExpUnicodeEscapeSequence :: `u` Hex4Digits evaluates as follows:

- - 1. Return the character whose code is the SV of |Hex4Digits|. - -

The production RegExpUnicodeEscapeSequence :: `u{` CodePoint `}` evaluates as follows:

- - 1. Return the character whose code is the MV of |CodePoint|. - -

The production LeadSurrogate :: Hex4Digits evaluates as follows:

- - 1. Return the character whose code is the SV of |Hex4Digits|. - -

The production TrailSurrogate :: Hex4Digits evaluates as follows:

- - 1. Return the character whose code is the SV of |Hex4Digits|. - -

The production NonSurrogate :: Hex4Digits evaluates as follows:

+

The |CharacterEscape| productions evaluate as follows:

+
+ CharacterEscape ::
+   ControlEscape
+   `c` ControlLetter
+   `0` [lookahead <! DecimalDigit]
+   HexEscapeSequence
+   RegExpUnicodeEscapeSequence
+   IdentityEscape
+
- 1. Return the character whose code is the SV of |Hex4Digits|.
+ 1. Let _cv_ be the CharacterValue of this |CharacterEscape|.
+ 1. Return the character whose character value is _cv_.

DecimalEscape

-

The production DecimalEscape :: NonZeroDigit evaluates as follows:

- - 1. Return the MV of |NonZeroDigit|. - -

The production DecimalEscape :: NonZeroDigit DecimalDigits evaluates as follows:

+

The |DecimalEscape| productions evaluate as follows:

+
+ DecimalEscape ::
+   NonZeroDigit
+   NonZeroDigit DecimalDigits
+
- 1. Let _n_ be the number of code points in |DecimalDigits|.
- 1. Return (the MV of |NonZeroDigit| × 10<sup>_n_</sup>) plus the MV of |DecimalDigits|.
+ 1. Return the CapturingGroupNumber of this |DecimalEscape|.
-

The definitions of “the MV of |NonZeroDigit|” and “the MV of |DecimalDigits|” are in .

If `\\` is followed by a decimal number _n_ whose first digit is not `0`, then the escape sequence is considered to be a backreference. It is an error if _n_ is greater than the total number of left-capturing parentheses in the entire regular expression.

@@ -30186,8 +30282,7 @@

ClassRanges

The production ClassRanges :: NonemptyClassRanges evaluates as follows:

- 1. Evaluate |NonemptyClassRanges| to obtain a CharSet _A_.
- 1. Return _A_.
+ 1. Return the CharSet that is the result of evaluating |NonemptyClassRanges|.
@@ -30218,12 +30313,12 @@

NonemptyClassRanges

Runtime Semantics: CharacterRange ( _A_, _B_ )

The abstract operation CharacterRange takes two CharSet parameters _A_ and _B_ and performs the following steps:

- 1. If _A_ does not contain exactly one character or _B_ does not contain exactly one character, throw a *SyntaxError* exception.
+ 1. Assert: _A_ and _B_ each contain exactly one character.
  1. Let _a_ be the one character in CharSet _A_.
  1. Let _b_ be the one character in CharSet _B_.
  1. Let _i_ be the character value of character _a_.
  1. Let _j_ be the character value of character _b_.
- 1. If _i_ > _j_, throw a *SyntaxError* exception.
+ 1. Assert: _i_ ≤ _j_.
  1. Return the set containing all characters numbered _i_ through _j_, inclusive.
@@ -30270,8 +30365,7 @@
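The revised CharacterRange steps can be sketched as below. Modelling a CharSet as a JavaScript `Set` of numeric character values is an assumption made for the example, and `characterRange` is a hypothetical name. The two checks are written as assertion failures rather than `SyntaxError`s, matching the change in this hunk: by the time the operation runs, the early errors are expected to have ruled these cases out.

```javascript
// Hypothetical model: a CharSet is a Set of character values (numbers).
function characterRange(A, B) {
  // Assert: A and B each contain exactly one character.
  if (A.size !== 1 || B.size !== 1) {
    throw new Error("assertion failed: expected single-character CharSets");
  }
  const [i] = A; // character value of the one character in A
  const [j] = B; // character value of the one character in B
  // Assert: i ≤ j.
  if (i > j) {
    throw new Error("assertion failed: range endpoints out of order");
  }
  // The set containing all characters numbered i through j, inclusive.
  const result = new Set();
  for (let k = i; k <= j; k++) {
    result.add(k);
  }
  return result;
}

const range = characterRange(new Set([0x61]), new Set([0x65])); // `a` through `e`
console.log([...range].map((v) => String.fromCharCode(v)).join("")); // "abcde"
```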

ClassAtom

The production ClassAtom :: ClassAtomNoDash evaluates as follows:

- 1. Evaluate |ClassAtomNoDash| to obtain a CharSet _A_.
- 1. Return _A_.
+ 1. Return the CharSet that is the result of evaluating |ClassAtomNoDash|.
@@ -30291,19 +30385,20 @@

ClassAtomNoDash

ClassEscape

-

The production ClassEscape :: `b` evaluates as follows:

- - 1. Return the CharSet containing the single character <BS> U+0008 (BACKSPACE). - -

The production ClassEscape :: `-` evaluates as follows:

- - 1. Return the CharSet containing the single character `-` U+002D (HYPHEN-MINUS). - -

The production ClassEscape :: CharacterEscape evaluates as follows:

+

The |ClassEscape| productions evaluate as follows:

+
+ ClassEscape :: `b`
+ ClassEscape :: `-`
+ ClassEscape :: CharacterEscape
+
- 1. Return the CharSet containing the single character that is the result of evaluating |CharacterEscape|.
+ 1. Let _cv_ be the CharacterValue of this |ClassEscape|.
+ 1. Let _c_ be the character whose character value is _cv_.
+ 1. Return the CharSet containing the single character _c_.
-

The production ClassEscape :: CharacterClassEscape evaluates as follows:

+
+ ClassEscape :: CharacterClassEscape
+
  1. Return the CharSet that is the result of evaluating |CharacterClassEscape|.
@@ -39000,7 +39095,7 @@

Syntax

  AtomEscape[U] ::
    [+U] DecimalEscape
-   [~U] DecimalEscape [> but only if the integer value of |DecimalEscape| is <= _NcapturingParens_]
+   [~U] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is <= _NcapturingParens_]
    CharacterClassEscape
    CharacterEscape[~U]
@@ -39018,6 +39113,11 @@

Syntax

    [+U] `/`
    [~U] SourceCharacter but not `c`
+ ClassAtomNoDash[U] ::
+   SourceCharacter but not one of `\` or `]` or `-`
+   `\` ClassEscape[?U]
+   `\` [lookahead == `c`]
+
  ClassEscape[U] ::
    `b`
    [+U] `-`
@@ -39028,16 +39128,67 @@

Syntax

  ClassControlLetter ::
    DecimalDigit
    `_`
-
- ClassAtomNoDash[U] ::
-   SourceCharacter but not one of `\` or `]` or `-`
-   `\` ClassEscape[?U]
-   `\` [lookahead == `c`]

When the same left-hand side occurs with both [+U] and [\~U] guards, it is to control the disambiguation priority.

+ +

Static Semantics: Early Errors

+

The semantics of is extended as follows:

+ ExtendedAtom :: InvalidBracedQuantifier + + NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges + + NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges + +
+ + +

Static Semantics: IsCharacterClass

+

The semantics of is extended as follows:

+ + ClassAtomNoDash :: `\` [lookahead == `c`] + + + 1. Return *false*. + +
+ + +

Static Semantics: CharacterValue

+

The semantics of is extended as follows:

+
+ ClassAtomNoDash :: `\` [lookahead == `c`]
+
+ 1. Return the code point value of U+005C (REVERSE SOLIDUS).
+
+ ClassEscape :: `c` ClassControlLetter
+
+ 1. Let _ch_ be the code point matched by |ClassControlLetter|.
+ 1. Let _i_ be _ch_'s code point value.
+ 1. Return the remainder of dividing _i_ by 32.
+
+ CharacterEscape :: LegacyOctalEscapeSequence
+
+ 1. Evaluate the SV of the |LegacyOctalEscapeSequence| (see ) to obtain a code unit _cu_.
+ 1. Return the code unit value of _cu_.
+
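The Annex B extensions to CharacterValue can be illustrated with a short sketch. The helper names `classControlLetterValue` and `legacyOctalValue` are hypothetical, and reading the SV of a legacy octal escape as the base-8 value of its one to three digits is an assumption based on the legacy octal grammar rather than on anything shown in this diff. The engine checks at the end assume a web-compatible engine (such as Node.js) that implements Annex B; they only apply without the `u` flag.

```javascript
// ClassEscape :: `c` ClassControlLetter, where ClassControlLetter is a
// DecimalDigit or `_`. Hypothetical helper taking that single character.
function classControlLetterValue(ch) {
  if (!/^[0-9_]$/.test(ch)) {
    throw new RangeError("not a ClassControlLetter: " + ch);
  }
  return ch.codePointAt(0) % 32; // remainder of dividing the code point by 32
}

// CharacterEscape :: LegacyOctalEscapeSequence, given as its digits ("101").
// Assumption: the value is the digits read in base 8, at most 0o377.
function legacyOctalValue(digits) {
  if (!/^(?:[0-3][0-7]{0,2}|[4-7][0-7]?)$/.test(digits)) {
    throw new RangeError("not a LegacyOctalEscapeSequence: " + digits);
  }
  return parseInt(digits, 8);
}

console.log(classControlLetterValue("1")); // 17, i.e. U+0011
console.log(classControlLetterValue("_")); // 31, i.e. U+001F
console.log(legacyOctalValue("101"));      // 65, i.e. "A"

// Cross-check against the engine (non-`u` patterns only).
console.log(/[\c1]/.test("\x11"));
console.log(/[\c_]/.test("\x1f"));
console.log(/[\c]/.test("\\")); // `\` before a bare `c` is a literal backslash
console.log(/\101/.test("A"));
```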
+

Pattern Semantics

@@ -39064,10 +39215,6 @@

Pattern Semantics

  1. Let _A_ be the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS).
  1. Call CharacterSetMatcher(_A_, *false*) and return its Matcher result.
-

The production ExtendedAtom :: InvalidBracedQuantifier evaluates as follows:

- - 1. Throw a *SyntaxError* exception. -

The production ExtendedAtom :: ExtendedPatternCharacter evaluates as follows:

  1. Let _ch_ be the character represented by |ExtendedPatternCharacter|.
@@ -39078,8 +39225,8 @@

Pattern Semantics

CharacterEscape () includes the following additional evaluation rule:

The production CharacterEscape :: LegacyOctalEscapeSequence evaluates as follows:

- 1. Evaluate the SV of the |LegacyOctalEscapeSequence| (see ) to obtain a character _ch_.
- 1. Return _ch_.
+ 1. Let _cv_ be the CharacterValue of this |CharacterEscape|.
+ 1. Return the character whose character value is _cv_.

NonemptyClassRanges () modifies the following evaluation rule:

@@ -39105,11 +39252,9 @@

Pattern Semantics

ClassEscape () includes the following additional evaluation rule:

The production ClassEscape :: `c` ClassControlLetter evaluates as follows:

- 1. Let _ch_ be the character matched by |ClassControlLetter|.
- 1. Let _i_ be _ch_'s character value.
- 1. Let _j_ be the remainder of dividing _i_ by 32.
- 1. Let _d_ be the character whose character value is _j_.
- 1. Return the CharSet containing the single character _d_.
+ 1. Let _cv_ be the CharacterValue of this |ClassEscape|.
+ 1. Let _c_ be the character whose character value is _cv_.
+ 1. Return the CharSet containing the single character _c_.

ClassAtomNoDash () includes the following additional evaluation rule: