diff --git a/spec.html b/spec.html
index ea3cb4f174..c578e778f8 100644
--- a/spec.html
+++ b/spec.html
@@ -520,8 +520,8 @@
A lexical grammar for ECMAScript is given in clause
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the stream of input elements and guide the process of automatic semicolon insertion (
A RegExp grammar for ECMAScript is given in
Productions of the lexical and RegExp grammars are distinguished by having two colons “::” as separating punctuation. The lexical and RegExp grammars share some productions.
Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (
A line terminator can occur within a |MultiLineComment| but cannot occur within a |SingleLineComment|.
+A line terminator must occur within a |MultiLineComment| but cannot occur within a |SingleLineDelimitedComment| or a |SingleLineComment|.
Line terminators are included in the set of white space code points that are matched by the `\\s` class in regular expressions.
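Here is a quick runtime check of the statement above. It is written as a small TypeScript sketch and assumes nothing beyond a standard ECMAScript engine:

```typescript
// The four ECMAScript line terminator code points: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
const lineTerminators: string[] = ["\n", "\r", "\u2028", "\u2029"];

// Each of them is also in the set of white space code points matched by the \s class.
const allMatchWhiteSpaceClass: boolean = lineTerminators.every((ch) => /^\s$/.test(ch));
```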
The ECMAScript line terminator code points are listed in
Comments can be either single or multi-line. Multi-line comments cannot nest.
Because a single-line comment can contain any Unicode code point except a |LineTerminator| code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the `//` marker to the end of the line. However, the |LineTerminator| at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see
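Two comment rules can be observed in a short TypeScript sketch: comments do not nest, and a single-line comment stops before the line terminator. The `run` helper is hypothetical and only evaluates a function body through the standard `Function` constructor:

```typescript
// Hypothetical helper: evaluate a function body given as source text.
const run = (body: string): unknown => new Function(body)();

// The single-line comment ends at the line terminator, so the next line is still code:
// the body is parsed as `return 1 + 1`.
const singleLine = run("return 1 // comment\n + 1");

// Multi-line comments do not nest: the first `*/` closes the whole comment ...
const innerClose = run("return /* outer /* inner */ 1");

// ... so text after an attempted "nested" comment is parsed as code and fails.
let nestedFails = false;
try {
  run("/* outer /* inner */ still code */");
} catch (e) {
  nestedFails = e instanceof SyntaxError;
}
```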
Comments behave like white space and are discarded except that, if a |MultiLineComment| contains a line terminator code point, then the entire comment is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.
+Comments behave like white space and are discarded except that a |MultiLineComment| or a |SingleLineHTMLCloseComment| is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.
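The effect on automatic semicolon insertion can be seen with `return`, because its production forbids a line terminator before the returned expression. The sketch below again uses a hypothetical `run` helper around the `Function` constructor:

```typescript
// Hypothetical helper: evaluate a function body given as source text.
const run = (body: string): unknown => new Function(body)();

// A line terminator right after `return` triggers automatic semicolon insertion,
// so the function returns undefined and `42` is an unreachable statement.
const plainNewline = run("return\n42");

// A multi-line comment that spans a line break counts as a LineTerminator ...
const commentWithNewline = run("return /*\n*/ 42");

// ... while a comment on a single line is discarded like white space.
const commentOnOneLine = run("return /* */ 42");
```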
A number of productions in this section are given alternative definitions in section
A regular expression literal is an input element that is converted to a RegExp object (see
The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the |RegularExpressionBody| and the |RegularExpressionFlags| are subsequently parsed again using the more stringent ECMAScript Regular Expression grammar (
An implementation may extend the ECMAScript Regular Expression grammar defined in
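The following simplified TypeScript scanner illustrates this division of labour. It only finds where a regular expression literal ends and then hands the body and flags to `RegExp` for the stricter parse. `scanRegExpLiteral` and `RegExpLiteralParts` are hypothetical names, and the flag rule is simplified to ASCII identifier characters:

```typescript
// Hypothetical result shape for the scanner below.
interface RegExpLiteralParts {
  body: string;
  flags: string;
  end: number; // index just past the last flag character
}

const LINE_TERMINATORS = "\n\r\u2028\u2029";
const isLineTerminator = (ch: string): boolean => LINE_TERMINATORS.indexOf(ch) !== -1;

// Finds the extent of a literal starting with `/` at `start`. Nothing here checks
// that the body is a valid Pattern; that is left to the RegExp parser.
function scanRegExpLiteral(src: string, start: number): RegExpLiteralParts {
  const first = src.charAt(start + 1);
  if (src.charAt(start) !== "/" || first === "*" || first === "/") {
    throw new SyntaxError("not the start of a regular expression literal");
  }
  let i = start + 1;
  let inClass = false;
  for (;;) {
    if (i >= src.length || isLineTerminator(src.charAt(i))) {
      throw new SyntaxError("unterminated regular expression literal");
    }
    const ch = src.charAt(i);
    if (ch === "\\") {
      // Backslash sequence: `\` plus the following code unit (simplification: code
      // units rather than code points), which must not be a line terminator.
      if (i + 1 >= src.length || isLineTerminator(src.charAt(i + 1))) {
        throw new SyntaxError("unterminated regular expression literal");
      }
      i += 2;
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) break; // a `/` outside a class ends the body
    i++;
  }
  const body = src.slice(start + 1, i);
  i++; // step over the closing `/`
  const flagsStart = i;
  while (i < src.length && /[A-Za-z0-9_$]/.test(src.charAt(i))) i++;
  return { body, flags: src.slice(flagsStart, i), end: i };
}

// The `/` inside the class and the escaped `\/` do not end the literal.
const parts = scanRegExpLiteral("x = /[/]a\\/b/gi;", 4);
// The body and flags are then parsed again, this time as a Pattern.
const re = new RegExp(parts.body, parts.flags);
```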
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
-The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.
-The `RegExp` constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.
+Some of these productions (indicated by “::!”) introduce ambiguities that are broken by the ordering of alternatives. When parsing using such productions, each alternative is considered only if previous alternatives do not match.
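The ordering rule behaves like ordered choice in a parsing expression grammar. The sketch below illustrates that idea under stated assumptions and is not the specification's algorithm. `orderedChoice` and `decimalEscape` are hypothetical. The choice between a backreference and a legacy octal escape in a BMP pattern was picked only as a familiar example of an ambiguity that is resolved by trying alternatives in order:

```typescript
type Result<T> = { value: T; next: number } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Alternatives are tried in order; a later one is consulted only if every
// earlier one fails to match.
function orderedChoice<T>(...alternatives: Parser<T>[]): Parser<T> {
  return (input, pos) => {
    for (const alternative of alternatives) {
      const result = alternative(input, pos);
      if (result !== null) return result;
    }
    return null;
  };
}

type DecimalEscapeResult =
  | { kind: "backreference"; group: number }
  | { kind: "legacyOctal"; codeUnit: number };

// Hypothetical: the digits after `\` in a BMP pattern with `groupCount` capturing
// groups. Other fallbacks (for example `\8` as an identity escape) are omitted.
function decimalEscape(groupCount: number): Parser<DecimalEscapeResult> {
  const backreference: Parser<DecimalEscapeResult> = (input, pos) => {
    const m = /^[1-9][0-9]*/.exec(input.slice(pos));
    if (m === null || Number(m[0]) > groupCount) return null;
    return { value: { kind: "backreference", group: Number(m[0]) }, next: pos + m[0].length };
  };
  const legacyOctal: Parser<DecimalEscapeResult> = (input, pos) => {
    // Longest octal digit run whose value stays within 0..0o377.
    const m = /^(?:[0-3][0-7]{0,2}|[4-7][0-7]?)/.exec(input.slice(pos));
    if (m === null) return null;
    return { value: { kind: "legacyOctal", codeUnit: parseInt(m[0], 8) }, next: pos + m[0].length };
  };
  return orderedChoice(backreference, legacyOctal);
}
```

With one group, `\1` is a backreference. With no groups, the first alternative fails, and the same digits are read as the octal escape for U+0001.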
+Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.
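Here is a sketch of the pairing rule. It assumes the escapes have already been decoded to 16-bit values. `pairEscapedSurrogates` is a hypothetical helper and not part of any engine API:

```typescript
const isLead = (u: number): boolean => u >= 0xd800 && u <= 0xdbff;
const isTrail = (u: number): boolean => u >= 0xdc00 && u <= 0xdfff;

// `units` are the values of consecutive `\uXXXX` escapes in a Unicode pattern.
// A trail surrogate joins the lead immediately before it when that lead is still
// unpaired; every other unit is kept as a lone code unit.
function pairEscapedSurrogates(units: number[]): number[] {
  const codePoints: number[] = [];
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (isLead(u) && i + 1 < units.length && isTrail(units[i + 1])) {
      codePoints.push((u - 0xd800) * 0x400 + (units[i + 1] - 0xdc00) + 0x10000);
      i++; // the trail surrogate has been consumed by this pair
    } else {
      codePoints.push(u);
    }
  }
  return codePoints;
}

// Two leads followed by one trail: the trail pairs with the nearer (second) lead,
// leaving the first lead on its own.
const paired = pairEscapedSurrogates([0xd83d, 0xd83d, 0xde00]);
```

In an engine, this pairing is why `[\uD83D\uDE00]` with the `u` flag is a class containing the single code point U+1F600.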
+A number of productions in this section are given alternative definitions in section
Patterns that use the following productions are allowed, but deprecated:
+This section is amended in
This section is amended in
The definitions of “the MV of |NonZeroDigit|” and “the MV of |DecimalDigits|” are in
This section is amended in
This section is amended in
`\\0` represents the <NUL> character and cannot be followed by a decimal digit.
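This rule can be checked directly with the `RegExp` constructor. The sketch uses the `u` flag for the error case, because in a BMP pattern `\0` followed by a digit is read as a legacy octal escape instead:

```typescript
// `\0` matches U+0000 (NUL).
const matchesNul: boolean = new RegExp("\\0").test("\u0000");

// In a Unicode pattern, `\0` followed by a decimal digit is a SyntaxError.
let unicodeRejects = false;
try {
  new RegExp("\\01", "u");
} catch (e) {
  unicodeRejects = e instanceof SyntaxError;
}
```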
This section is amended in
A regular expression pattern is converted into an Abstract Closure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The Abstract Closure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.
A |Pattern| is either a BMP pattern or a Unicode pattern depending upon whether or not its associated flags contain a `u`. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (
The syntax and semantics of |Pattern| are defined as if the source code for the |Pattern| were a List of |SourceCharacter| values, where each |SourceCharacter| corresponds to a Unicode code point. If a BMP pattern contains a non-BMP |SourceCharacter|, the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.
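The difference between the two kinds of pattern shows up with any non-BMP code point. This TypeScript sketch uses U+1F600, which is encoded as two UTF-16 code units:

```typescript
const smiley = "\u{1F600}"; // one code point, two UTF-16 code units

// BMP pattern: a "character" is one 16-bit code unit, so `.` cannot cover both halves.
const bmpDot = new RegExp("^.$").test(smiley); // false
// Unicode pattern: a "character" is one code point.
const unicodeDot = new RegExp("^.$", "u").test(smiley); // true

// A non-BMP SourceCharacter in a BMP pattern is split into its code units, so this
// class holds two lone surrogates and matches only one code unit at a time.
const bmpClass = new RegExp("^[\u{1F600}]$").test(smiley); // false
const unicodeClass = new RegExp("^[\u{1F600}]$", "u").test(smiley); // true
// Outside a class, the two code units still match one after the other.
const bmpSequence = new RegExp("^\u{1F600}$").test(smiley); // true
```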
@@ -34831,7 +34937,7 @@ A Pattern evaluates (“compiles”) to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in
A Pattern evaluates (“compiles”) to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in
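The following TypeScript sketch models, in a heavily simplified form, the idea of compiling a pattern into closures. The state is only an end index. It omits captures, flags, and _direction_. `compile`, `matchAt`, and `PatternNode` are hypothetical names. Each matcher takes a state and a continuation, and backtracking happens when a continuation returns `null` to its caller:

```typescript
interface MatchState {
  input: string;
  endIndex: number;
}
type MatcherContinuation = (state: MatchState) => MatchState | null;
type Matcher = (state: MatchState, c: MatcherContinuation) => MatchState | null;

type PatternNode =
  | { kind: "char"; ch: string }
  | { kind: "seq"; first: PatternNode; second: PatternNode }
  | { kind: "alt"; first: PatternNode; second: PatternNode }
  | { kind: "star"; body: PatternNode };

function compile(node: PatternNode): Matcher {
  switch (node.kind) {
    case "char": {
      const ch = node.ch;
      return (s, c) =>
        s.input.charAt(s.endIndex) === ch ? c({ input: s.input, endIndex: s.endIndex + 1 }) : null;
    }
    case "seq": {
      const m1 = compile(node.first);
      const m2 = compile(node.second);
      return (s, c) => m1(s, (s1) => m2(s1, c));
    }
    case "alt": {
      const m1 = compile(node.first);
      const m2 = compile(node.second);
      // Try the left alternative first; fall back to the right one on failure.
      return (s, c) => {
        const r = m1(s, c);
        return r !== null ? r : m2(s, c);
      };
    }
    case "star": {
      const m = compile(node.body);
      const loop: Matcher = (s, c) => {
        // Greedy: attempt another iteration first, rejecting iterations that consume nothing.
        const r = m(s, (s1) => (s1.endIndex === s.endIndex ? null : loop(s1, c)));
        return r !== null ? r : c(s);
      };
      return loop;
    }
    default:
      throw new Error("unknown pattern node");
  }
}

// Does the pattern match starting at exactly `offset`? Returns the end index or null.
function matchAt(pattern: PatternNode, input: string, offset: number): number | null {
  const result = compile(pattern)({ input, endIndex: offset }, (s) => s);
  return result === null ? null : result.endIndex;
}

const charA: PatternNode = { kind: "char", ch: "a" };
const charB: PatternNode = { kind: "char", ch: "b" };
const charC: PatternNode = { kind: "char", ch: "c" };
// (a|ab)c : the first alternative fails at "b", so the matcher backtracks into "ab".
const pattern: PatternNode = {
  kind: "seq",
  first: { kind: "alt", first: charA, second: { kind: "seq", first: charA, second: charB } },
  second: charC,
};
```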
With parameter _direction_.
-The production
The production
The resulting Matcher is independent of _direction_.
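The _direction_ parameter matters for lookbehind, where the enclosed pattern is matched backward. This sketch assumes a runtime with lookbehind support (ES2018 or later). It uses the `RegExp` constructor so that the pattern is not checked at compile time:

```typescript
// Forward matching: the first (\d+) is greedy and keeps as much as it can.
const forward = new RegExp("(\\d+)(\\d+)$").exec("1053");
// groups: "105" and "3"

// Inside a lookbehind the matcher runs backward, so the second group is matched
// first and is the one that ends up taking as much as it can.
const backward = new RegExp("(?<=(\\d+)(\\d+))$").exec("1053");
// groups: "1" and "053"
```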
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
With parameter _direction_.
-The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
With parameter _direction_.
-The production
The production
The production
An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_th set of capturing parentheses (
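Here is a short sketch of backreference behaviour using the `RegExp` constructor. The Unicode-mode error reflects the rule that _n_ may not exceed the number of capturing groups. BMP patterns fall back to other interpretations instead:

```typescript
// `\2` repeats what group 2 captured, and `\1` repeats what group 1 captured.
const palindrome = new RegExp("^(a)(b)\\2\\1$");
const matchesAbba: boolean = palindrome.test("abba"); // true
const matchesAbab: boolean = palindrome.test("abab"); // false

// In a Unicode pattern, a backreference to a group that does not exist is an error.
let tooLargeRejected = false;
try {
  new RegExp("(a)\\2", "u");
} catch (e) {
  tooLargeRejected = e instanceof SyntaxError;
}
```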
The production
The |CharacterEscape| productions evaluate as follows:
-The |DecimalEscape| productions evaluate as follows:
-If `\\` is followed by a decimal number _n_ whose first digit is not `0`, then the escape sequence is considered to be a backreference. It is an error if _n_ is greater than the total number of left-capturing parentheses in the entire regular expression.
-The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The |ClassEscape| productions evaluate as follows:
A |ClassAtom| can use any of the escape sequences that are allowed in the rest of the regular expression except for `\\b`, `\\B`, and backreferences. Inside a |CharacterClass|, `\\b` means the backspace character, while `\\B` and backreferences raise errors.
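These rules can be checked with the `RegExp` constructor. `throwsSyntaxError` is a hypothetical helper. The error cases use the `u` flag because BMP patterns accept some of these escapes under the legacy grammar:

```typescript
// Inside a class, `\b` is U+0008 BACKSPACE ...
const classBackspace: boolean = new RegExp("^[\\b]$").test("\u0008"); // true
// ... while outside a class it is a word-boundary assertion.
const wordBoundary: boolean = new RegExp("\\bcat\\b").test("a cat sat"); // true

// Hypothetical helper: does compiling the pattern throw a SyntaxError?
function throwsSyntaxError(source: string, flags: string): boolean {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// `\B` and backreferences inside a class are rejected in a Unicode pattern.
const classNonBoundaryRejected = throwsSyntaxError("[\\B]", "u"); // true
const classBackrefRejected = throwsSyntaxError("(a)[\\1]", "u"); // true
```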
With parameter _direction_.
+The production
The production
An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_th set of capturing parentheses (
The production
The production
The |DecimalEscape| productions evaluate as follows:
+If `\\` is followed by a decimal number _n_ whose first digit is not `0`, then the escape sequence is considered to be a backreference. It is an error if _n_ is greater than the total number of left-capturing parentheses in the entire regular expression.
+The production
The production
The production
The production
The production
The production
The production
The production
The production
The production
The |CharacterEscape| productions evaluate as follows:
+Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.
--
The syntax and semantics of
Similar to a |MultiLineComment| that contains a line terminator code point, a |SingleLineHTMLCloseComment| is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.
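Here is a sketch of this behaviour. It assumes a host that parses `Function` constructor bodies as non-module source text, because HTML-like comments are never allowed in |Module|s:

```typescript
// `-->` at the start of a line begins a SingleLineHTMLCloseComment, which acts as a
// LineTerminator, so the body is effectively `return 1;` followed by a comment.
const htmlClose = new Function("return 1\n--> the rest of this line is a comment")();
```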
+The HTML-like comment syntax used to be normative optional outside |Module|s.
The syntax of
This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.
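In practice, some patterns are accepted only without the `u` flag. The TypeScript sketch below shows this with a hypothetical `compiles` helper:

```typescript
// Hypothetical helper: true if the pattern compiles, false on SyntaxError.
function compiles(source: string, flags: string): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// A lone `]`, and a `{` that does not start a valid quantifier, are literal
// characters in BMP patterns ...
const bmpBracket = compiles("]", ""); // true
const bmpBrace = compiles("a{", ""); // true
// ... but remain SyntaxErrors in Unicode patterns, which this grammar leaves unchanged.
const unicodeBracket = compiles("]", "u"); // false
const unicodeBrace = compiles("a{", "u"); // false
```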
-When the same left-hand side occurs with both [+UnicodeMode] and [\~UnicodeMode] guards, the guards control the disambiguation priority.
-The semantics of
Additionally, the rules for the following productions are modified with the addition of the highlighted text:
-The semantics of
The semantics of
The semantics of
Within
Term (
The production
The production
The production
Assertion (
The production
Assertion (
Atom (
The production
The production
CharacterEscape (
The production
NonemptyClassRanges (
The production
NonemptyClassRangesNoDash (
The production
ClassEscape (
The production
ClassAtomNoDash (
The production
Some of the syntax and semantics of BMP patterns ([~UnicodeMode]) used to be normative optional.