From 8fe6c3af12e4547c2e6a8daac6b0a38bb303a3bd Mon Sep 17 00:00:00 2001
From: Michael Dyck
Date: Sun, 4 Aug 2019 17:39:27 -0400
Subject: [PATCH] Normative: Make B.1.3 "HTML-like comments" normative

(Part of Annex B reform, see PR #1595.)
---
 spec.html | 126 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 69 insertions(+), 57 deletions(-)

diff --git a/spec.html b/spec.html
index 43f39fe00ad..8322ae745bd 100644
--- a/spec.html
+++ b/spec.html
@@ -520,7 +520,7 @@

Context-Free Grammars

The Lexical and RegExp Grammars

A lexical grammar for ECMAScript is given in clause . This grammar has as its terminal symbols Unicode code points that conform to the rules for |SourceCharacter| defined in . It defines a set of productions, starting from the goal symbol |InputElementDiv|, |InputElementTemplateTail|, |InputElementRegExp|, or |InputElementRegExpOrTemplateTail|, that describe how sequences of such code points are translated into a sequence of input elements.

-

Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the stream of input elements and guide the process of automatic semicolon insertion (). Simple white space and single-line comments are discarded and do not appear in the stream of input elements for the syntactic grammar. A |MultiLineComment| (that is, a comment of the form `/*`…`*/` regardless of whether it spans more than one line) is likewise simply discarded if it contains no line terminator; but if a |MultiLineComment| contains one or more line terminators, then it is replaced by a single line terminator, which becomes part of the stream of input elements for the syntactic grammar.

+

Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the stream of input elements and guide the process of automatic semicolon insertion (). Simple white space and single-line comments are discarded and do not appear in the stream of input elements for the syntactic grammar. A |MultiLineComment| (that is, a comment of the form `/*`…`*/` that spans more than one line) is replaced by a single line terminator, which becomes part of the stream of input elements for the syntactic grammar.

A RegExp grammar for ECMAScript is given in . This grammar also has as its terminal symbols the code points as defined by |SourceCharacter|. It defines a set of productions, starting from the goal symbol |Pattern|, that describe how sequences of code points are translated into regular expression patterns.

Productions of the lexical and RegExp grammars are distinguished by having two colons “::” as separating punctuation. The lexical and RegExp grammars share some productions.

@@ -16018,7 +16018,7 @@

Syntax

Line Terminators

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (). A line terminator cannot occur within any token except a |StringLiteral|, |Template|, or |TemplateSubstitutionTail|. <LF> and <CR> line terminators cannot occur within a |StringLiteral| token except as part of a |LineContinuation|.

-

A line terminator can occur within a |MultiLineComment| but cannot occur within a |SingleLineComment|.

+

A line terminator must occur within a |MultiLineComment| but cannot occur within a |SingleLineDelimitedComment| or a |SingleLineComment|.
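For illustration only, a minimal Python sketch of the distinction this rule draws, assuming a lexer has already isolated one complete `/*`…`*/` comment (ending at its first `*/`). The function name `classify_delimited_comment` is hypothetical and not part of the specification.

```python
# LineTerminator code points: LF, CR, LS (U+2028), PS (U+2029).
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}


def classify_delimited_comment(text: str) -> str:
    """Classify one complete `/* ... */` comment (hypothetical helper).

    Under the grammar in this patch, a delimited comment is a
    MultiLineComment exactly when it contains a line terminator;
    otherwise it is a SingleLineDelimitedComment.
    """
    if len(text) < 4 or not (text.startswith("/*") and text.endswith("*/")):
        raise ValueError("not a delimited comment")
    if "*/" in text[2:-2]:
        # A delimited comment always ends at its first `*/`.
        raise ValueError("text runs past the end of the comment")
    if any(ch in LINE_TERMINATORS for ch in text):
        return "MultiLineComment"
    return "SingleLineDelimitedComment"
```

Because the check is purely "does a line terminator occur inside", `/* a */` and `/**/` classify as |SingleLineDelimitedComment|, while a comment broken by LF or by U+2028 classifies as |MultiLineComment|.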

Line terminators are included in the set of white space code points that are matched by the `\\s` class in regular expressions.

The ECMAScript line terminator code points are listed in .

@@ -16104,15 +16104,21 @@

Syntax

Comments

Comments can be either single or multi-line. Multi-line comments cannot nest.

Because a single-line comment can contain any Unicode code point except a |LineTerminator| code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the `//` marker to the end of the line. However, the |LineTerminator| at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see ).
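A minimal Python sketch of the longest-match behaviour described above, assuming `start` indexes a `//` marker that really begins a comment (not one inside a string or template). The helper `scan_single_line_comment` is hypothetical. The returned span stops before the line terminator, which stays in the source for the lexer to emit as its own input element.

```python
# LineTerminator code points: LF, CR, LS (U+2028), PS (U+2029).
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}


def scan_single_line_comment(source: str, start: int) -> int:
    """Return the exclusive end index of the `//` comment at `start`.

    The comment consumes every code point up to, but not including,
    the next line terminator (or the end of the source).
    """
    if not source.startswith("//", start):
        raise ValueError("no '//' marker at start")
    end = start + 2
    while end < len(source) and source[end] not in LINE_TERMINATORS:
        end += 1
    return end
```

With `src = "a = b // note\nc()"`, `scan_single_line_comment(src, 6)` returns 13, the index of the `\n`, so that line terminator still separates `b` from `c` for automatic semicolon insertion.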

-

Comments behave like white space and are discarded except that, if a |MultiLineComment| contains a line terminator code point, then the entire comment is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.

+

Comments behave like white space and are discarded except that a |MultiLineComment| or a |SingleLineHTMLCloseComment| is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.
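A Python sketch of this rule, assuming the lexer has already produced `(kind, text)` pairs whose `kind` strings mirror the grammar's nonterminal names. That representation and the function `to_syntactic_stream` are hypothetical. Discarded comments vanish, while a |MultiLineComment| or |SingleLineHTMLCloseComment| is passed on as a single `LineTerminator` element, which is what lets it take part in automatic semicolon insertion.

```python
from typing import List, Tuple

# (kind, source text); kind strings mirror the grammar's nonterminal names.
Element = Tuple[str, str]

# Elements that are simply discarded, like white space.
DISCARDED = {
    "WhiteSpace",
    "SingleLineComment",
    "SingleLineDelimitedComment",
    "SingleLineHTMLOpenComment",
}
# Comments that count as a LineTerminator for the syntactic grammar.
LINE_TERMINATOR_LIKE = {"MultiLineComment", "SingleLineHTMLCloseComment"}


def to_syntactic_stream(elements: List[Element]) -> List[Element]:
    """Drop discarded elements; turn line-terminator-like comments into LineTerminator."""
    out: List[Element] = []
    for kind, text in elements:
        if kind in DISCARDED:
            continue
        if kind in LINE_TERMINATOR_LIKE:
            # The whole comment stands in for one LineTerminator element.
            out.append(("LineTerminator", text))
        else:
            out.append((kind, text))
    return out
```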

Syntax

 Comment ::
   MultiLineComment
   SingleLineComment
+  SingleLineHTMLOpenComment
+  SingleLineHTMLCloseComment
+  SingleLineDelimitedComment

 MultiLineComment ::
-  `/*` MultiLineCommentChars? `*/`
+  `/*` FirstCommentLine? LineTerminator MultiLineCommentChars? `*/` HTMLCloseComment?
+
+FirstCommentLine ::
+  SingleLineDelimitedCommentChars

 MultiLineCommentChars ::
   MultiLineNotAsteriskChar MultiLineCommentChars?
@@ -16131,13 +16137,59 @@

Syntax

 SingleLineComment ::
   `//` SingleLineCommentChars?

+SingleLineHTMLOpenComment ::
+  `<!--` SingleLineCommentChars?
+
+SingleLineHTMLCloseComment ::
+  LineTerminatorSequence HTMLCloseComment
+
+HTMLCloseComment ::
+  WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars?
+
+SingleLineDelimitedCommentSequence ::
+  SingleLineDelimitedComment WhiteSpaceSequence? SingleLineDelimitedCommentSequence?
+
+WhiteSpaceSequence ::
+  WhiteSpace WhiteSpaceSequence?
+
 SingleLineCommentChars ::
   SingleLineCommentChar SingleLineCommentChars?

 SingleLineCommentChar ::
   SourceCharacter but not LineTerminator
+
+SingleLineDelimitedComment ::
+  `/*` SingleLineDelimitedCommentChars? `*/`
+
+SingleLineDelimitedCommentChars ::
+  SingleLineNotAsteriskChar SingleLineDelimitedCommentChars?
+  `*` SingleLinePostAsteriskCommentChars?
+
+SingleLineNotAsteriskChar ::
+  SourceCharacter but not one of `*` or LineTerminator
+
+SingleLinePostAsteriskCommentChars ::
+  SingleLineNotForwardSlashOrAsteriskChar SingleLineDelimitedCommentChars?
+  `*` SingleLinePostAsteriskCommentChars?
+
+SingleLineNotForwardSlashOrAsteriskChar ::
+  SourceCharacter but not one of `/` or `*` or LineTerminator
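The new lexical productions can be approximated with Python regular expressions, as a sketch for checking one's reading of the grammar rather than as a conforming lexer. Assumptions: |WhiteSpace| is spelled out as TAB, VT, FF, ZWNBSP and the Unicode `Zs` code points, and all pattern names are hypothetical. `SINGLE_LINE_DELIMITED` is the usual C-comment regular expression with line terminators excluded, which accepts the same strings as `/*` SingleLineDelimitedCommentChars? `*/` (any run of non-terminator code points not containing `*/`).

```python
import re

# LineTerminator code points: LF, CR, LS, PS.
LT = "\n\r\u2028\u2029"
# WhiteSpace (assumed): TAB, VT, FF, ZWNBSP, and every Unicode "Zs" code point.
WS = "\t\v\f\ufeff\u0020\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000"

# SingleLineDelimitedComment :: `/*` SingleLineDelimitedCommentChars? `*/`
SINGLE_LINE_DELIMITED = rf"/\*[^*{LT}]*\*+(?:[^/*{LT}][^*{LT}]*\*+)*/"

# HTMLCloseComment ::
#   WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars?
HTML_CLOSE = rf"[{WS}]*(?:{SINGLE_LINE_DELIMITED}[{WS}]*)*-->[^{LT}]*"

# LineTerminatorSequence: CR LF, CR not followed by LF, LF, LS, or PS.
LINE_TERMINATOR_SEQUENCE = r"(?:\r\n|\r(?!\n)|[\n\u2028\u2029])"

# SingleLineHTMLCloseComment :: LineTerminatorSequence HTMLCloseComment
SINGLE_LINE_HTML_CLOSE = re.compile(LINE_TERMINATOR_SEQUENCE + HTML_CLOSE)

# SingleLineHTMLOpenComment :: `<!--` SingleLineCommentChars?
SINGLE_LINE_HTML_OPEN = re.compile(rf"<!--[^{LT}]*")
```

For instance, `SINGLE_LINE_HTML_CLOSE.fullmatch("\n  /* x */ --> rest")` succeeds: the line terminator, the white space, one delimited comment and the `-->` marker with its trailing text together form one |SingleLineHTMLCloseComment|.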
-

A number of productions in this section are given alternative definitions in section

+ + +

Static Semantics: Early Errors

+
+SingleLineHTMLOpenComment ::
+  `<!--` SingleLineCommentChars?
+
+HTMLCloseComment ::
+  WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars?
+
+
    +
  • It is a Syntax Error if a |Module| contains the source code matching this production.
  • +
+ In a |Script|, this syntax is allowed, but deprecated. +
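A Python sketch of this goal-dependent early error, assuming the lexer reports each comment as a `(kind, has_html_close)` pair, where `has_html_close` records whether a |MultiLineComment| ended with its optional |HTMLCloseComment|. The representation and the name `check_html_like_comments` are hypothetical; a real implementation would also report the source position of the offending comment.

```python
from typing import Iterable, Tuple

# (kind, has_html_close): has_html_close is True when a MultiLineComment
# ends with the optional HTMLCloseComment.
Comment = Tuple[str, bool]

HTML_LIKE_KINDS = {"SingleLineHTMLOpenComment", "SingleLineHTMLCloseComment"}


def check_html_like_comments(comments: Iterable[Comment], goal: str) -> None:
    """Raise SyntaxError if a Module contains HTML-like comment syntax."""
    if goal != "Module":
        return  # In a Script the syntax is allowed, but deprecated.
    for kind, has_html_close in comments:
        if kind in HTML_LIKE_KINDS or has_html_close:
            raise SyntaxError(f"{kind} is not allowed in a Module")
```

The same comment list is accepted when the goal is `"Script"` and rejected when it is `"Module"`, mirroring how the early error replaces the former Annex B rule that the extension is unavailable for the |Module| goal.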
@@ -28264,9 +28316,6 @@

Forbidden Extensions

  • When processing strict mode code, the extensions defined in , , , and must not be supported.
  • -
  • - When parsing for the |Module| goal symbol, the lexical grammar extensions defined in must not be supported. -
  • |ImportCall| must not be extended.
@@ -46008,13 +46057,24 @@

    Lexical Grammar

    + + + + + + + + + + +
@@ -46424,55 +46484,7 @@

    Additional Syntax

    HTML-like Comments

    -

    The syntax and semantics of is extended as follows except that this extension is not allowed when parsing source code using the goal symbol |Module|:

    -

    Syntax

-
-Comment ::
-  MultiLineComment
-  SingleLineComment
-  SingleLineHTMLOpenComment
-  SingleLineHTMLCloseComment
-  SingleLineDelimitedComment
-
-MultiLineComment ::
-  `/*` FirstCommentLine? LineTerminator MultiLineCommentChars? `*/` HTMLCloseComment?
-
-FirstCommentLine ::
-  SingleLineDelimitedCommentChars
-
-SingleLineHTMLOpenComment ::
-  `<!--` SingleLineCommentChars?
-
-SingleLineHTMLCloseComment ::
-  LineTerminatorSequence HTMLCloseComment
-
-SingleLineDelimitedComment ::
-  `/*` SingleLineDelimitedCommentChars? `*/`
-
-HTMLCloseComment ::
-  WhiteSpaceSequence? SingleLineDelimitedCommentSequence? `-->` SingleLineCommentChars?
-
-SingleLineDelimitedCommentChars ::
-  SingleLineNotAsteriskChar SingleLineDelimitedCommentChars?
-  `*` SingleLinePostAsteriskCommentChars?
-
-SingleLineNotAsteriskChar ::
-  SourceCharacter but not one of `*` or LineTerminator
-
-SingleLinePostAsteriskCommentChars ::
-  SingleLineNotForwardSlashOrAsteriskChar SingleLineDelimitedCommentChars?
-  `*` SingleLinePostAsteriskCommentChars?
-
-SingleLineNotForwardSlashOrAsteriskChar ::
-  SourceCharacter but not one of `/` or `*` or LineTerminator
-
-WhiteSpaceSequence ::
-  WhiteSpace WhiteSpaceSequence?
-
-SingleLineDelimitedCommentSequence ::
-  SingleLineDelimitedComment WhiteSpaceSequence? SingleLineDelimitedCommentSequence?
-
-

    Similar to a |MultiLineComment| that contains a line terminator code point, a |SingleLineHTMLCloseComment| is considered to be a |LineTerminator| for purposes of parsing by the syntactic grammar.

    +

    The HTML-like comment syntax used to be normative optional outside |Module|s.