From 6f1aa96edb03e0e3e3ff87f3dffe93e3d61947ad Mon Sep 17 00:00:00 2001
From: Daniel Ehrenberg
Date: Wed, 7 Feb 2018 23:42:02 +0100
Subject: [PATCH] Normative: Cache templates per site, rather than by contents (#890)

* Normative: Cache templates per site, rather than by contents

The previous definition of template caching had a few issues:
- (from @syg) Template strings may live forever due to putting them in a WeakMap
- (from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects
- (from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent

This patch makes template caching key off the Parse Node where the template occurs in source, rather than the List of Strings that the template evaluates into.

These semantics seem to match SpiderMonkey's implementation of templates. V8, ChakraCore and JSC, on the other hand, implement the prior semantics.

Resolves https://github.com/tc39/ecma262/issues/840
---
 spec.html | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/spec.html b/spec.html
index 173b3e022a..39db40867c 100644
--- a/spec.html
+++ b/spec.html
@@ -490,9 +490,12 @@

The Syntactic Grammar

The syntactic grammar for ECMAScript is given in clauses 11, 12, 13, 14, and 15. This grammar has ECMAScript tokens defined by the lexical grammar as its terminal symbols. It defines a set of productions, starting from two alternative goal symbols |Script| and |Module|, that describe how sequences of tokens form syntactically correct independent components of ECMAScript programs.

When a stream of code points is to be parsed as an ECMAScript |Script| or |Module|, it is first converted to a stream of input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a single application of the syntactic grammar. The input stream is syntactically in error if the tokens in the stream of input elements cannot be parsed as a single instance of the goal nonterminal (|Script| or |Module|), with no tokens left over.

When a parse is successful, it constructs a parse tree, a rooted tree structure in which each node is a Parse Node. Each Parse Node is an instance of a symbol in the grammar; it represents a span of the source text that can be derived from that symbol. The root node of the parse tree, representing the whole of the source text, is an instance of the parse's goal symbol. When a Parse Node is an instance of a nonterminal, it is also an instance of some production that has that nonterminal as its left-hand side. Moreover, it has zero or more children, one for each symbol on the production's right-hand side: each child is a Parse Node that is an instance of the corresponding symbol.

+ New Parse Nodes are instantiated for each invocation of the parser and never reused between parses even of identical source text. Parse Nodes are considered the same Parse Node if and only if they represent the same span of source text, are instances of the same grammar symbol, and resulted from the same parser invocation.
+ Parsing the same String multiple times will lead to different Parse Nodes, e.g., as occurs in:
+     eval(str); eval(str);
+ Parse Nodes are specification artefacts, and implementations are not required to use an analogous data structure.
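The Parse Node identity rule added in this hunk becomes observable through tagged templates. The following is a minimal sketch, assuming an engine that implements the per-site semantics from this patch; `tag`, `site`, and `src` are illustrative names, not part of the spec text.

```javascript
// Illustrative tag function: returns the template object it receives.
function tag(strings) {
  return strings;
}

// One template site in source; each call evaluates the same Parse Node.
function site() {
  return tag`a${0}b`;
}

// Same site evaluated twice: the cached template object is reused.
const sameSite = site() === site();

// The same String parsed twice yields two distinct Parse Nodes, so under
// per-site caching each eval produces its own template object.
const src = "tag`a${0}b`";
const acrossEvals = eval(src) === eval(src);

console.log(sameSite, acrossEvals);
```

Under the prior by-contents semantics, the two `eval` results would have been the same object, since both templates have identical raw strings.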

Productions of the syntactic grammar are distinguished by having just one colon “:” as punctuation.

The syntactic grammar as presented in clauses 12, 13, 14 and 15 is not a complete account of which token sequences are accepted as a correct ECMAScript |Script| or |Module|. Certain additional token sequences are also accepted, namely, those that would be described by the grammar if only semicolons were added to the sequence in certain places (such as before line terminator characters). Furthermore, certain token sequences that are described by the grammar are not considered acceptable if a line terminator character appears in certain “awkward” places.

- In certain cases, in order to avoid ambiguities, the syntactic grammar uses generalized productions that permit token sequences that do not form a valid ECMAScript |Script| or |Module|. For example, this technique is used for object literals and object destructuring patterns. In such cases a more restrictive supplemental grammar is provided that further restricts the acceptable token sequences. Typically, an early error rule will then define an error condition if "_P_ cannot be reparsed as an _N_", where _P_ is a Parse Node (an instance of the generalized production) and _N_ is a nonterminal from the supplemental grammar. Here, the sequence of tokens originally matched by _P_ is parsed again using _N_ as the goal symbol. (If _N_ takes grammatical parameters, then they are set to the same values used when _P_ was originally parsed.) An error occurs if the sequence of tokens cannot be parsed as a single instance of _N_, with no tokens left over. Subsequently, algorithms access the result of the parse using a phrase of the form "the result of reparsing _P_ as an _N_". This will always be a Parse Node (an instance of _N_), since any parsing failure would have been detected by an early error rule.
+ In certain cases, in order to avoid ambiguities, the syntactic grammar uses generalized productions that permit token sequences that do not form a valid ECMAScript |Script| or |Module|. For example, this technique is used for object literals and object destructuring patterns. In such cases a more restrictive supplemental grammar is provided that further restricts the acceptable token sequences. Typically, an early error rule will then define an error condition if "_P_ is not covering an _N_", where _P_ is a Parse Node (an instance of the generalized production) and _N_ is a nonterminal from the supplemental grammar. Here, the sequence of tokens originally matched by _P_ is parsed again using _N_ as the goal symbol. (If _N_ takes grammatical parameters, then they are set to the same values used when _P_ was originally parsed.) An error occurs if the sequence of tokens cannot be parsed as a single instance of _N_, with no tokens left over. Subsequently, algorithms access the result of the parse using a phrase of the form "the _N_ that is covered by _P_". This will always be a Parse Node (an instance of _N_, unique for a given _P_), since any parsing failure would have been detected by an early error rule.
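The effect of a generalized production followed by a more restrictive supplemental grammar can be observed from script code. The sketch below is illustrative only: `parses` is a hypothetical helper that uses `new Function` to parse a source string without running it, and the sample strings are assumptions chosen to exercise the object literal and parenthesized expression cases mentioned above.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// `new Function` only parses and compiles; the body is never executed here.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// `{ a = 1 }` is matched by the generalized object literal production.
// It is valid once refined to a destructuring assignment pattern...
const asPattern = parses("({ a = 1 } = {});");
// ...but is rejected by an early error when it stays an object literal.
const asLiteral = parses("({ a = 1 });");

// `(a, b)` and `(a, 1)` are both matched by the generalized parenthesized
// production; only the first can also serve as arrow function parameters.
const arrowOk = parses("((a, b) => a);");
const arrowBad = parses("((a, 1) => a);");

console.log(asPattern, asLiteral, arrowOk, arrowBad);
```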

@@ -6121,10 +6124,11 @@

Realms

[[TemplateMap]]
- A List of Record { [[Strings]]: List, [[Array]]: Object}.
+ A List of Record { [[Site]]: Parse Node, [[Array]]: Object}.
- Template objects are canonicalized separately for each realm using its Realm Record's [[TemplateMap]]. Each [[Strings]] value is a List containing, in source text order, the raw String values of a |TemplateLiteral| that has been evaluated. The associated [[Array]] value is the corresponding template object that is passed to a tag function.
+ Template objects are canonicalized separately for each realm using its Realm Record's [[TemplateMap]]. Each [[Site]] value is a Parse Node that is a |TemplateLiteral|. The associated [[Array]] value is the corresponding template object that is passed to a tag function.
+ Once a Parse Node becomes unreachable, the corresponding [[Array]] is also unreachable, and it would be unobservable if an implementation removed the pair from the [[TemplateMap]] list.
@@ -11364,7 +11368,7 @@

Semantics

Static Semantics: CoveredParenthesizedExpression

CoverParenthesizedExpressionAndArrowParameterList : `(` Expression `)`
- 1. Return the result of reparsing |CoverParenthesizedExpressionAndArrowParameterList| as a |ParenthesizedExpression|.
+ 1. Return the |ParenthesizedExpression| that is covered by |CoverParenthesizedExpressionAndArrowParameterList|.
@@ -12079,7 +12083,7 @@

Runtime Semantics: GetTemplateObject ( _templateLiteral_ )

  1. Let _realm_ be the current Realm Record.
  1. Let _templateRegistry_ be _realm_.[[TemplateMap]].
  1. For each element _e_ of _templateRegistry_, do
-   1. If _e_.[[Strings]] and _rawStrings_ contain the same values in the same order, then
+   1. If _e_.[[Site]] is the same Parse Node as _templateLiteral_, then
      1. Return _e_.[[Array]].
  1. Let _cookedStrings_ be TemplateStrings of _templateLiteral_ with argument *false*.
  1. Let _count_ be the number of elements in the List _cookedStrings_.
@@ -12097,7 +12101,7 @@

Runtime Semantics: GetTemplateObject ( _templateLiteral_ )

  1. Perform SetIntegrityLevel(_rawObj_, `"frozen"`).
  1. Call _template_.[[DefineOwnProperty]](`"raw"`, PropertyDescriptor{[[Value]]: _rawObj_, [[Writable]]: *false*, [[Enumerable]]: *false*, [[Configurable]]: *false*}).
  1. Perform SetIntegrityLevel(_template_, `"frozen"`).
- 1. Append the Record{[[Strings]]: _rawStrings_, [[Array]]: _template_} to _templateRegistry_.
+ 1. Append the Record{[[Site]]: _templateLiteral_, [[Array]]: _template_} to _templateRegistry_.
  1. Return _template_.
@@ -12208,7 +12212,7 @@
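The revised GetTemplateObject lookup can be modelled in a few lines. This is a rough sketch under stated assumptions, not how any engine implements it: a plain object stands in for a |TemplateLiteral| Parse Node, and `makeRealm`, `getTemplateObject`, `siteA`, and `siteB` are hypothetical names introduced only for illustration.

```javascript
// Hypothetical stand-in for a Realm Record holding [[TemplateMap]].
function makeRealm() {
  return { templateMap: [] };
}

// Model of the patched algorithm: the registry is keyed on site identity,
// not on the contents of the raw strings.
function getTemplateObject(realm, site) {
  for (const entry of realm.templateMap) {
    if (entry.site === site) return entry.array;
  }
  const template = [...site.cooked];
  const rawObj = Object.freeze([...site.raw]);
  Object.defineProperty(template, "raw", {
    value: rawObj,
    writable: false,
    enumerable: false,
    configurable: false,
  });
  Object.freeze(template);
  realm.templateMap.push({ site, array: template });
  return template;
}

// Two distinct sites with identical strings (assumed sample data).
const realm = makeRealm();
const siteA = { raw: ["a", "b"], cooked: ["a", "b"] };
const siteB = { raw: ["a", "b"], cooked: ["a", "b"] };

const reusedForSameSite =
  getTemplateObject(realm, siteA) === getTemplateObject(realm, siteA);
const sharedAcrossSites =
  getTemplateObject(realm, siteA) === getTemplateObject(realm, siteB);

console.log(reusedForSameSite, sharedAcrossSites);
```

Because entries are found by site identity, an entry whose site can no longer be reached can never match again, which is why dropping it from the list would be unobservable.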

Static Semantics: Early Errors

PrimaryExpression : CoverParenthesizedExpressionAndArrowParameterList