-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Editorial: Improve specification of [RegExp]IdentifierNames #2392
Conversation
3d0c24c
to
7a79833
Compare
458dd79
to
b9fcd20
Compare
(force-pushed to rebase to master + resolve merge conflicts from #2411) |
b9fcd20
to
0c5694a
Compare
I think you should combine the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise. This is a big improvement IMO.
Yeah, there are some parallels between the two, so it's tempting to look for ways to combine them. But I'm doubtful it would be a net benefit. E.g., consider [RegExp]IdentifierCodePoints:
so the two rules can't be combined; the best you can do is put them next to each other. With [RegExp]IdentifierCodePoint, it's slightly better: a couple of the algorithms are the same, so you could join the I'm not saying there's no benefit to moving two similar rules close together -- if, in future, you need to change one, there's a fair chance you need to change the other as well, and you're more likely to recognize that if they're right next to each other. On the other hand, in this case, it seems like a low probability they'll need to change. Moreover, bringing them together means that at least one of them is taken from its 'home', which is a tradeoff. (The above assumes you're not talking about merging productions (changing the grammar). That would allow more merging of SDOs, but it would have problems of its own.)
Hm, I don't follow. Can you give an example? |
@jmdyck Looks like I got If there's no benefit from the chain rule applying, I agree with your point about moving the relevant definitions of the SDO further from where it belongs. At that point, the minor benefit of reducing or grouping together the mostly-duplicated algorithms isn't enough to warrant the change. |
…39#2392) Extract `IdentifierStartChar` from `IdentifierStart` and `RegExpIdentifierStart`. Extract `IdentifierPartChar` from `IdentifierPart` and `RegExpIdentifierPart`. This has 3 benefits: - We eliminate some repetition between the productions for Identifiers and RegExpIdentifiers. - We can simplify 4 Early Error rules involving escape sequences, because the constraint can now be expressed in terms of a single nonterminal, rather than a nonterminal plus some terminals. - We can eliminate the Early Error rule for `RegularExpressionFlags` by instead expressing its constraint in the grammar: in the production for `RegularExpressionFlags`, replace `IdentifierPart` with `IdentifierPartChar`. (As a consequence of the last point, this commit undefines the following id: sec-literals-regular-expression-literals-static-semantics-early-errors There didn't seem to be a sensible place to relocate it as an oldid.)
This commit introduces SDOs `IdentifierCodePoints` and `IdentifierCodePoint`. - This allows `StringValue` of _IdentifierName_ to be specified more precisely. - It also simplifies two Early Error rules (involving _UnicodeEscapeSequence_), since they can now be expressed as constraints on a code point, rather than having to be translated into the space of String values. ---- Similarly, this commit introduces SDOs `RegExpIdentifierCodePoints` and `RegExpIdentifierCodePoint`. - This allows `CapturingGroupName` of _RegExpIdentifierName_ to be specified more precisely. - It also simplifies two Early Error rules (involving surrogate pairs). (Note that the current algorithm for `CapturingGroupName` only 'normalizes' escape sequences, whereas this PR's algorithm also normalizes surrogate pairs. However, since the normalized text is immediately passed to `CodePointsToString`, the result should be the same. Given the Early Error rules for surrogate pairs, normalizing them made sense to me.)
... from 12.6 Names and Keywords down to 12.6.1 Identifier Names. I think this makes it clearer that this prose is mostly saying the same thing as the associated Early Error rules and SDOs.
0c5694a
to
1901514
Compare
Each of the two tests: early/1447683fba196181.js early/d52f769ab39372c7.js consists of a RegularExpressionLiteral in which the RegularExpressionFlags contains a UnicodeEscapeSequence. Formerly, this was disallowed via an Early Error, but with the merge of tc39/ecma262#2392, it is now disallowed via the grammar. Thus, these tests should move from 'early' to 'fail'.
…39#2392) Extract `IdentifierStartChar` from `IdentifierStart` and `RegExpIdentifierStart`. Extract `IdentifierPartChar` from `IdentifierPart` and `RegExpIdentifierPart`. This has 3 benefits: - We eliminate some repetition between the productions for Identifiers and RegExpIdentifiers. - We can simplify 4 Early Error rules involving escape sequences, because the constraint can now be expressed in terms of a single nonterminal, rather than a nonterminal plus some terminals. - We can eliminate the Early Error rule for `RegularExpressionFlags` by instead expressing its constraint in the grammar: in the production for `RegularExpressionFlags`, replace `IdentifierPart` with `IdentifierPartChar`. (As a consequence of the last point, this commit undefines the following id: sec-literals-regular-expression-literals-static-semantics-early-errors There didn't seem to be a sensible place to relocate it as an oldid.)
This commit introduces SDOs `IdentifierCodePoints` and `IdentifierCodePoint`. - This allows `StringValue` of _IdentifierName_ to be specified more precisely. - It also simplifies two Early Error rules (involving _UnicodeEscapeSequence_), since they can now be expressed as constraints on a code point, rather than having to be translated into the space of String values. ---- Similarly, this commit introduces SDOs `RegExpIdentifierCodePoints` and `RegExpIdentifierCodePoint`. - This allows `CapturingGroupName` of _RegExpIdentifierName_ to be specified more precisely. - It also simplifies two Early Error rules (involving surrogate pairs). (Note that the current algorithm for `CapturingGroupName` only 'normalizes' escape sequences, whereas this PR's algorithm also normalizes surrogate pairs. However, since the normalized text is immediately passed to `CodePointsToString`, the result should be the same. Given the Early Error rules for surrogate pairs, normalizing them made sense to me.)
... from 12.6 Names and Keywords down to 12.6.1 Identifier Names. I think this makes it clearer that this prose is mostly saying the same thing as the associated Early Error rules and SDOs.
Currently, the algorithm for
StringValue
of IdentifierName has the step:(And the algorithm for
CapturingGroupName
of RegExpIdentifierName has something quite similar.)In #1552 (comment), @michaelficarra said of this step:
This PR does so.
The first commit refactors the grammar slightly, and the second introduces 4 SDOs. The first makes the second easier, but it also has benefits of its own. I think they're easier to understand as separate commits.
The third commit just moves some pre-existing prose. The connection between the prose and the Early Error rules + SDOs could maybe be improved, but that wasn't my focus.
Note that the second commit useslist-concatenation
in two spots, so if PR #2382 isn't accepted, I'll have to do something else there.