-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Normative] Add RegExp named capture groups #1027
Conversation
7dd277f
to
9af3ff8
Compare
1. Let the enclosed substring be _groupName_. | ||
1. Let _capture_ be ? Get(_namedCaptures_, _groupName_). | ||
1. If _capture_ is *undefined*, replace the text through `>` with the empty string. | ||
1. Otherwise, replace the text through this following `>` with ? ToString(_capture_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably be worded a bit differently to ensure that ToString
is only called once on capture
- otherwise this could be interpreted as "call ToString for each replacement", which would be observable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
Otherwise, let _replacement_ be ? ToString(_capture_) and replace the text through this following `>` with _replacement_.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems completely unambiguous, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand how this change relates to what @ljharb is getting at. Get and ToString are called multiple times, before and after this editorial change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it does penalize the common case; if done eagerly, we'd need to iterate all keys on the groups
object, while the replacement string may only reference a subset of them.
And since I suppose we wouldn't want to modify the given groups
object, we'd have to allocate a new object somewhere to store the ToString
normalized group values, even in the lazy case.
I didn't get the feeling that anyone was saying "we missed this stage 2 concern".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment about it “being a little late” seemed like that.
At any rate, allocating a new unobservanle object somewhere to store a string of Type(replacement) is not String, when in the common case it would be, seems like it would be both a cheap and a rare operation. However my concern is not for performance, which can always be improved upon later; but rather for observability, which can not be reduced later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there was a lack of review. These semantics were thought out, put out for review for 8 months ago, and now had a thorough review by two implementers. I was aware when writing the spec text that you'd get multiple observable reads. As @schuay raises, it's not clear what the alternative would be, and the downside of the current approach does not seem so high.
RegExps are extremely observable in their behavior in a way which I don't see much of a use case for. I raised this as a concern in late 2015, proposing a more minimal RegExp subclassing mechanism, which would've been much easier to implement and with fewer performance cliffs. The proposal was rejected by the committee. We have a specification with very high observability due to that decision; it will impact many new RegExp features going forward no matter whether we decide to do one or two Get/ToString calls. If we did caching, it would also be observable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misunderstanding noted, and reverted to the original spec text.
9af3ff8
to
60ec63e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I'd appreciate other reviewers also took a look.
60ec63e
to
6fe708f
Compare
6fe708f
to
a581524
Compare
I think it'd be nice to resolve tc39/proposal-regexp-named-groups#34 before merging this PR, because both implementations (JSC and V8) don't implement this part correctly according to the current spec proposal (both implementations print |
a581524
to
8afa992
Compare
tc39/proposal-regexp-named-groups#34 has been resolved in tc39/proposal-regexp-named-groups#40. I’ve updated this patch accordingly. |
Test262 tests for the recent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but i'm not familiar enough with grammar stuff to review that part.
spec.html
Outdated
1. If _s_ is *undefined*, return _c_(_x_). | ||
1. Let _e_ be _x_'s _endIndex_. | ||
1. Let _len_ be the number of elements in _s_. | ||
1. Let _f_ be _e_+_len_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces around the +
for clarity?
spec.html
Outdated
1. Let _e_ be _x_'s _endIndex_. | ||
1. Let _len_ be the number of elements in _s_. | ||
1. Let _f_ be _e_+_len_. | ||
1. If _f_>_InputLength_, return ~failure~. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also spaces around the >
?
spec.html
Outdated
1. Let _len_ be the number of elements in _s_. | ||
1. Let _f_ be _e_+_len_. | ||
1. If _f_>_InputLength_, return ~failure~. | ||
1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_e_+_i_]), return ~failure~. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the e + i
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but keep in mind that this is now inconsistent with other similar parts of the RegExp section. There is an open issue to fix this throughout the spec: #925
8afa992
to
9bfcabe
Compare
spec.html
Outdated
@@ -28558,6 +28558,8 @@ <h1>Runtime Semantics: GetSubstitution( _matched_, _str_, _position_, _captures_ | |||
1. Assert: Type(_replacement_) is String. | |||
1. Let _tailPos_ be _position_ + _matchLength_. | |||
1. Let _m_ be the number of elements in _captures_. | |||
1. If _namedCaptures_ is not *undefined*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: block form of If ends in a ", then":
spec.html
Outdated
<emu-alg> | ||
1. If _namedCaptures_ is *undefined*, the replacement text is the String `"$<"`. | ||
1. Otherwise, | ||
1. Scan until the next `>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably could use the new single unicode codepoint convention here (U+XXXX CHAR NAME)
I think this is ready to merge. I only found a few nits. The grammar all looks good, editorial conventions are good, merges and builds cleanly. Nice work!! I had a couple minor nits but I'll pull by EOD and just fix them in a subsequent commit unless you want to get to it before me :) |
@bterlson does that mean that, per https://github.com/rwaldron/tc39-notes/blob/622ea4d93b4be44eb834d5e90bfebe0403f4216a/es8/2017-11/nov-28.md#conclusionresolution-10, this is now at stage 4? |
9bfcabe
to
2fedbb1
Compare
2fedbb1
to
898d7ae
Compare
emoji kazoo react |
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
Specifically, add [N] parameter to CharacterClass ClassRanges NonemptyClassRanges NonemptyClassRangesNoDash ClassAtom These were implied when commit 95ec0c6 (of PR tc39#1027)... - added [?N] to RHS occurrences of CharacterClass without explicitly adding [N] to the LHS occurrence CharacterClass; and - added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B) without adding [?N] to any RHS occurrence. This commit propagates [N] across that gap. (See issue tc39#1081.)
@littledan asked me to prepare the PR for his proposal adding named capture groups to ECMAScript regular expressions.
Proposal repo: https://github.com/tc39/proposal-regexp-named-groups
@littledan, I’ve added you as the commit author.