-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Editorial: Improve internal and external consistency of RegExp pattern semantics #2112
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generally great, thank you. Had a few small comments.
6c45aff
to
426eb4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than the 1-based indexing commit.
@gibson042: are you OK with dropping e82e75b and 1f968cd from this PR and continuing discussion of those bits elsewhere? I know your preference is to land those as well before continuing the discussion, but @michaelficarra disagrees; the rest of this is uncontroversial and I don't want to block it on that point. The other commits will also need rebasing into a collection of commits which can individually land; I'm happy to do that myself if you'd like. |
1f968cd
to
4c98049
Compare
4c98049
to
fa07512
Compare
Thanks! I rebased the review commits into the applicable commit, so that these commits can all land rather than being squashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally lgtm, thanks.
As a follow-on I'd like to see us remove "mathematical sets", which AFAICT is only used for sets of characters in the RegExp machinery, and use the already defined Sets.
@@ -31953,14 +31953,17 @@ <h1>Notation</h1> | |||
<li> | |||
_Unicode_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"u"* and otherwise is *false*. | |||
</li> | |||
<li oldids="sec-runtime-semantics-wordcharacters-abstract-operation"> | |||
_WordCharacters_ is the mathematical set that is the union of all sixty-three characters in *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"* (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters _c_ for which _c_ is not in that set but Canonicalize(_c_) is. _WordCharacters_ cannot contain more than sixty-three characters unless _Unicode_ and _IgnoreCase_ are both *true*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For WordCharacters and CharSet, we could use the specification Set type, though the prose would need to be updated to say it is used in more places than just the memory model.
fa07512
to
9067bf5
Compare
9067bf5
to
420e82e
Compare
No description provided.