Editorial: Improve internal and external consistency of RegExp pattern semantics #2112

gibson042 · 2020-07-24T22:14:39Z

No description provided.

bakkot

This is generally great, thank you. Had a few small comments.

michaelficarra

LGTM other than the 1-based indexing commit.

bakkot · 2020-08-20T22:34:38Z

@gibson042: are you OK with dropping e82e75b and 1f968cd from this PR and continuing discussion of those bits elsewhere? I know your preference is to land those as well before continuing the discussion, but @michaelficarra disagrees; the rest of this is uncontroversial and I don't want to block it on that point.

The other commits will also need rebasing into a collection of commits which can individually land; I'm happy to do that myself if you'd like.

gibson042 · 2020-08-21T00:10:09Z

@bakkot Done; I extracted the controversial commits into #2150 so they aren't lost.

bakkot · 2020-08-21T01:04:14Z

Thanks! I rebased the review commits into the applicable commit, so that these commits can all land rather than being squashed.

syg

Generally lgtm, thanks.

As a follow-on I'd like to see us remove "mathematical sets", which AFAICT is only used for sets of characters in the RegExp machinery, and use the already defined Sets.

syg · 2020-08-24T23:29:25Z

spec.html

@@ -31953,14 +31953,17 @@ <h1>Notation</h1>
          <li>
            _Unicode_ is *true* if the RegExp object's [[OriginalFlags]] internal slot contains *"u"* and otherwise is *false*.
          </li>
+          <li oldids="sec-runtime-semantics-wordcharacters-abstract-operation">
+            _WordCharacters_ is the mathematical set that is the union of all sixty-three characters in *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"* (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters _c_ for which _c_ is not in that set but Canonicalize(_c_) is. _WordCharacters_ cannot contain more than sixty-three characters unless _Unicode_ and _IgnoreCase_ are both *true*.


For WordCharacters and CharSet, we could use the specification Set type, though the prose would need to be updated to say it is used in more places than just the memory model.
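
To see the quoted _WordCharacters_ definition in action, here is a minimal sketch (not part of the PR) that enumerates the code points matched by `\w` in the host engine under a given flag combination. The helper name `wordCharacters` is hypothetical, and the sketch assumes the engine's `\w` follows the spec text quoted above.

```javascript
// Empirical sketch: collect every code point that the host engine's \w
// matches for a given flag string. `wordCharacters` is a hypothetical
// helper name, not something defined by the spec or the PR.
function wordCharacters(flags) {
  const re = new RegExp("^\\w$", flags);
  const matches = [];
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    // Skip lone surrogates; they are not word characters in any mode.
    if (cp >= 0xd800 && cp <= 0xdfff) continue;
    if (re.test(String.fromCodePoint(cp))) matches.push(cp);
  }
  return matches;
}

// Compare the four combinations of the "i" and "u" flags.
for (const flags of ["", "i", "u", "iu"]) {
  const extras = wordCharacters(flags)
    .filter((cp) => cp > 0x7f)
    .map((cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
  console.log(JSON.stringify(flags), "non-ASCII members:", extras.join(", ") || "none");
}
```

On a conforming engine, only `wordCharacters("iu")` should contain members beyond the sixty-three Basic Latin characters, namely U+017F (LATIN SMALL LETTER LONG S) and U+212A (KELVIN SIGN), whose case foldings are `s` and `k`.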


bakkot reviewed Aug 9, 2020

gibson042 force-pushed the 2020-07-CharSet branch from 6c45aff to 426eb4f on August 12, 2020 23:26

bakkot added the "editor call" label (to be discussed in the next editor call) Aug 13, 2020

bakkot mentioned this pull request Aug 17, 2020

Editorial: replace "and let"/"and return" with better phrasing #2137

Merged

michaelficarra reviewed Aug 19, 2020

michaelficarra approved these changes Aug 19, 2020

gibson042 mentioned this pull request Aug 19, 2020

Editorial: normalisation around List construction #2142

Merged

ljharb added the editorial change label Aug 19, 2020

ljharb requested review from bakkot and ljharb August 20, 2020 17:15

ljharb removed the "editor call" label (to be discussed in the next editor call) Aug 20, 2020

Kurniawan1983 approved these changes Aug 20, 2020

gibson042 force-pushed the 2020-07-CharSet branch from 1f968cd to 4c98049 on August 21, 2020 00:03

gibson042 mentioned this pull request Aug 21, 2020

Editorial: Add notes about the 1-based indexing of regular expression _captures_ #2150

Open

bakkot approved these changes Aug 21, 2020

bakkot force-pushed the 2020-07-CharSet branch from 4c98049 to fa07512 on August 21, 2020 01:03

michaelficarra requested review from syg and a team August 21, 2020 01:19

syg approved these changes Aug 24, 2020

ljharb self-assigned this Aug 25, 2020

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Consistently use CharSet in Regular Expression Pattern Semantics (tc39#2112)

15098b4

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Simplify CharacterSetMatcher (tc39#2112)

a60cfcf

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Update regular expression algorithms to use prevailing shorthand syntax (tc39#2112)

80dc7ed

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Convert WordCharacters to a common alias (tc39#2112)

9120c71

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Introduce a temporary spec variable to avoid an immediate overwrite (tc39#2112)

81e7faa

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Avoid confusion with 0-based indexing in BackreferenceMatcher and callers (tc39#2112)

424fed3

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Reduce polarity flips in Canonicalize (tc39#2112)

af6c8f5

ljharb pushed a commit to gibson042/ecma262 that referenced this pull request Aug 26, 2020

Editorial: Inline the relevant portions of String.prototype.toUpperCase (tc39#2112)

9067bf5

ljharb force-pushed the 2020-07-CharSet branch from fa07512 to 9067bf5 on August 26, 2020 22:24

gibson042 added 8 commits August 26, 2020 15:24

Editorial: Consistently use CharSet in Regular Expression Pattern Semantics (tc39#2112)

b7c0d9d

Editorial: Simplify CharacterSetMatcher (tc39#2112)

124072a

Editorial: Update regular expression algorithms to use prevailing shorthand syntax (tc39#2112)

68050e7

Editorial: Convert WordCharacters to a common alias (tc39#2112)

b5385b0

Editorial: Introduce a temporary spec variable to avoid an immediate overwrite (tc39#2112)

686240c

Editorial: Avoid confusion with 0-based indexing in BackreferenceMatcher and callers (tc39#2112)

da0a3f6

Editorial: Reduce polarity flips in Canonicalize (tc39#2112)

2c66f83

Editorial: Inline the relevant portions of String.prototype.toUpperCase (tc39#2112)

420e82e

ljharb force-pushed the 2020-07-CharSet branch from 9067bf5 to 420e82e on August 26, 2020 22:25

ljharb merged commit 420e82e into tc39:master Aug 26, 2020

jmdyck mentioned this pull request Apr 14, 2022

Editorial: Eliminate RegExp's "global aliases" #2716

Merged
