From eadf6293254a623815280cc91e9def77ee04723f Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Tue, 4 Aug 2020 15:59:24 -0400
Subject: [PATCH 1/2] Editorial: Add notes about the 1-based indexing of regular expression _captures_

---
 spec.html | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/spec.html b/spec.html
index 0516da2d75..57c8cb7713 100644
--- a/spec.html
+++ b/spec.html
@@ -31960,7 +31960,7 @@

Notation

A CharSet is a mathematical set of characters, either code units or code points depending upon the state of the _Unicode_ flag. “All characters” means either all code unit values or all code point values also depending upon the state of _Unicode_.
- • A State is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of _NcapturingParens_ values. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The _n_th element of _captures_ is either a List that represents the value obtained by the _n_th set of capturing parentheses or *undefined* if the _n_th set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
+ • A State is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of length _NcapturingParens_, indexed 1 through _NcapturingParens_. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The _n_th element of _captures_ is either a List of characters that represents the value obtained by the _n_th set of capturing parentheses or *undefined* if the _n_th set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
  • A MatchResult is either a State or the special token ~failure~ that indicates that the match failed.
@@ -32105,6 +32105,7 @@

    Runtime Semantics: RepeatMatcher ( _m_, _min_, _max_, _greedy_, _x_, _c_, _p

    1. If _max_ is ∞, let _max2_ be ∞; otherwise let _max2_ be _max_ - 1.
    1. Call RepeatMatcher(_m_, _min2_, _max2_, _greedy_, _y_, _c_, _parenIndex_, _parenCount_) and return its result.
    1. Let _cap_ be a copy of _x_'s _captures_ List.
+   1. NOTE: _cap_ is indexed from 1 ().
    1. [id="step-repeatmatcher-clear-captures"] For each integer _k_ that satisfies _parenIndex_ < _k_ and _k_ ≤ _parenIndex_ + _parenCount_, set _cap_[_k_] to *undefined*.
    1. Let _e_ be _x_'s _endIndex_.
    1. Let _xr_ be the State (_e_, _cap_).
@@ -32602,6 +32603,7 @@

    Atom

    1. Let _d_ be a new Continuation with parameters (_y_) that captures _x_, _c_, _direction_, and _parenIndex_ and performs the following steps when called:
      1. Assert: _y_ is a State.
      1. Let _cap_ be a copy of _y_'s _captures_ List.
+     1. NOTE: _cap_ is indexed from 1 ().
      1. Let _xe_ be _x_'s _endIndex_.
      1. Let _ye_ be _y_'s _endIndex_.
      1. If _direction_ is equal to +1, then
@@ -32767,6 +32769,7 @@

    Runtime Semantics: BackreferenceMatcher ( _n_, _direction_ )

    1. Assert: _x_ is a State.
    1. Assert: _c_ is a Continuation.
    1. Let _cap_ be _x_'s _captures_ List.
+   1. NOTE: _cap_ is indexed from 1 ().
    1. Let _s_ be _cap_[_n_].
    1. If _s_ is *undefined*, return _c_(_x_).
    1. Let _e_ be _x_'s _endIndex_.
@@ -33234,7 +33237,9 @@

    Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )

    1. Set _e_ to _eUTF_.
    1. If _global_ is *true* or _sticky_ is *true*, then
      1. Perform ? Set(_R_, *"lastIndex"*, _e_, *true*).
-   1. Let _n_ be the number of elements in _r_'s _captures_ List. (This is the same value as 's _NcapturingParens_.)
+   1. Let _cap_ be _r_'s _captures_ List.
+   1. NOTE: _cap_ is indexed from 1 ().
+   1. Let _n_ be the number of elements in _cap_. (This is the same value as 's _NcapturingParens_.)
    1. Assert: _n_ < 2^32 - 1.
    1. Let _A_ be ! ArrayCreate(_n_ + 1).
    1. Assert: The value of _A_'s *"length"* property is _n_ + 1.
@@ -33248,7 +33253,7 @@

    Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )

    1. Let _groups_ be *undefined*.
    1. Perform ! CreateDataPropertyOrThrow(_A_, *"groups"*, _groups_).
    1. For each integer _i_ such that _i_ > 0 and _i_ ≤ _n_, do
-     1. Let _captureI_ be _i_th element of _r_'s _captures_ List.
+     1. Let _captureI_ be _cap_[_i_].
      1. If _captureI_ is *undefined*, let _capturedValue_ be *undefined*.
      1. Else if _fullUnicode_ is *true*, then
        1. Assert: _captureI_ is a List of code points.

From 5e494e3a9477a737e88e3f960b053143113e067b Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Wed, 19 Aug 2020 21:53:06 -0400
Subject: [PATCH 2/2] Editorial: Replace "nth element" with "element at index n"

---
 spec.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.html b/spec.html
index 57c8cb7713..e5d252f6e0 100644
--- a/spec.html
+++ b/spec.html
@@ -31960,7 +31960,7 @@

    Notation

    A CharSet is a mathematical set of characters, either code units or code points depending upon the state of the _Unicode_ flag. “All characters” means either all code unit values or all code point values also depending upon the state of _Unicode_.
- • A State is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of length _NcapturingParens_, indexed 1 through _NcapturingParens_. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The _n_th element of _captures_ is either a List of characters that represents the value obtained by the _n_th set of capturing parentheses or *undefined* if the _n_th set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
+ • A State is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of length _NcapturingParens_, indexed 1 through _NcapturingParens_. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The element of _captures_ at index _n_ is either a List of characters that represents the value obtained by the _n_th set of capturing parentheses or *undefined* if the _n_th set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
  • A MatchResult is either a State or the special token ~failure~ that indicates that the match failed.
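To make the 1-based convention that these NOTE steps describe concrete, here is a small TypeScript sketch that sits outside the spec. It is illustrative only: the `State` interface and the helpers `initialState`, `clearCaptures`, `backreferenceText`, and `buildExecArray` are hypothetical names, not spec or engine APIs; each capture is simplified to a string rather than a List of characters; and slot 0 of the array is deliberately left unused so that `captures[n]` is the _n_th capturing group, which mirrors "indexed 1 through _NcapturingParens_".

```typescript
// Illustrative model only; these names are hypothetical, not spec or engine APIs.
// A capture is simplified to a string (the spec uses a List of characters).
type Capture = string | undefined;

interface State {
  endIndex: number;
  // Index 0 is unused, so captures[n] holds the nth capturing group
  // (the spec's List is "indexed 1 through NcapturingParens").
  captures: Capture[];
}

// A fresh State: no group has been reached yet, so every slot is undefined.
function initialState(nCapturingParens: number): State {
  const captures = new Array<Capture>(nCapturingParens + 1).fill(undefined);
  return { endIndex: 0, captures };
}

// Like the RepeatMatcher step: copy the captures, then reset every k with
// parenIndex < k <= parenIndex + parenCount (the groups inside the quantified atom).
function clearCaptures(x: State, parenIndex: number, parenCount: number): State {
  const cap = x.captures.slice();
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) {
    cap[k] = undefined;
  }
  return { endIndex: x.endIndex, captures: cap };
}

// Like BackreferenceMatcher: the text of group n is cap[n], with no "n - 1".
function backreferenceText(x: State, n: number): Capture {
  return x.captures[n];
}

// Like RegExpBuiltinExec: element 0 is the whole match and element i is cap[i]
// for each i with 0 < i <= n, where n is the number of capturing groups.
function buildExecArray(matched: string, x: State): Capture[] {
  const n = x.captures.length - 1;
  const a: Capture[] = [matched];
  for (let i = 1; i <= n; i++) {
    a[i] = x.captures[i];
  }
  return a;
}

// Usage: the pattern /(a)(b)?/ has two capturing groups.
const demo = initialState(2);
demo.captures[1] = "a";
demo.captures[2] = "b";
demo.endIndex = 2;

// The quantified atom (b)? has one group to its left (parenIndex = 1) and one
// inside it (parenCount = 1), so only group 2 is reset.
const retried = clearCaptures(demo, 1, 1);
console.log(backreferenceText(retried, 1), backreferenceText(retried, 2));
console.log(buildExecArray("ab", demo));

// The built-in exec result uses the same alignment: index 2 is group 2.
const m = /(a)|(b)/.exec("b");
console.log(m?.[1], m?.[2]);
```

Reserving slot 0 costs one array element but lets every lookup read `cap[n]` directly, with no off-by-one adjustment; the array returned by the built-in `exec` follows the same alignment, with index 0 holding the whole match.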