Editorial: Add notes about the 1-based indexing of regular expression _captures_ #2150

Open
wants to merge 2 commits into
base: main
11 changes: 8 additions & 3 deletions spec.html
@@ -31960,7 +31960,7 @@ <h1>Notation</h1>
A <em>CharSet</em> is a mathematical set of characters, either code units or code points depending upon the state of the _Unicode_ flag. &ldquo;All characters&rdquo; means either all code unit values or all code point values also depending upon the state of _Unicode_.
</li>
<li>
- A <em>State</em> is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of _NcapturingParens_ values. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The _n_<sup>th</sup> element of _captures_ is either a List that represents the value obtained by the _n_<sup>th</sup> set of capturing parentheses or *undefined* if the _n_<sup>th</sup> set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
+ A <em>State</em> is an ordered pair (_endIndex_, _captures_) where _endIndex_ is an integer and _captures_ is a List of length _NcapturingParens_, indexed 1 through _NcapturingParens_. States are used to represent partial match states in the regular expression matching algorithms. The _endIndex_ is one plus the index of the last input character matched so far by the pattern, while _captures_ holds the results of capturing parentheses. The element of _captures_ at index _n_ is either a List of characters that represents the value obtained by the _n_<sup>th</sup> set of capturing parentheses or *undefined* if the _n_<sup>th</sup> set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
</li>
<li>
A <em>MatchResult</em> is either a State or the special token ~failure~ that indicates that the match failed.
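Reviewer sketch (illustrative, not spec text): the State definition in this hunk can be modeled in JavaScript to show what "indexed 1 through _NcapturingParens_" means in practice. The names `makeState`, `getCapture`, and `withCapture` are hypothetical helpers invented for this example, and storing the List in a 0-based array is an assumption of the sketch, not something the spec prescribes.

```javascript
// Hypothetical model of a State; the names here are illustrative only.
function makeState(endIndex, nCapturingParens) {
  // Every capture starts out undefined ("not reached yet").
  return { endIndex, captures: new Array(nCapturingParens).fill(undefined) };
}

// Read captures[n] using the spec's 1-based index.
function getCapture(state, n) {
  if (!Number.isInteger(n) || n < 1 || n > state.captures.length) {
    throw new RangeError(`capture index ${n} is outside 1..${state.captures.length}`);
  }
  return state.captures[n - 1]; // 1-based spec index -> 0-based array slot
}

// Return a new State whose captures[n] is the given List of characters.
function withCapture(state, n, chars) {
  getCapture(state, n); // reuse the range check
  const captures = state.captures.slice(); // copy, so older States survive backtracking
  captures[n - 1] = chars;
  return { endIndex: state.endIndex, captures };
}

const initial = makeState(0, 2);
const afterGroup2 = withCapture(initial, 2, ["a", "b"]);
console.log(getCapture(afterGroup2, 1)); // undefined: group 1 not reached yet
console.log(getCapture(afterGroup2, 2).join("")); // "ab"
```

Copying the array in `withCapture` mirrors the "Let _cap_ be a copy of ..." steps in the algorithms: several States may be live at once because of backtracking, so they must not share a mutable captures List.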
@@ -32105,6 +32105,7 @@ <h1>Runtime Semantics: RepeatMatcher ( _m_, _min_, _max_, _greedy_, _x_, _c_, _p
1. If _max_ is &infin;, let _max2_ be &infin;; otherwise let _max2_ be _max_ - 1.
1. Call RepeatMatcher(_m_, _min2_, _max2_, _greedy_, _y_, _c_, _parenIndex_, _parenCount_) and return its result.
1. Let _cap_ be a copy of _x_'s _captures_ List.
+ 1. NOTE: _cap_ is indexed from 1 (<emu-xref href="#sec-notation"></emu-xref>).
1. [id="step-repeatmatcher-clear-captures"] For each integer _k_ that satisfies _parenIndex_ &lt; _k_ and _k_ &le; _parenIndex_ + _parenCount_, set _cap_[_k_] to *undefined*.
1. Let _e_ be _x_'s _endIndex_.
1. Let _xr_ be the State (_e_, _cap_).
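Reviewer sketch (illustrative, not spec text): the capture-clearing step in this hunk ranges over _parenIndex_ &lt; _k_ &le; _parenIndex_ + _parenCount_, which only makes sense with 1-based indexing. The helper `clearCaptures` below is hypothetical, and it assumes the List is stored in a 0-based array. The regular expression is the example used in the spec's own note on RepeatMatcher, so its result is observable in any conforming engine.

```javascript
// Hypothetical helper: `cap` is a 0-based array holding the spec's entries 1..n.
function clearCaptures(cap, parenIndex, parenCount) {
  const copy = cap.slice(); // "Let cap be a copy of x's captures List"
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) {
    copy[k - 1] = undefined; // the spec's cap[k]
  }
  return copy;
}

// In /(z)((a+)?(b+)?(c))*/ the quantified atom has one capturing group to its
// left (parenIndex = 1) and contains four groups (parenCount = 4): groups 2..5.
const beforeIteration = ["z", "bbbc", undefined, "bbb", "c"];
const cleared = clearCaptures(beforeIteration, 1, 4);
console.log(cleared); // group 1 kept, groups 2..5 reset to undefined

// Observable effect: group 4 matched "bbb" in an earlier iteration but is
// undefined in the result, because the final iteration ("ac") cleared it.
const m = /(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac");
console.log(m[4]); // undefined
```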
@@ -32602,6 +32603,7 @@ <h1>Atom</h1>
1. Let _d_ be a new Continuation with parameters (_y_) that captures _x_, _c_, _direction_, and _parenIndex_ and performs the following steps when called:
1. Assert: _y_ is a State.
1. Let _cap_ be a copy of _y_'s _captures_ List.
+ 1. NOTE: _cap_ is indexed from 1 (<emu-xref href="#sec-notation"></emu-xref>).
1. Let _xe_ be _x_'s _endIndex_.
1. Let _ye_ be _y_'s _endIndex_.
1. If _direction_ is equal to +1, then
@@ -32767,6 +32769,7 @@ <h1>Runtime Semantics: BackreferenceMatcher ( _n_, _direction_ )</h1>
1. Assert: _x_ is a State.
1. Assert: _c_ is a Continuation.
1. Let _cap_ be _x_'s _captures_ List.
+ 1. NOTE: _cap_ is indexed from 1 (<emu-xref href="#sec-notation"></emu-xref>).
1. Let _s_ be _cap_[_n_].
1. If _s_ is *undefined*, return _c_(_x_).
1. Let _e_ be _x_'s _endIndex_.
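Reviewer sketch (illustrative, not spec text): in BackreferenceMatcher, `\1` reads _cap_[1], and the "If _s_ is *undefined*, return _c_(_x_)" step means a backreference to a group that has not participated succeeds without consuming input. The snippet below only exercises the built-in `RegExp`; the variable names are arbitrary.

```javascript
// Group 1 is skipped by `?`, so cap[1] is undefined when \1 is evaluated.
// The backreference then succeeds without consuming anything and "b" matches.
const skipped = /(a)?\1b/.exec("b");
console.log(skipped[0]); // "b"
console.log(skipped[1]); // undefined

// A forward reference behaves the same way: \1 runs before group 1 has
// captured anything, so it matches the empty sequence.
const forward = /\1(a)/.exec("a");
console.log(forward[0]); // "a"
console.log(forward[1]); // "a"

// When the group did capture, \1 must repeat exactly the captured characters.
console.log(/(ab)\1/.test("abab")); // true
console.log(/(ab)\1/.test("abac")); // false
```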
@@ -33234,7 +33237,9 @@ <h1>Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )</h1>
1. Set _e_ to _eUTF_.
1. If _global_ is *true* or _sticky_ is *true*, then
1. Perform ? Set(_R_, *"lastIndex"*, _e_, *true*).
- 1. Let _n_ be the number of elements in _r_'s _captures_ List. (This is the same value as <emu-xref href="#sec-notation"></emu-xref>'s _NcapturingParens_.)
+ 1. Let _cap_ be _r_'s _captures_ List.
+ 1. NOTE: _cap_ is indexed from 1 (<emu-xref href="#sec-notation"></emu-xref>).
+ 1. Let _n_ be the number of elements in _cap_. (This is the same value as <emu-xref href="#sec-notation"></emu-xref>'s _NcapturingParens_.)
1. Assert: _n_ &lt; 2<sup>32</sup> - 1.
1. Let _A_ be ! ArrayCreate(_n_ + 1).
1. Assert: The value of _A_'s *"length"* property is _n_ + 1.
@@ -33248,7 +33253,7 @@ <h1>Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )</h1>
1. Let _groups_ be *undefined*.
1. Perform ! CreateDataPropertyOrThrow(_A_, *"groups"*, _groups_).
1. For each integer _i_ such that _i_ &gt; 0 and _i_ &le; _n_, do
- 1. Let _captureI_ be _i_<sup>th</sup> element of _r_'s _captures_ List.
+ 1. Let _captureI_ be _cap_[_i_].
1. If _captureI_ is *undefined*, let _capturedValue_ be *undefined*.
1. Else if _fullUnicode_ is *true*, then
1. Assert: _captureI_ is a List of code points.
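Reviewer sketch (illustrative, not spec text): the RegExpBuiltinExec steps in this hunk create an array of length _n_ + 1 and copy _cap_[_i_] into index _i_ for _i_ from 1 to _n_, so the 1-based capture index lines up with the result array, whose index 0 holds the whole match. The function `buildResultArray` is a hypothetical simplification: it assumes captures are arrays of single-character strings stored 0-based, and it omits the `index`, `input`, and `groups` properties that the real algorithm also sets.

```javascript
// Hypothetical, simplified model of the array-building part of RegExpBuiltinExec.
function buildResultArray(matchedSubstr, cap) {
  const n = cap.length; // same value as NcapturingParens
  const A = new Array(n + 1);
  A[0] = matchedSubstr; // index 0 is the whole match, not a capture
  for (let i = 1; i <= n; i++) {
    const captureI = cap[i - 1]; // the spec's cap[i]
    A[i] = captureI === undefined ? undefined : captureI.join("");
  }
  return A;
}

const sketch = buildResultArray("12-", [["1", "2"], undefined]);
const real = /(\d+)-(\d+)?/.exec("12-");

console.log(sketch.length, real.length); // 3 3
console.log(sketch[1], real[1]); // "12" "12"
console.log(sketch[2], real[2]); // undefined undefined
```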