Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpolating RegExps with capturing groups changes group numbering of groups in the template. #1

Open
mikesamuel opened this issue Aug 2, 2015 · 8 comments

Comments

@mikesamuel
Copy link
Owner

RegExp.make${ /(foo)/ }(bar)``

has 1 group in the template, (bar).

In the matched output, this group will be at an index 1 + numberOfGroupsIn(/(foo)/), in this case 2.

Figure out what the semantics should be here.

@mikesamuel
Copy link
Owner Author

There are probably legitimate use cases for interpolating capturing groups.

Strawman 1:

  • Export another property, templateGroups which is an Array such that for each group number i in the RegExp formed by simply interpolating "0" for every field,
    match[templateGroups[i]] is the match for that group i.

So in

let re = RegExp.make`${ /(foo)/ }(bar)`;
let match = .exec('foobar');
match[re.templateGroups[1]] == 'bar';

templateGroups[0] is always 0.

We could also attach a property interpGroups such that

interpGroups[valueIndex][groupIndex]

is the similarly defined array but for interpolated RegExps so in the above example

match[re.interpGroups[0][1]] === 'foo'

but this would mean that interpGroups[valueIndex][0] is undefined.

@erights
Copy link
Collaborator

erights commented Aug 3, 2015

We should work through some more examples. But my immediate reaction is that I like the templateGroups approach much better.

@mikesamuel
Copy link
Owner Author

Fixed at 2e3d0ae

@bergus
Copy link
Collaborator

bergus commented Aug 12, 2015

Can you reopen this, please? I don't think matchResult.templateGroups is the best solution, it strikes me as inapt.

Creating a matcher that does special things about groups doesn't work well with subclassing imo. Or does RegExp.make already return a subclass, like InterpolatedRegExp whose methods return these specially shaped results whose instances have these special properties?

Maybe the more fundamental problem is that we cannot easily get a count of groups for a regex instance. If we'd just give every RegExp a .groupCount or so, this could be easily solved by:

const a = /(foo)/;
const b = RegExp.make`${a}(bar)`

let m = b.exec("foobar");
m[0]; // foo
m[a.groupCount + 0]; // bar

@erights
Copy link
Collaborator

erights commented Aug 12, 2015

Reopening as requested.

RegExp.make does not return a subclass. AFAIK the possibility has not been considered. Why might it help? I don't see it.

@erights erights reopened this Aug 12, 2015
@ljharb
Copy link
Collaborator

ljharb commented Aug 12, 2015

@erights I opened #11 about subclassing. I think that MyRegExp.make = RegExp.make; MyRegExp.makefoo`` should definitely return a subclass, just like the same thing with Array.from.

@bergus
Copy link
Collaborator

bergus commented Aug 12, 2015

Yeah, RegExp.make should definitely return a RegExp, not a special subclass (and MyRegExp.make should return a MyRegExp). But that's why I dislike "special" RegExp objects that have an extra .templateGroups property.

Also, this approach doesn't compose well. If I put multiple regexes, some of them RegExp.made, into another RegExp.make call, then how do I access the groups of the inner ones?

const a = /(foo)/;
const b = /(bar)/;
const c = RegExp.make `${b} (baz)`;
const d = RegExp.make `${a}${c}(n)`

let match = d.match("foobar bazn"); // ["foo", "bar", "baz", "n"]
match[0 + 0] == "foo"; // first group in first interpolated regex
match[d.templateGroups[0]] == "n"; // first group in outermost (d) template
match[??? 0] == "bar"; // first group in first regex after how many groups?
match[??? c.templateGroups[0]] == "baz"; // first group in (c) template after how many groups?

Not that nested composition is really an actual use case, but I think that it hints at the inferiority of the design.

@mikesamuel
Copy link
Owner Author

var re = RegExp.make`${ /(.)/ }(.)`

// Case 1 - Match by Method
re.exec("ab")
// Case 2 - Oblique Match
"ab".replace(re, "$1 $2")  // calls re[@@replace] internally
// Case 3 - Explicit call to method on RegExp
RegExp.prototype.exec.call(re, "ab")

Strawman I - Do nothing

For all of the above cases, capturing groups are ["ab", "a", "b"]

Strawman II - Convert capturing groups in interpolated regexps to non-capturing groups

Error out if there is a backreference in an interpolated regexp.

For all of the above cases, capturing groups are ["ab", "b"]

Strawman III - Like II, but treat every interpolation as an implicit capturing group

For all of the above cases, capturing groups are ["ab", "a", "b"]

Strawman IV - Like II, but instrument to add structure.

For case 1, and 2, the method calls come to our custom implementations that adjust
the match objects.
The match objects seen are ["ab", ["a", "a"], "b"].

Since we treat ${ /(.)/ } as introducing a capturing group, we've got a slot to put
the results of the matches from the "nested" regexp.

For 3, the explicit call to super class, we get ["ab", "a", "a", "b"].

We could instrument by defining a subclass of RegExp, ComplexRegExp for lack of a better term.
If the make method creates a disjoint RegExp subclass per issue 11, you would get different behavior.

Strawman V - A combination of II and III

Convert nested groups to non-capturing and ban unresolvable backreferences, but
also treat each interpolation as an implicit capturing group.

For all 3, we get ["ab", "a", "b"].


Any other options?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants