Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not support astral ID_Continue characters in capture group names #90

Closed
adrianheine opened this issue Jan 16, 2018 · 13 comments
Closed

Comments

@adrianheine
Copy link
Contributor

/(?<$𐒤>a)/u should parse according to test262.

@mathiasbynens
Copy link
Collaborator

To save everyone else the trouble of figuring out which symbol is being used — the above input string is '/(?<$\u{104A4}>a)/u'.

@adrianheine
Copy link
Contributor Author

I now understand that this changed between ES5 and ES6. If that's true, I see three approaches:

  1. Continue following ES5
  2. Switch to ES6
  3. Make it configurable

I prefer 3 over 2 over 1.

@mathiasbynens
Copy link
Collaborator

I prefer 2 over 3 over 1.

@mathiasbynens
Copy link
Collaborator

But note that this is unrelated to ES2015 — named capture groups are a more recent proposal, that might become part of ES2018.

@adrianheine
Copy link
Contributor Author

Right, isIdentifier(Start|Part) is mostly used for named capture groups which is stage3, so definitely not ES5.

isIdentifierPart is also used in parseIdentityEscape, though (which is a horrible mis-feature to begin with). So /\𐒤/ (/\\\u{104A4}/) shouldn't parse but currently does.

@jviereck
Copy link
Owner

jviereck commented Jan 21, 2018

I now understand that this changed between ES5 and ES6. If that's true, I see three approaches:

  1. Continue following ES5
  2. Switch to ES6
  3. Make it configurable

I prefer 3 over 2 over 1.

Does this need a resolution? Following the comment 1, I am not sure what needs to be done / is the correct way to do here.

@adrianheine
Copy link
Contributor Author

There are basically two issues:

  1. '/\\\u{104A4}/' passes, which wouldn't pass in ES6+. regjsparser cannot realistically claim to parse regular expressions according to ES5 (since it optionally supports newer features), so this is at least odd. On the other hand, there is no claim as to which EcmaScript version's regular expressions it parses, so users shouldn't expect it to be strictly any specific version, either.
  2. '/(?<$\u{104A4}>a)/u' doesn't parse, which should (if named capture groups are enabled). That is a bug.

@mathiasbynens
Copy link
Collaborator

Re: 1. /\𐒤/ should parse according to the spec. Did you mean /\𐒤/u?

@adrianheine
Copy link
Contributor Author

The production for IdentityEscape is as follows:

IdentityEscape [U] ::
  [+U] SyntaxCharacter
  [+U] /
  [~U] SourceCharacter but not UnicodeIDContinue

So /\𐒤/ could only be IdentityEscape :: [~U] SourceCharacter but not UnicodeIDContinue, but since \u{104A4} is UnicodeIDContinue (since ES6), this shouldn't match.

@mathiasbynens
Copy link
Collaborator

Right, but in practice engines (must) implement Annex B for Web compatibility. The IdentityEscape production there is:

IdentityEscape[U, N] ::
  [+U] SyntaxCharacter
  [+U] /
  [~U] SourceCharacterIdentityEscape[?N]

SourceCharacterIdentityEscape[N] ::
  [~N] SourceCharacter but not c
  [+N] SourceCharacter but not one of c or k

regjsparser should implement the same.

@adrianheine
Copy link
Contributor Author

Alright. So I suggest to switch parseIdentityEscape behavior to Annex B, i. e. drop the isIdentifierPart check, and to update isIdentifierPart and isIdentifierStart to accept astral plane characters for capture group names.

@jviereck
Copy link
Owner

Alright. So I suggest to switch parseIdentityEscape behavior to Annex B, i. e. drop the isIdentifierPart check, and to update isIdentifierPart and isIdentifierStart to accept astral plane characters for capture group names.

Yes, this sounds reasonable.

@adrianheine
Copy link
Contributor Author

https://github.com/adrianheine/regjsparser/tree/astral contains my first try at fixing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants