Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch IdentityEscape and Atom parsing behaviour to Annex B #93

Merged
merged 1 commit into from
Nov 25, 2018
Merged

Switch IdentityEscape and Atom parsing behaviour to Annex B #93

merged 1 commit into from
Nov 25, 2018

Conversation

adrianheine
Copy link
Contributor

parseIdentityEscape currently is broken since it passes a string to isIdentifierPart instead of a char code. This commit fixes that and switches to Annex B at the same time. There are some failing tests, though, and I'd like some help with them. I suspect there are some more hidden bugs somewhere else.

@jviereck
Copy link
Owner

@adrianheine, thanks for providing this. I can take a look over the weekend in case no one else jumps up.

@adrianheine
Copy link
Contributor Author

That would be great! I also started working on the remainder of #90, but wanted to get this done first to minimize (my) confusion.

@jviereck
Copy link
Owner

jviereck commented Mar 4, 2018

Taking a closer look at this, I am not so sure yet what exactly the issue is. Here what I could figure out so far. Basically it boils down to the question what /[\c0]/ is supposed to match. This is a valid RegExp in Firefox and Chrome but I have no real clue what this should match right now.

Applying the current PR version and running npm test, this is the first error I got (after improving the output a bit):

Error: Failure parsing string

  [\c0001d-G]

Got:

{"type":"error","name":"SyntaxError","message":"classEscape at position 2
    [\\c0001d-G]
      ^","input":"[\\c0001d-G]"}

Expected:

{"type":"error","name":"SyntaxError","message":"invalid range in character class at position 7: d-G
    [\\c0001d-G]
           ^","input":"[\\c0001d-G]"}

Note how after applying this patch, the parsing error changes. Before, the error was due to the invalid range in /[d-G]/. Now, the parser fails at the /[\c0001]/ part.

From playing in the browser,

/[\c312]/.test('3') -> False
/[\c012]/.test('0') -> False
/[\c012]/.test('1') -> True
/[\c012]/.test('2') -> True

So, it seems the parsing behavior is like the \c3 etc get parsed into whatever and then the 1 and 2 are parsed as possible entries in the group.

Does this parsing behavior make sense to someone?

@mathiasbynens
Copy link
Collaborator

https://tc39.github.io/ecma262/#sec-regular-expression-patterns-semantics says:

The production ClassAtomNoDash :: \ [lookahead = c] evaluates as follows:

Return the CharSet containing the single character \ U+005C (REVERSE SOLIDUS).

NOTE: This production can only be reached from the sequence \c within a character class where it is not followed by an acceptable control character.

@adrianheine
Copy link
Contributor Author

adrianheine commented Nov 17, 2018

Edit: never mind, I got it:

/\c0/.test("\u0010") // => false
/[\c0]/.test("\u0010") // => true
/\ca/.test("\u0001") // => true
/[\ca]/.test("\u0001") // => true

Also:

> /[\c]/.test("\\")
true
> /[\c]/.test("c")
true
> /\c/.test("\\c")
true

See tc39/ecma262@421a6ae and tc39/ecma262@f48dab5.

@adrianheine adrianheine changed the title [WIP] Switch IdentityEscape parsing behaviour to Annex B Switch IdentityEscape and Atom parsing behaviour to Annex B Nov 18, 2018
@adrianheine
Copy link
Contributor Author

This is good to merge from my point of view.

@mathiasbynens
Copy link
Collaborator

@adrianheine You figured it out. FWIW, I have a table here: https://mathiasbynens.be/notes/javascript-escapes#control

@jviereck jviereck merged commit cabbed7 into jviereck:gh-pages Nov 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants