Switch IdentityEscape and Atom parsing behaviour to Annex B #93

adrianheine · 2018-02-27T21:23:09Z

parseIdentityEscape currently is broken since it passes a string to isIdentifierPart instead of a char code. This commit fixes that and switches to Annex B at the same time. There are some failing tests, though, and I'd like some help with them. I suspect there are some more hidden bugs somewhere else.

jviereck · 2018-02-27T21:32:18Z

@adrianheine, thanks for providing this. I can take a look over the weekend in case no one else jumps up.

adrianheine · 2018-02-27T21:39:19Z

That would be great! I also started working on the remainder of #90, but wanted to get this done first to minimize (my) confusion.

jviereck · 2018-03-04T03:23:30Z

Taking a closer look at this, I am not so sure yet what exactly the issue is. Here what I could figure out so far. Basically it boils down to the question what /[\c0]/ is supposed to match. This is a valid RegExp in Firefox and Chrome but I have no real clue what this should match right now.

Applying the current PR version and running npm test, this is the first error I got (after improving the output a bit):

Error: Failure parsing string

  [\c0001d-G]

Got:

{"type":"error","name":"SyntaxError","message":"classEscape at position 2
    [\\c0001d-G]
      ^","input":"[\\c0001d-G]"}

Expected:

{"type":"error","name":"SyntaxError","message":"invalid range in character class at position 7: d-G
    [\\c0001d-G]
           ^","input":"[\\c0001d-G]"}

Note how after applying this patch, the parsing error changes. Before, the error was due to the invalid range in /[d-G]/. Now, the parser fails at the /[\c0001]/ part.

From playing in the browser,

/[\c312]/.test('3') -> False
/[\c012]/.test('0') -> False
/[\c012]/.test('1') -> True
/[\c012]/.test('2') -> True

So, it seems the parsing behavior is like the \c3 etc get parsed into whatever and then the 1 and 2 are parsed as possible entries in the group.

Does this parsing behavior make sense to someone?

mathiasbynens · 2018-03-04T15:24:41Z

https://tc39.github.io/ecma262/#sec-regular-expression-patterns-semantics says:

The production ClassAtomNoDash :: \ [lookahead = c] evaluates as follows:

Return the CharSet containing the single character \ U+005C (REVERSE SOLIDUS).

NOTE: This production can only be reached from the sequence \c within a character class where it is not followed by an acceptable control character.

adrianheine · 2018-11-17T14:04:32Z

Edit: never mind, I got it:

/\c0/.test("\u0010") // => false
/[\c0]/.test("\u0010") // => true
/\ca/.test("\u0001") // => true
/[\ca]/.test("\u0001") // => true

Also:

> /[\c]/.test("\\")
true
> /[\c]/.test("c")
true
> /\c/.test("\\c")
true

See tc39/ecma262@421a6ae and tc39/ecma262@f48dab5.

adrianheine · 2018-11-18T17:09:21Z

This is good to merge from my point of view.

mathiasbynens · 2018-11-18T22:30:31Z

@adrianheine You figured it out. FWIW, I have a table here: https://mathiasbynens.be/notes/javascript-escapes#control

Switch IdentityEscape and Atom parsing behaviour to Annex B

545e078

adrianheine changed the title ~~[WIP] Switch IdentityEscape parsing behaviour to Annex B~~ Switch IdentityEscape and Atom parsing behaviour to Annex B Nov 18, 2018

mathiasbynens approved these changes Nov 18, 2018

View reviewed changes

jviereck merged commit cabbed7 into jviereck:gh-pages Nov 25, 2018

adrianheine mentioned this pull request Dec 4, 2018

Support lookbehind mathiasbynens/regexpu-core#22

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch IdentityEscape and Atom parsing behaviour to Annex B #93

Switch IdentityEscape and Atom parsing behaviour to Annex B #93

adrianheine commented Feb 27, 2018

jviereck commented Feb 27, 2018

adrianheine commented Feb 27, 2018

jviereck commented Mar 4, 2018

mathiasbynens commented Mar 4, 2018

adrianheine commented Nov 17, 2018 •

edited

Loading

adrianheine commented Nov 18, 2018

mathiasbynens commented Nov 18, 2018

Switch IdentityEscape and Atom parsing behaviour to Annex B #93

Switch IdentityEscape and Atom parsing behaviour to Annex B #93

Conversation

adrianheine commented Feb 27, 2018

jviereck commented Feb 27, 2018

adrianheine commented Feb 27, 2018

jviereck commented Mar 4, 2018

mathiasbynens commented Mar 4, 2018

adrianheine commented Nov 17, 2018 • edited Loading

adrianheine commented Nov 18, 2018

mathiasbynens commented Nov 18, 2018

adrianheine commented Nov 17, 2018 •

edited

Loading