Should the regular expression engine be required to validate the character? #11

nathanhammond · 2018-06-15T18:38:20Z

Matching the EBNF does not verify that the sequence is a valid character. (Consider the dozens of regional indicator pairs that do not map to any flag, Microsoft's Ninja Cat, and probably more.)
Should it only match characters valid for general interchange?
The implication from the proposal at present is that it will do a lookup against a list. Is that list OS-dependent? If so, how should cross-platform browsers get the list?
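To make the distinction concrete, here is a minimal sketch of grammar-level matching versus list-based validation. The names `flagShape`, `knownFlags`, and `isKnownFlag` are hypothetical, and the two-entry set is an illustrative stand-in for whatever list an engine would actually consult, not real Unicode data.

```javascript
// Grammar-level check: any two regional indicator symbols (U+1F1E6..U+1F1FF).
const flagShape = /^[\u{1F1E6}-\u{1F1FF}]{2}$/u;

// Helper that builds a two-code-point string from regional indicators.
const pair = (a, b) => String.fromCodePoint(a, b);

const us = pair(0x1F1FA, 0x1F1F8); // regional indicators U + S
const xx = pair(0x1F1FD, 0x1F1FD); // regional indicators X + X, not a flag

// Assumption: a tiny hand-written list standing in for the real data set.
const knownFlags = new Set([us, pair(0x1F1EF, 0x1F1F5)]); // US, JP

// List-based check: the shape must match AND the sequence must be listed.
function isKnownFlag(s) {
  return flagShape.test(s) && knownFlags.has(s);
}

console.log(flagShape.test(xx)); // true: the shape alone accepts it
console.log(isKnownFlag(xx));    // false: the list rejects it
console.log(isKnownFlag(us));    // true
```

The second check is the one that needs a data source, which is where the question of who supplies and updates the list comes from.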

mathiasbynens · 2018-06-16T12:41:06Z

This proposal defers to the Unicode Standard for the definitions of each of the sequence properties, just like for existing property escapes. See UTR51 which refers to the data files for each property.

nathanhammond · 2018-06-19T20:50:01Z

This proposes deferring to Unicode's set, which implies that the regular expression engine would be required to validate each sequence.

This proposal does not specify how an engine should maintain the list of valid sequences. This is a chance for cross-platform divergence in behavior (possibly even within the same engine), somewhat similar to the Date issues that Microsoft has long faced. I'd be much more comfortable with a specific plan as to how engines should update and maintain this list going forward.

How should this interact with Node's LTS policies?

mathiasbynens · 2018-06-20T09:38:57Z

> This proposal does not specify how an engine should maintain the list of valid sequences.

It does not need to, as the ECMAScript spec already codifies this. The latest version of the Unicode Standard is required (tc39/ecma262#620).

Once this proposal matures, I'll update https://github.com/mathiasbynens/unicode-property-escapes-tests, which generates the Test262 tests for Unicode property escapes, to include sequence property tests. These tests will be updated whenever the Unicode Standard gets an update. A tc39/ecma262 issue will be filed for every such update detailing the changes.
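As a rough illustration of how a sequence list can be derived from data files rather than maintained by hand, the sketch below parses semicolon-separated lines into a set of strings. This is not the code of the repository mentioned above. The function `parseSequences` is hypothetical, and `sampleData` is a hand-written sample in the style of a Unicode emoji data file, not a copy of one.

```javascript
// Assumption: sample lines of the form "code points ; property ; description".
const sampleData = `
# Lines starting with '#' are comments.
1F1E6 1F1E8 ; RGI_Emoji_Flag_Sequence ; flag: Ascension Island
1F1FA 1F1F8 ; RGI_Emoji_Flag_Sequence ; flag: United States
`;

// Collect every sequence tagged with the given property name.
function parseSequences(text, property) {
  const result = new Set();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    if (line === "") continue;
    const fields = line.split(";").map((field) => field.trim());
    if (fields.length < 2 || fields[1] !== property) continue;
    if (fields[0].includes("..")) continue; // code point ranges are skipped in this sketch
    const codePoints = fields[0].split(/\s+/).map((hex) => parseInt(hex, 16));
    result.add(String.fromCodePoint(...codePoints));
  }
  return result;
}

const flags = parseSequences(sampleData, "RGI_Emoji_Flag_Sequence");
console.log(flags.size); // 2
```

Regenerating such a set whenever the data files change is what keeps the tests and the engine's list tied to a specific Unicode version.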

I don't see how this is different from any other Unicode-related change. Am I misunderstanding your feedback?

mathiasbynens closed this as completed Jun 16, 2018
