This repository has been archived by the owner on May 20, 2022. It is now read-only.

Should the regular expression engine be required to validate the character? #11

Closed
nathanhammond opened this issue Jun 15, 2018 · 3 comments

Comments

@nathanhammond
Member

nathanhammond commented Jun 15, 2018

  • Matching the EBNF does not verify that the result is a valid character. (Dozens of regional indicator pairs do not correspond to any flag; there is also Microsoft's Ninja Cat, and probably more.)
  • Should it only match characters valid for general interchange?
  • The implication from the proposal at present is that it will do a lookup against a list. Is that list OS-dependent? If so, how should cross-platform browsers get the list?
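To make the first point concrete, here is a minimal sketch of the gap between matching the shape of a flag sequence and validating it against a list. The names `flagShape`, `knownFlags`, `matchesFlagShape`, and `isListedFlag` are hypothetical, and the two-entry set is only a stand-in for whatever data an engine would actually ship; it is not the Unicode data file.

```javascript
// Shape-only check: any two Regional Indicator symbols (U+1F1E6..U+1F1FF).
const flagShape = /^[\u{1F1E6}-\u{1F1FF}]{2}$/u;

// Hypothetical, deliberately tiny allowlist standing in for a real data file.
const knownFlags = new Set([
  '\u{1F1FA}\u{1F1F8}', // regional indicators U + S
  '\u{1F1EF}\u{1F1F5}', // regional indicators J + P
]);

// True for any well-formed pair, whether or not it is a real flag.
function matchesFlagShape(s) {
  return flagShape.test(s);
}

// True only when the pair is well-formed AND present in the allowlist.
function isListedFlag(s) {
  return matchesFlagShape(s) && knownFlags.has(s);
}

const us = '\u{1F1FA}\u{1F1F8}';
const zz = '\u{1F1FF}\u{1F1FF}'; // Z + Z: well-formed pair, not in the allowlist above

console.log(matchesFlagShape(us), isListedFlag(us)); // true true
console.log(matchesFlagShape(zz), isListedFlag(zz)); // true false
```

The shape check accepts both strings; only the list lookup tells them apart, which is why the source and freshness of that list matters.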
@mathiasbynens
Member

This proposal defers to the Unicode Standard for the definition of each sequence property, just as existing property escapes do. See UTR51, which refers to the data files for each property.

@nathanhammond
Member Author

So the proposal defers to Unicode's set, which implies that the regular expression engine would be required to validate each sequence.

This proposal does not specify how an engine should maintain the list of valid sequences. That leaves room for cross-platform divergence in behavior (possibly even within the same engine), somewhat similar to the Date issues that Microsoft has long faced. I'd be much more comfortable with a specific plan for how engines should update and maintain this list going forward.

How should this interact with Node's LTS policies?
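As an illustration of how such divergence would surface to script authors, here is a rough sketch of a runtime probe. The helper name `supportsPropertyEscape` is hypothetical; it relies only on the existing behavior that an unrecognized property name inside `\p{…}` in a `u`-flag pattern is a SyntaxError. It does not assume that any sequence property from this proposal is implemented.

```javascript
// Returns true if this engine accepts \p{<property>} in a `u`-flag pattern.
// Unknown property names (or missing property-escape support) throw SyntaxError.
function supportsPropertyEscape(property) {
  try {
    new RegExp('\\p{' + property + '}', 'u');
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected, so surface it
  }
}

console.log(supportsPropertyEscape('Script=Latin'));
console.log(supportsPropertyEscape('Not_A_Real_Property'));
```

Two engines built against different Unicode data could return different answers for the same property or value, which is the kind of divergence described above.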

@mathiasbynens
Member

> This proposal does not specify how an engine should maintain the list of valid sequences.

It does not need to, as the ECMAScript spec already codifies this. The latest version of the Unicode Standard is required (tc39/ecma262#620).

Once this proposal matures, I'll update https://github.com/mathiasbynens/unicode-property-escapes-tests, which generates the Test262 tests for Unicode property escapes, to include sequence property tests. These tests will be updated whenever the Unicode Standard gets an update. A tc39/ecma262 issue will be filed for every such update, detailing the changes.

I don't see how this is different from any other Unicode-related change. Am I misunderstanding your feedback?
