This repository has been archived by the owner on Feb 16, 2024. It is now read-only.

only finite sets of strings #26

Closed
markusicu opened this issue May 26, 2021 · 4 comments

Comments

@markusicu
Collaborator

In the TC39 meeting today (2021-may-26), there was some discussion of whether we should prepare for character classes that match infinite sets of strings.

From the start, the proposal has been to extend character classes, and the supported Unicode properties, from finite sets of characters to finite sets of strings. This was the basis for the argument to use \p for properties of strings.
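A class over a finite set of strings is equivalent to an alternation of those strings. The following minimal JavaScript sketch compiles such a set into an ordinary `u`-mode RegExp. `finiteStringClass` and `escapeRegExp` are hypothetical helpers. Trying longer strings first reflects our understanding of the intended matching order, not something stated in this thread: it keeps a multi-code-point sequence from losing to its own prefix.

```javascript
// Escape RegExp syntax characters so each string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: compile a finite set of strings into one alternation
// that behaves like a "class" over those strings.
function finiteStringClass(strings) {
  const alternatives = [...new Set(strings)]
    // Longest first, so a string is tried before any of its own prefixes.
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  return new RegExp(`(?:${alternatives.join("|")})`, "u");
}

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // man ZWJ woman ZWJ girl
const man = "\u{1F468}";
const cls = finiteStringClass([man, family]);

console.log(cls.exec(family)[0] === family); // true: the whole sequence wins
// Without the sort, the single code point would be matched first:
console.log(new RegExp(`(?:${man}|${family})`, "u").exec(family)[0] === man); // true
```

Because the input set is finite, the resulting pattern is finite too, which is what makes it reasonable to treat such a set like a class member.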

As an example, in UTS #51 there is a very clear distinction between

  1. an emoji zwj sequence, defined via a regular expression that matches an infinite set of strings
  2. the RGI emoji ZWJ sequence set (= the RGI_Emoji_ZWJ_Sequence property) which is a finite set of strings listed in a data file
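The difference between the two items can be shown in a few lines of JavaScript. This sketch assumes a deliberately simplified grammar (an emoji followed by one or more ZWJ + emoji pairs) as a stand-in for the UTS #51 definition of an emoji zwj sequence, which also allows presentation selectors, modifiers, and tag sequences. `RGI_ZWJ_EXCERPT` is a hypothetical two-entry excerpt, not the real data file.

```javascript
// Simplified stand-in for the "emoji zwj sequence" grammar: it accepts
// chains of any length, so the set of strings it matches is infinite.
const ZWJ_SEQUENCE_GRAMMAR = /^\p{Emoji}(?:\u200D\p{Emoji})+$/u;

// Hypothetical excerpt standing in for the finite RGI_Emoji_ZWJ_Sequence list.
const RGI_ZWJ_EXCERPT = new Set([
  "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}", // man, woman, girl
  "\u{1F469}\u200D\u{1F4BB}",                // woman, laptop
]);

// Build an n-element chain of U+1F600 GRINNING FACE joined by ZWJ.
const chain = (n) => Array(n).fill("\u{1F600}").join("\u200D");

for (const s of [...RGI_ZWJ_EXCERPT, chain(2), chain(50)]) {
  // Logs whether s matches the grammar, then whether s is in the finite set.
  console.log(ZWJ_SEQUENCE_GRAMMAR.test(s), RGI_ZWJ_EXCERPT.has(s));
}
```

Every string here matches the grammar, but only the listed ones belong to the finite set; no finite list can cover `chain(n)` for every `n`.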

It would be possible to support named matchers for infinite sets of strings, that is, a kind of named sub-regular-expression, but that is very different from a finite set, needs to have separate syntax, and should not be allowed inside character classes.
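A rough way to picture a named sub-regular-expression in today's JavaScript is to keep the sub-pattern as a RegExp and splice its `.source` into larger patterns. The names `zwjElement`, `zwjSequence`, and `misused` are hypothetical, and the element pattern is simplified. The last pattern illustrates why such a matcher does not fit inside brackets: spliced into a class, the same source silently becomes a set of single code points, including a literal `?`.

```javascript
// Hypothetical reusable sub-pattern: an emoji with an optional U+FE0F.
const zwjElement = /\p{Emoji}\uFE0F?/u;

// Used as a sequence, it composes into a pattern for an unbounded set of strings.
const zwjSequence = new RegExp(
  `^(?:${zwjElement.source})(?:\\u200D(?:${zwjElement.source}))+$`,
  "u"
);

// Spliced between brackets, the "?" quantifier loses its meaning and
// becomes one more class member.
const misused = new RegExp(`^[${zwjElement.source}]$`, "u");

console.log(zwjSequence.test("\u{1F469}\u200D\u{1F4BB}")); // true
console.log(misused.test("?"));                            // true
```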

@macchiati
Collaborator

macchiati commented May 26, 2021 via email

@mathiasbynens
Member

I agree that named matchers for infinite sets of strings could be useful, but I'm not convinced this is part of the MVP. I would prefer pursuing it as a separate follow-up proposal. @waldemarhorwat, does that match your thinking?

That said, here are some thoughts:

It would be possible to support named matchers for infinite sets of strings, that is, a kind of named sub-regular-expression, but that is very different from a finite set, […]

Agreed.

[…] needs to have separate syntax, […]

Not sure I agree. I think we could totally use \p{…} for this as well if we decide to support this in the future. Nothing about our current proposal prevents us from doing that, since \p{SomeUnknownOrUnsupportedProperty} throws an exception.
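The point that unknown property names are rejected can be checked directly. In this sketch `tryCompile` is a hypothetical helper that reports whether a pattern compiles; the property name is the placeholder from the comment above. The rejection depends on Unicode mode: without the `u` flag, legacy (Annex B) syntax treats `\p` as a plain `p`, so the same pattern compiles and means something else entirely.

```javascript
// Hypothetical helper: return "ok" if the pattern compiles, else the error name.
function tryCompile(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return "ok";
  } catch (e) {
    return e.name;
  }
}

console.log(tryCompile("\\p{Emoji}", "u"));                           // ok
console.log(tryCompile("\\p{SomeUnknownOrUnsupportedProperty}", "u")); // SyntaxError
// Legacy mode: "\p" is an identity escape for "p", so this compiles.
console.log(tryCompile("\\p{SomeUnknownOrUnsupportedProperty}", ""));  // ok
```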

[…] and should not be allowed inside character classes.

I’m not sure. Mark’s example of [\p{Valid_Emoji}--\p{RGI_Emoji}] seems compelling.
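For a whole-string match, a difference like this one can already be emulated without class syntax by putting a negative lookahead in front of the larger set. This sketch assumes a simplified ZWJ-chain grammar as a stand-in for the hypothetical `Valid_Emoji` set, and a hypothetical two-entry excerpt in place of `RGI_Emoji`. It shows the semantics of `[A--B]` in that case, not how an engine would implement it.

```javascript
// Escape RegExp syntax characters so each string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical finite excerpt standing in for RGI_Emoji.
const rgiExcerpt = [
  "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}",
  "\u{1F469}\u200D\u{1F4BB}",
];

// A minus B as a whole-string test: reject every string in B with a
// negative lookahead, then require the (infinite) simplified grammar A.
const validMinusRgi = new RegExp(
  `^(?!(?:${rgiExcerpt.map(escapeRegExp).join("|")})$)` +
    `\\p{Emoji}(?:\\u200D\\p{Emoji})+$`,
  "u"
);

console.log(validMinusRgi.test(rgiExcerpt[0]));              // false: in B
console.log(validMinusRgi.test("\u{1F600}\u200D\u{1F600}")); // true: in A, not in B
```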

@markusicu
Collaborator Author

Proposed resolution: There is enough reserved syntax (e.g., curly braces) to enable wide-ranging extensions in the future, but we don't plan to build something specific into the proposed spec changes.
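What "reserved" means can be observed in today's `u` mode, where some constructs that are harmless in legacy patterns are syntax errors and are therefore free to be given a meaning later. `tryCompile` is a hypothetical helper. `\q{…}` is included because, as far as we know, it is the escape the proposal uses for strings inside classes; this sketch only checks that `u` mode rejects it.

```javascript
// Hypothetical helper: return "ok" if the pattern compiles, else the error name.
function tryCompile(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return "ok";
  } catch (e) {
    return e.name;
  }
}

// A lone curly brace: a literal character in legacy mode, an error in u mode.
console.log(tryCompile("{", ""));         // ok
console.log(tryCompile("{", "u"));        // SyntaxError
// Unknown escapes such as \q are identity escapes in legacy mode, errors in u mode.
console.log(tryCompile("\\q{abc}", ""));  // ok
console.log(tryCompile("\\q{abc}", "u")); // SyntaxError
```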

Cc stage 3 reviewers @waldemarhorwat @gibson042 @msaboff

@mathiasbynens
Member

3 participants