Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compiling v set notation to u with unicode properties #70

Merged

Conversation

liuxingbaoyu
Copy link
Contributor

@liuxingbaoyu liuxingbaoyu commented Nov 15, 2022

Fixes: babel/babel#15193 (comment)
Since v and u flags are mutually exclusive, and I read in the v8 blog that there is no reason not to replace u with v, I guess they can maintain similar behavior here. (I'm not familiar with regular expressions, sorry if I'm wrong)

Copy link
Collaborator

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transforming unicodeFlag requires transforming unicodePropertyEscapes because unicode property escape is valid only in /u and /v mode. In other words, it is impossible to transform unicode flag to an ES5 regex while preserving the unicode property escape.

However, this is not the case for the /v flag, e.g.

/\p{ASCII}/v
/\p{ASCII}/u

They are equivalent. I will expect the regexpu transform former to the latter, i.e. the current behavior, so that it can run on the targets with native unicode property escape. However, this PR further transforms /\p{ASCII}/v to /[\x00-/x7f]/. Although it could solve the linked Babel issue, we could have done better, e.g. only transform unicode property escape when it is involved in set notations, e.g.

/[\p{ASCII}&&\p{Decimal_Number}]/v

@liuxingbaoyu
Copy link
Contributor Author

liuxingbaoyu commented Nov 16, 2022

I see what you mean, but that I haven't figured out how to do it yet, I plan to put it in the next PR.
I hope to support regex with v flag and unicodePropertyEscapes in this PR.
When used without set notations, a regex using the v flag will not compile unicodePropertyEscapes, even if we explicitly define unicodePropertyEscapes: 'transform'.
I agree with what you said, it makes sense to preserve when compiling to the u flag.
Originally I changed it to
config.transform.unicodePropertyEscapes = (config.flags.unicode || config.flags.unicodeSets) && ( transform(options, 'unicodeFlag') || transform(options, 'unicodeSetsFlag') || transform(options, 'unicodePropertyEscapes')
But I then realized that this would accidentally compile unicodePropertyEscapes in the regex for the u flag when transform(options, 'unicodeFlag') != true.
Now I'm not sure what to do.

@JLHwung
Copy link
Collaborator

JLHwung commented Nov 16, 2022

(config.transform.unicodeSetsFlag && nestedData.maybeIncludesStrings);

It seems to me we can mark the character escape as transformed when the unicodeSetsFlag is applied and the item kind is not union (&& or --):

(config.transform.unicodeSetsFlag && (nestedData.maybeIncludesStrings || characterClassItem.kind !== "union"))

@liuxingbaoyu
Copy link
Contributor Author

@JLHwung Thanks! This is amazing.

@liuxingbaoyu
Copy link
Contributor Author

liuxingbaoyu commented Nov 17, 2022

Since these two changes are relatively small, I put them in a PR.🙂
Also I tested locally and was able to fix the babel issue linked to.

@liuxingbaoyu liuxingbaoyu changed the title Automatically enable unicodePropertyEscapes when using the v flag Compile unicodePropertyEscapes in regular expressions with v flag and unicodePropertyEscapes that set notations depend on Nov 17, 2022
@@ -700,7 +700,7 @@ const rewritePattern = (pattern, flags, options) => {
config.transform.unicodeSetsFlag = config.flags.unicodeSets && transform(options, 'unicodeSetsFlag');

// unicodeFlag: 'transform' implies unicodePropertyEscapes: 'transform'
config.transform.unicodePropertyEscapes = config.flags.unicode && (
config.transform.unicodePropertyEscapes = (config.flags.unicode || config.flags.unicodeSets) && (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not convinced that we should always transform unicode property escape if unicodeSets flag is set. Can you add a new test case?

/[\p{ASCII}]/v

should be transformed to

/[\p{ASCII}]/u

Of course if user wants to further transform /u, then we should transform \p{ASCII}, too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, there is a transform(options, 'unicodePropertyEscapes') check later so that unicodePropertyEscapes: false will still preserve the property escapes.

@JLHwung
Copy link
Collaborator

JLHwung commented Nov 17, 2022

Can you check if this PR also fixes babel/babel#15193 (comment) ?

@liuxingbaoyu
Copy link
Contributor Author

Yes, I originally opened this PR to fix babel/babel#15193 (comment).

@JLHwung
Copy link
Collaborator

JLHwung commented Nov 17, 2022

Yes, I originally opened this PR to fix babel/babel#15193 (comment).

If so can you add /^\P{Lowercase_Letter}$/v test cases mentioned in that comment?

And also the union of two property escape, e.g. /[\p{ASCII}\p{Decimal_Number}]/v, the property escape should be preserved. I am aware that we have test cases for the union of string property.

@liuxingbaoyu
Copy link
Contributor Author

If so can you add /^\P{Lowercase_Letter}$/v test cases mentioned in that comment?

https://github.com/mathiasbynens/regexpu-core/pull/70/files#diff-2531263950f638caf4f29666c05364762d24d7c3f735505f3008c3272b762a8aR1406-R1410

I added it here, should be the same?

@JLHwung
Copy link
Collaborator

JLHwung commented Nov 17, 2022

If so can you add /^\P{Lowercase_Letter}$/v test cases mentioned in that comment?

https://github.com/mathiasbynens/regexpu-core/pull/70/files#diff-2531263950f638caf4f29666c05364762d24d7c3f735505f3008c3272b762a8aR1406-R1410

I added it here, should be the same?

Not quite the same, \P is the negation of \p, e.g. \P{Lowercase_Letter} matches all characters except that its category is lowercase letter.

tests/tests.js Outdated Show resolved Hide resolved
@@ -700,7 +700,7 @@ const rewritePattern = (pattern, flags, options) => {
config.transform.unicodeSetsFlag = config.flags.unicodeSets && transform(options, 'unicodeSetsFlag');

// unicodeFlag: 'transform' implies unicodePropertyEscapes: 'transform'
config.transform.unicodePropertyEscapes = config.flags.unicode && (
config.transform.unicodePropertyEscapes = (config.flags.unicode || config.flags.unicodeSets) && (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, there is a transform(options, 'unicodePropertyEscapes') check later so that unicodePropertyEscapes: false will still preserve the property escapes.

@JLHwung
Copy link
Collaborator

JLHwung commented Sep 18, 2023

✅ pending test case updates.
/cc @mathiasbynens @nicolo-ribaudo

Co-authored-by: Huáng Jùnliàng <[email protected]>
@nicolo-ribaudo nicolo-ribaudo changed the title Compile unicodePropertyEscapes in regular expressions with v flag and unicodePropertyEscapes that set notations depend on Fix compiling v set notation to u with unicode properties Sep 23, 2023
Copy link
Collaborator

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nicolo-ribaudo nicolo-ribaudo merged commit c24e0cc into mathiasbynens:main Sep 23, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: @babel/plugin-proposal-unicode-sets-regex – unexpected results
3 participants