Invalid transpilation for `.` with `su` flags #23
Comments
This is a known limitation of regenerate: https://github.com/mathiasbynens/regenerate#regenerateprototypetostringoptions
Nowadays we have lookbehind support in JS, but a transpiler such as regexpu can't rely on that (at least not until lookbehinds are widely supported). Here's the old relevant comment: mathiasbynens/regenerate#28 (comment). I'd love to hear your thoughts on this!
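To make the limitation concrete, here is a minimal sketch comparing two ways of matching a lone low surrogate. Both patterns are hand-written for illustration and are an assumption about the general shape of the problem, not code copied from regenerate: without lookbehind, the pattern must consume the preceding code unit to prove it is not a high surrogate, while an ES2018 lookbehind only asserts it.

```javascript
// Hand-written patterns for illustration (assumed, not copied from regenerate's source).
// Without lookbehind: to check that a low surrogate is *not* preceded by a high
// surrogate, the pattern has to consume the preceding code unit (or be at the start).
const loneLowNoLookbehind = /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]/;

// With ES2018 lookbehind: the preceding code unit is only asserted, not consumed.
const loneLowWithLookbehind = /(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

const input = 'a\uDC00'; // 'a' followed by a lone low surrogate

const m1 = loneLowNoLookbehind.exec(input);
const m2 = loneLowWithLookbehind.exec(input);

console.log(m1.index, m1[0].length); // 0 2 (the match swallows the 'a')
console.log(m2.index, m2[0].length); // 1 1 (only the lone surrogate itself)

// A well-formed surrogate pair must not be matched as a lone low surrogate by either pattern.
console.log(loneLowNoLookbehind.test('\uD83D\uDE00'));   // false
console.log(loneLowWithLookbehind.test('\uD83D\uDE00')); // false
```

The lookbehind form avoids swallowing the neighbouring character, but emitting it would make the transpiled output fail to parse in engines without lookbehind support, which is exactly the trade-off described above.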
Hmm, yeah, I guess it's not an easy thing to solve in the general case, and it needs more research. However, for #24, instead of having a post-rewrite optimisation layer, I was rather thinking of having specialisations for common cases so that they wouldn't have to be expanded in the first place. And, specifically for Arguably this is just a partial solution, but then, the current implementation already covers only part of the full range anyway. What do you think?
Actually, maybe we could do the same for any case where we know the code point range (that is, when the original regex doesn't explicitly try to match lone surrogates either)...
I'm open to this as well. 👍🏻
Digging into this further, I'm starting to think the issue can be solved deeper down, in regenerate. In
Ah, I've now found the original issue, mathiasbynens/regexpu#16, and indeed the "simpler" transpilation shown above would break it again.
The example in the README with the `su` flags works for a single character, but the output seemed odd, so I tried fuzzing it with various strings, and it looks like the output is incorrect, or at least incompatible with the native implementation (unless it's a bug in V8 instead). Let's try an example with two `.` (any) characters and an ill-formed string:
As you can see, the native RegExp implementation (Node.js 11.0.0) matches it as expected.
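A minimal sketch of the native semantics, using hypothetical sample strings and assuming a runtime with ES2018 `s` (dotAll) support such as the Node.js 11.0.0 mentioned above: with the `u` flag, each `.` consumes exactly one code point, and a lone surrogate counts as one code point on its own. `Array.from` iterates a string the same way, so it serves as a reference count here.

```javascript
// With `u`, `.` consumes one code point; with `s`, it also matches line terminators.
const twoAny = /^..$/su;

// Hypothetical sample inputs; '\uD83D\uDE00' is a well-formed surrogate pair (U+1F600).
const samples = [
  'ab',                 // two BMP characters
  '\uD83D\uDE00\n',     // astral character + line terminator (matched thanks to `s`)
  'a\uDC00',            // BMP character + lone low surrogate
  '\uD83D\uD83D\uDE00', // lone high surrogate + surrogate pair
  '\uDE00\uD83D',       // lone low surrogate + lone high surrogate
  '\uD83D\uDE00',       // a single astral character: only one code point
];

for (const s of samples) {
  // Array.from splits by code point, keeping lone surrogates as separate elements,
  // which mirrors how a `u`-mode regexp walks the string.
  const codePoints = Array.from(s).length;
  console.log(JSON.stringify(s), twoAny.test(s), codePoints);
}
```

Every sample with two code points matches, and the lone astral character does not.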
Let's try with regexpu + `useUnicodeFlag`:
So far, so good.
Now what if we ask it to expand Unicode ranges?
Oops, looks like a bug.
I haven't dug into the reasons yet, but I suspect it has something to do with the `[^\uD800-\uDBFF]|^` part.
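The suspicion can be reproduced with a sketch. The expansion below is a hypothetical reconstruction of how `.` with `su` might be rewritten into a pattern without the `u` flag; it is an assumption modelled on the `[^\uD800-\uDBFF]|^` part quoted above, not verified against regexpu's actual output. It splits "any code point" into four alternatives, the last of which handles lone low surrogates by consuming the preceding code unit.

```javascript
// Hypothetical expansion of `.` under `su` (assumed shape, not regexpu's verified output).
const ANY_CODE_POINT =
  '(?:' +
  '[\\0-\\uD7FF\\uE000-\\uFFFF]' +               // BMP code point that is not a surrogate
  '|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]' +        // well-formed surrogate pair
  '|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])' +    // lone high surrogate
  '|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]' + // lone low surrogate, consuming the preceding unit
  ')';

// Transpiled equivalent of /^..$/su, written without the `u` flag.
const expanded = new RegExp('^' + ANY_CODE_POINT + ANY_CODE_POINT + '$');
const native = /^..$/su;

const input = 'a\uDC00'; // 'a' followed by a lone low surrogate: two code points

console.log(native.test(input));   // true
console.log(expanded.test(input)); // false
```

The first `.` has already consumed `'a'`, so the lone-low-surrogate alternative for the second `.` has no preceding unit left to inspect and fails; if the first `.` instead takes `'a'` through that same alternative, nothing is left for the second one. Replacing that alternative with a lookbehind-based one, `(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]`, makes this particular input match again, at the cost of requiring lookbehind support.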