EncodeForRegExpEscape should not return results that require particular flags #69

Closed
gibson042 opened this issue Mar 27, 2024 · 3 comments · Fixed by #71

Comments

@gibson042
Contributor

EncodeForRegExpEscape step 4.e (which would be reached if input c were a Space_Separator supplementary code point in [U+10000, U+10FFFF]) results in a return value like \u{…}. The interpretation of such pattern text depends on regular expression flags. In the presence of a "u" or "v" flag, it is interpreted as a |RegExpUnicodeEscapeSequence| that matches the code point with the contained hexadecimal value. Otherwise, it is interpreted as either a syntax error or (only in a host supporting Annex B and only when the hexadecimal representation of the code point consists only of decimal digits) a quantified |ExtendedAtom| "u" with the specified decimal count of repetitions (e.g., /^\u{10000}$/.test("u".repeat(10000)) is true).
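
The flag dependence described above can be observed directly. The following is a minimal sketch, not part of the proposal: `probe` is a hypothetical helper introduced here for illustration, and the non-Unicode result assumes a host that implements Annex B (a host without it is expected to throw instead, which the sketch records rather than asserts).

```javascript
// The same pattern text, compiled with and without the "u" flag.
const source = String.raw`^\u{10000}$`;
const codePoint = String.fromCodePoint(0x10000); // the single code point U+10000
const manyUs = "u".repeat(10000);                // 10000 copies of the letter "u"

// Hypothetical helper: compile `source` with the given flags and report
// which of the two candidate strings it matches, or the error name if
// the pattern is rejected.
function probe(flags) {
  try {
    const re = new RegExp(source, flags);
    return {
      matchesCodePoint: re.test(codePoint),
      matchesManyUs: re.test(manyUs),
    };
  } catch (err) {
    return { error: err.name };
  }
}

const unicodeMode = probe("u"); // \u{10000} read as a code point escape
const legacyMode = probe("");   // Annex B: "u" quantified by {10000}, or an error

console.log({ unicodeMode, legacyMode });
```

The same pattern source therefore matches entirely different strings depending on the flags supplied by whoever compiles it, which is the hazard for an escaping function that cannot know those flags in advance.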

Rather than returning results subject to conditional interpretation, EncodeForRegExpEscape should return a \u…\u… surrogate pair |RegExpUnicodeEscapeSequence| for such inputs, which works in both Unicode and non-Unicode regular expressions (e.g., /^\uD834\uDF06$/u.test("𝌆"), /^\uD834\uDF06$/v.test("𝌆"), and /^\uD834\uDF06$/.test("𝌆") are all true).
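
As a rough illustration of the surrogate pair suggestion, the sketch below derives the two \uXXXX escapes from a supplementary code point using the standard UTF-16 arithmetic. The function name `toSurrogatePairEscape` is an assumption made for this example and is not taken from the proposal's spec text. Only the "u" and flagless cases are exercised here; the "v" case mentioned above should behave the same in hosts that support that flag.

```javascript
// Hypothetical helper: encode a supplementary code point (U+10000..U+10FFFF)
// as a pair of \uXXXX escapes, one per UTF-16 code unit.
function toSurrogatePairEscape(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a supplementary code point");
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xd800 + (offset >> 10);    // lead surrogate from the top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);   // trail surrogate from the low 10 bits
  const hex4 = (unit) => unit.toString(16).toUpperCase().padStart(4, "0");
  return "\\u" + hex4(high) + "\\u" + hex4(low);
}

const escaped = toSurrogatePairEscape(0x1d306); // U+1D306 is "𝌆"
const target = String.fromCodePoint(0x1d306);

// Compile the same escaped text with and without the "u" flag.
const matchesUnicode = new RegExp("^" + escaped + "$", "u").test(target);
const matchesLegacy = new RegExp("^" + escaped + "$").test(target);

console.log(escaped, matchesUnicode, matchesLegacy);
```

Unlike the braced form, this output does not require the consumer to pick a particular flag for it to match the original character.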

Alternatively (and preferably, IMO), EncodeForRegExpEscape should not escape all white space. I'm not certain why it does so right now, but looking back I suspect it is due to a misinterpretation of #30, which requests escaping of control characters (and even more specifically line terminators), though even that isn't necessary.

gibson042 mentioned this issue Mar 27, 2024

@bakkot
Collaborator

bakkot commented Mar 27, 2024

Whitespace is escaped to leave room for /x mode regexps in the future.

@ljharb
Member

ljharb commented Mar 27, 2024

So to make sure I understand the issue properly, this would be solved if the escaping were done by code units, and not code points?
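
To make the code unit versus code point distinction concrete, here is a small sketch. It assumes a hypothetical `escapeByCodeUnit` helper that escapes every UTF-16 code unit of its input as \uXXXX; it is not the proposal's actual algorithm, only an illustration of why iterating by code units never needs the braced \u{…} form.

```javascript
const supplementary = String.fromCodePoint(0x1d306); // "𝌆", outside the BMP

// Iterating a string yields code points; indexing yields UTF-16 code units.
const codePoints = Array.from(supplementary, (ch) => ch.codePointAt(0));
const codeUnits = [];
for (let i = 0; i < supplementary.length; i++) {
  codeUnits.push(supplementary.charCodeAt(i));
}

// Hypothetical helper: escape each code unit separately. Every code unit
// fits in four hex digits, so the braced form is never required.
function escapeByCodeUnit(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    out += "\\u" + str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

console.log(codePoints.length, codeUnits.length);
console.log(escapeByCodeUnit(supplementary)); // two escapes for one code point
console.log(escapeByCodeUnit("\u2003"));      // one escape for a BMP space (EM SPACE)
```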

@jridgewell
Member

Yes, but I think there is a possibility that a Space_Separator will be added in the future in the supplementary U+10000-U+10FFFF range. So we would be adding this same support in the future.

ljharb added a commit that referenced this issue Mar 27, 2024
ljharb added a commit that referenced this issue Mar 27, 2024
ljharb closed this as completed in 21cdd91 Mar 28, 2024