Add short regex pattern compatible with ES2018 that matches whatever emoji are supported natively #3
Comments
This is a fascinating puzzle :)
I can’t think of one. cc @markusicu I’ll note that for production apps, I’ve learned the hard way that …
That's helpful! Okay, so I guess I'm actually more interested in matching the same emoji list as …

As for the special case I highlighted of U+1F575 (🕵) without a following U+FE0F (VS16), even though the VS16 is required by …

'\u261d \u26f9 \u270c \u270d \u{1f3cb} \u{1f3cc} \u{1f574} \u{1f575} \u{1f590}'
// '☝ ⛹ ✌ ✍ 🏋 🏌 🕴 🕵 🖐'

These are all common-sense emoji despite the lack of a following VS16, and all render as colorful images rather than monochrome text variants even without VS16 (at least on Windows 11, where I'm currently viewing them). So I think it makes sense for me to include them as "underqualified emoji" exceptions, similar to …

One adjustment I'll make, though: to support all emoji tag sequences (including the Texas flag supported by WhatsApp and OpenMoji), I will change …
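For context on the tag-sequence point: the tag branch of the pattern in the issue description, \u{1F3F4}[\u{E0061}-\u{E007A}]{5}\u{E007F}, accepts exactly five tag letters. That covers the three RGI subdivision flags (England, Scotland, Wales) but not the Texas flag, whose tag spec "ustx" has four letters. The sketch below contrasts that branch with one possible generalization, assuming the UTS #51 definition of a tag spec as one or more characters in U+E0020..U+E007E. It is not necessarily the change the truncated comment goes on to describe, and tagFlag is a hypothetical helper.

```javascript
// Tag-sequence branch as written in the issue: black flag, exactly five tag
// letters a-z (U+E0061..U+E007A), then CANCEL TAG (U+E007F).
const fiveTagLetters = /^\u{1F3F4}[\u{E0061}-\u{E007A}]{5}\u{E007F}$/u;

// One possible generalization (an assumption based on UTS #51's tag_spec range):
// black flag, one or more tag characters U+E0020..U+E007E, then CANCEL TAG.
const anyTagSpec = /^\u{1F3F4}[\u{E0020}-\u{E007E}]+\u{E007F}$/u;

// Hypothetical helper: builds a flag tag sequence from an ASCII subdivision
// code by shifting each ASCII character into the Unicode tag block.
function tagFlag(code) {
  const tags = [...code].map((c) => String.fromCodePoint(0xE0000 + c.codePointAt(0)));
  return '\u{1F3F4}' + tags.join('') + '\u{E007F}';
}

const england = tagFlag('gbeng'); // RGI subdivision flag, five tag letters
const texas = tagFlag('ustx');    // not RGI, four tag letters

console.log(fiveTagLetters.test(england), fiveTagLetters.test(texas)); // true false
console.log(anyTagSpec.test(england), anyTagSpec.test(texas));         // true true
```

The trade-off is that the looser range also accepts tag sequences that no platform renders as a single flag, so it favors "anything shaped like an emoji tag sequence" over "only sequences some vendor supports".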
I've published a version of the regex here (with some updates/fixes) as …
I'm trying to write a version of \p{RGI_Emoji} (for use in fabian-hiller/valibot#666 and elsewhere) that is compatible with ES2018 and does not rely on a giant listing of code points; \p{RGI_Emoji} itself is a property of strings that requires the ES2024 v flag. I'm okay with the list of emoji being tied to whatever version of Unicode the JS environment supports natively. I'm finding the Unicode spec hard to follow for this purpose. Here's what I have so far:
/(?:(?:\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F\u20E3?)(?:\u200D(?:\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F))*|[\u{1F1E6}-\u{1F1FF}]{2}|\u{1F3F4}[\u{E0061}-\u{E007A}]{5}\u{E007F})/u
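A minimal harness for trying the pattern above on a few of the cases discussed in this issue is sketched below. It assumes Node.js or another ES2018+ engine; isSingleEmoji and the [input, expected] pairs are illustrative additions, and the expected values reflect the behavior described in the issue, including the bare U+1F575 case.

```javascript
// Pattern from this issue, copied unchanged. Requires ES2018 (\p{...} with the u flag).
const emoji = /(?:(?:\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F\u20E3?)(?:\u200D(?:\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F))*|[\u{1F1E6}-\u{1F1FF}]{2}|\u{1F3F4}[\u{E0061}-\u{E007A}]{5}\u{E007F})/u;

// Hypothetical helper: anchors the pattern so the whole string must be one match.
const emojiExact = new RegExp(`^${emoji.source}$`, 'u');
const isSingleEmoji = (str) => emojiExact.test(str);

// [input, expected] pairs based on the behavior described in this issue.
const cases = [
  ['\u{1F600}', true],                // 😀 default emoji presentation
  ['\u{1F442}\u{1F3FD}', true],       // 👂🏽 modifier base + skin tone modifier
  ['1\uFE0F\u20E3', true],            // keycap 1
  ['\u{1F441}\uFE0F', true],          // 👁️ text-default symbol + VS16
  ['\u{1F469}\u200D\u{1F4BB}', true], // 👩‍💻 ZWJ sequence
  ['\u{1F1FA}\u{1F1F8}', true],       // 🇺🇸 regional indicator pair
  ['1', false],                       // digit without VS16 and keycap
  ['*', false],                       // asterisk without VS16 and keycap
  ['\u{1F441}', false],               // 👁 without VS16
  ['\u2708', false],                  // ✈ without VS16
  ['\u200D', false],                  // bare ZWJ
  ['\u{1F575}', true],                // 🕵 without VS16: matched, although not RGI_Emoji
];

for (const [input, expected] of cases) {
  const codePoints = [...input].map((c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase());
  const actual = isSingleEmoji(input);
  console.log(codePoints.join(' '), actual, actual === expected ? 'ok' : 'UNEXPECTED');
}
```

The harness anchors the pattern with ^ and $, so backtracking can fall through to the flag and tag-sequence branches when the first branch leaves input unconsumed. In an unanchored search (for example String.prototype.match or matchAll), a JavaScript alternation stops at the first branch that succeeds, so if the engine's Unicode data gives regional indicators or 🏴 the Emoji_Presentation property, the first branch may match only the leading code point of a flag; listing the flag and tag-sequence branches first would avoid that.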
This matches every emoji in the full RGI_Emoji list in this repo. It also correctly excludes things that are matched by \p{Emoji} like digits, *, and symbols (e.g. 👁, ✈, 🏳, ♂) that are only emoji when followed by U+FE0F (VS16). And it correctly excludes things like a bare U+200D (ZWJ), which is matched by \p{Emoji_Component}.

The one issue seems to be that it matches some \p{Emoji_Modifier_Base} code points that are not technically emoji without a following VS16, and which \p{RGI_Emoji} therefore does not match on their own. The one example I've found is U+1F575 (🕵). I don't know if there are other cases like this, but I suspect there are. The complicating factor is that other emoji, like 👂 (U+1F442), 🤘 (U+1F918), and 💃 (U+1F483), are also matched by \p{Emoji_Modifier_Base} but do not use or require a following VS16.

So, two questions:
… \p{Emoji_Modifier_Base} but are not matched by \p{RGI_Emoji}?
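For the question of which code points are matched by \p{Emoji_Modifier_Base} but not by \p{RGI_Emoji}, one way to get an answer empirically in an ES2018 engine is sketched below. It rests on an assumption worth stating: a single code point on its own is in RGI_Emoji only if it has Emoji_Presentation (text-default characters need VS16), so the Emoji_Modifier_Base code points without Emoji_Presentation should be exactly the ones the pattern matches bare while \p{RGI_Emoji} does not. Where the ES2024 v flag is available, the script cross-checks that assumption against \p{RGI_Emoji} directly. All names are illustrative.

```javascript
// Both regexes need only ES2018 (Unicode property escapes with the u flag).
const isModifierBase = /^\p{Emoji_Modifier_Base}$/u;
const hasEmojiPresentation = /^\p{Emoji_Presentation}$/u;

// Collect every code point with Emoji_Modifier_Base (brute force over all code points).
const modifierBases = [];
for (let cp = 0; cp <= 0x10FFFF; cp++) {
  if (cp >= 0xD800 && cp <= 0xDFFF) continue; // skip surrogate code points
  if (isModifierBase.test(String.fromCodePoint(cp))) modifierBases.push(cp);
}

// Text-default bases: matched bare by the \p{Emoji_Modifier_Base}\p{Emoji_Modifier}?
// branch, but (under the assumption above) not RGI_Emoji without VS16 or a modifier.
const textDefaultBases = modifierBases.filter(
  (cp) => !hasEmojiPresentation.test(String.fromCodePoint(cp)),
);

const hex = (cp) => 'U+' + cp.toString(16).toUpperCase();
console.log(textDefaultBases.map(hex).join(' '));
console.log(textDefaultBases.map((cp) => String.fromCodePoint(cp)).join(' '));

// Optional cross-check where the ES2024 v flag (properties of strings) is supported.
let rgiExact = null;
try {
  rgiExact = new RegExp('^\\p{RGI_Emoji}$', 'v');
} catch {
  // Older engine: the v flag or the RGI_Emoji property is not available.
}
if (rgiExact) {
  // The assumption predicts: RGI_Emoji on its own <=> not a text-default base.
  const disagreements = modifierBases.filter((cp) => {
    const ch = String.fromCodePoint(cp);
    return rgiExact.test(ch) === textDefaultBases.includes(cp);
  });
  console.log('disagreements with \\p{RGI_Emoji}:', disagreements.map(hex));
}
```

With recent Unicode data, the output would be expected to line up with the nine code points listed in the comment earlier in this thread (☝ ⛹ ✌ ✍ 🏋 🏌 🕴 🕵 🖐), though the exact set depends on the engine's Unicode version. If strict RGI_Emoji behavior were wanted instead of "underqualified" exceptions, one option is to make the modifier mandatory in that branch (\p{Emoji_Modifier_Base}\p{Emoji_Modifier}), since presentation-default bases such as 👂 would still be covered by the \p{Emoji_Presentation} branch and text-default bases followed by VS16 by the \p{Emoji}\uFE0F branch; that variant would still need checking against the full RGI list.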