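The tokenizer currently splits emoji sequences, both those with modifiers and those joined with zero-width joiners (ZWJ).
Examples are ✊🏿 (U+270A U+1F3FF) and 👨‍👦‍👦 (U+1F468 U+200D U+1F466 U+200D U+1F466).
As a consequence, KorAP queries need to treat these sequences as token sequences, like
[orth=✊][orth=🏿]
and
[orth=👨][orth=][orth=👦][orth=][orth=👦]
where the seemingly empty [orth=] tokens hold the invisible ZWJ (U+200D). This is especially problematic with ZWJ.
It is therefore also not possible to search for ✊🏿 or 👨‍👦‍👦 as a single token.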
This was reported by Louis Cotgrove.
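To illustrate the behaviour described above, here is a minimal Python sketch. It is not KorAP's tokenizer: `split_per_code_point` and `merge_emoji_sequences` are hypothetical helper names, and the merge rule is a deliberate simplification that only handles skin-tone modifiers (U+1F3FB to U+1F3FF) and ZWJ, not the full set of emoji sequences defined by Unicode.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# The two examples from this issue, written with escapes so the ZWJ is visible.
FIST = "\u270a\U0001f3ff"                                # fist + dark skin tone
FAMILY = "\U0001f468\u200d\U0001f466\u200d\U0001f466"    # man ZWJ boy ZWJ boy


def is_skin_tone_modifier(ch: str) -> bool:
    """True for the five Fitzpatrick emoji modifiers U+1F3FB..U+1F3FF."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def split_per_code_point(text: str) -> list[str]:
    """Mimic the reported behaviour: every code point becomes its own token."""
    return list(text)


def merge_emoji_sequences(code_points: list[str]) -> list[str]:
    """Simplified assumption: glue modifiers, ZWJs, and whatever follows a ZWJ
    onto the preceding token, so each sequence ends up as one token."""
    tokens: list[str] = []
    join_next = False  # set after a ZWJ: the next code point belongs to the same token
    for ch in code_points:
        if tokens and (join_next or ch == ZWJ or is_skin_tone_modifier(ch)):
            tokens[-1] += ch
        else:
            tokens.append(ch)
        join_next = ch == ZWJ
    return tokens


split_fist = split_per_code_point(FIST)      # 2 tokens: fist, modifier
split_family = split_per_code_point(FAMILY)  # 5 tokens: man, ZWJ, boy, ZWJ, boy
merged_fist = merge_emoji_sequences(split_fist)      # 1 token
merged_family = merge_emoji_sequences(split_family)  # 1 token
```

The split variant corresponds to the token sequences in the queries above, while the merged variant shows what a single-token result would look like. A real fix would more likely follow the Unicode grapheme cluster or emoji sequence rules than this hand-written check.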