wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") #96
Comments
In this case, the first character "\u2764" (❤) has a width of 1, but it is then joined with a second character, the variation selector "\ufe0f", which changes the cell width of the first character to 2. We don't have any code in wcwidth to detect this special kind of combining. This might be like the Devanagari issue #47, where a "combiner may sometimes increase the width of the previous cell, depending on its value".
This sequence is listed in https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt, which doesn't seem to hint at this width modification, but maybe it could be used to test for and build a "narrow to wide variations" table of sorts.
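As a rough illustration of how such a table might be derived, here is a minimal sketch, not the project's actual generator. It assumes the `base FE0F ; emoji style; # comment` line layout of emoji-variation-sequences.txt, uses a small hand-written sample instead of fetching the file, and relies on Python's `unicodedata.east_asian_width()` to decide what counts as "narrow". The names `SAMPLE` and `narrow_to_wide_candidates` are hypothetical.

```python
import unicodedata

# Abridged, hand-written sample in the layout of emoji-variation-sequences.txt.
SAMPLE = """\
# emoji-variation-sequences.txt (abridged sample)
0023 FE0E  ; text style;  # (1.1) NUMBER SIGN
0023 FE0F  ; emoji style; # (1.1) NUMBER SIGN
231A FE0E  ; text style;  # (1.1) WATCH
231A FE0F  ; emoji style; # (1.1) WATCH
2764 FE0E  ; text style;  # (1.1) HEAVY BLACK HEART
2764 FE0F  ; emoji style; # (1.1) HEAVY BLACK HEART
"""


def narrow_to_wide_candidates(text):
    """Return sorted base code points that are narrow alone but form a VS-16 sequence."""
    found = set()
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        codepoints = data.split(";")[0].split()
        # Keep only two-code-point sequences ending in U+FE0F (VS-16).
        if len(codepoints) != 2 or codepoints[1].upper() != "FE0F":
            continue
        base = int(codepoints[0], 16)
        # Characters that are already Wide or Fullwidth need no adjustment.
        if unicodedata.east_asian_width(chr(base)) not in ("W", "F"):
            found.add(base)
    return sorted(found)


candidates = narrow_to_wide_candidates(SAMPLE)
print([f"U+{cp:04X}" for cp in candidates])
```

The result depends on the Unicode version bundled with the Python interpreter, since East Asian Width assignments have changed between releases.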
I have created a solution in #97 and am now testing it with popular terminals. Thanks again for the bug report.
Closes #96

- Add new table, `VS16_NARROW_TO_WIDE`. It has only one version, "9.0.0". This defines a set of characters that are otherwise Narrow, like '0', that become wide when combined with `U+FE0F`, "VARIATION SELECTOR 16".
- `wcwidth.wcswidth()` function now tracks "last measured character", and, on U+FE0F, checks that character in table VS16_NARROW_TO_WIDE, and, if matching, adds 1 to the measured width.
- add `verify-table-integrity.py`, this is an unrelated file from previous work in #91 that should have been included there.
- The latest list of 'emoji-zwj-sequences.txt' and 'emoji-variation-sequences.txt' are fetched by update-tables.py and placed in 'tests/' folder, and now used by automatic tests in test_emoji_zwj.py, this is helpful to ensure 100% compatibility with all latest known emoji sequences

Note: A single "9.0.0" version is used because of ambiguity in legacy releases of the emoji variation sequences files. So ambiguous, that very few terminals get it right! Details are documented in update-tables.py and I will share results from 'ucs-detect' project shortly. I believe that U+FE0F is something of a "fixup" for early emojis. I don't expect any new U+FE0F sequences to be published, no changes since release 10.0
Closes #96

- Add new table, `VS16_NARROW_TO_WIDE`. It has only one version, "9.0.0". This defines a set of characters that are otherwise Narrow, like '0', that become wide when combined with `U+FE0F`, "VARIATION SELECTOR 16".
- change `wcwidth.wcswidth()` function, now tracks "last measured character", and, on U+FE0F, checks that character in table VS16_NARROW_TO_WIDE, and, if matching, adds 1 to the measured width.
- add `verify-table-integrity.py`, this is an unrelated file from previous work in #91 that should have been included there.
- new tests: The latest list of 'emoji-zwj-sequences.txt' and 'emoji-variation-sequences.txt' are fetched by update-tables.py and placed in 'tests/' folder, and now used by automatic tests in test_emoji_zwj.py, this is helpful to ensure 100% compatibility with all latest known emoji sequences
- fix issue with codecov.io token

Note: A single "9.0.0" version is used because of ambiguity in legacy releases of the emoji variation sequences files. So ambiguous, that very few terminals get it right! See https://ucs-detect.readthedocs.io/results.html for testing results. I believe that U+FE0F is something of a "fixup" for early emojis. I don't expect any new U+FE0F sequences to be published.
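The "last measured character" approach described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from #97: `VS16_NARROW_TO_WIDE_SAMPLE` is a tiny hypothetical excerpt rather than the generated table, and `char_width` is a stand-in for the library's per-character lookup built on Python's `unicodedata`.

```python
import unicodedata

VS16 = "\ufe0f"

# Hypothetical excerpt; the real table is generated from
# emoji-variation-sequences.txt and is much larger.
VS16_NARROW_TO_WIDE_SAMPLE = {0x0023, 0x002A, 0x0030, 0x2764}


def char_width(ch):
    """Stand-in width lookup: 0 for marks and format characters, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def sketch_wcswidth(text):
    """Measure text, widening a narrow base character that is followed by VS-16."""
    width = 0
    last_measured = None  # last character that contributed a non-zero width
    for ch in text:
        if ch == VS16:
            if last_measured is not None and ord(last_measured) in VS16_NARROW_TO_WIDE_SAMPLE:
                width += 1
                last_measured = None  # a repeated VS-16 must not widen again
            continue
        w = char_width(ch)
        width += w
        if w > 0:
            last_measured = ch
    return width


print(sketch_wcswidth("\u2764"), sketch_wcswidth("\u2764\ufe0f"))
```

Resetting `last_measured` after applying the adjustment is a design choice of this sketch, so that a stray second U+FE0F does not add another cell.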
This fix will soon be released in version 0.2.10. I also tested many terminals for VS16 support: only about 28% of those tested support any kind of VS-16 sequence. See https://ucs-detect.readthedocs.io/results.html
Hello,
The `wcswidth` function seems to be incorrectly calculating the width of the heart "❤️" ("\u2764\ufe0f") emoji. An example:
The heart emoji occupies 2 cells and should be returning 2 as per the other examples above.