wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") #96

dscrofts · 2023-10-31T13:09:40Z

Hello,

The wcswidth function seems to calculate the width of the heart emoji "❤️" ("\u2764\ufe0f") incorrectly. An example:

>>> from wcwidth import wcswidth
>>> wcswidth("❤️")
1
>>> wcswidth("💞")
2
>>> wcswidth("💘")
2

The heart emoji occupies 2 cells, so wcswidth should return 2 for it, as it does for the other examples above.


jquast · 2023-10-31T16:05:30Z

In this case, the first character "\u2764" (❤) has a width of 1, but it is then joined with a second character, the variation selector "\ufe0f", which changes the cell width of the first character to 2.

We don't have any code in wcwidth to detect this special kind of combining. This might be like the Devanagari issue #47, where a "combiner may sometimes increase the width of the previous cell, depending on its value".
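To make the two-character structure of the sequence concrete, here is a minimal sketch using only the standard library's `unicodedata` module. The helper name `describe` is made up for illustration and is not part of wcwidth.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# The heart emoji from this report is a base character plus a selector.
for row in describe("\u2764\ufe0f"):
    print(row)
# ('U+2764', 'HEAVY BLACK HEART', 'So')
# ('U+FE0F', 'VARIATION SELECTOR-16', 'Mn')
```

The selector is a nonspacing mark (`Mn`), which is why a per-character width sum contributes nothing for it and reports 1 for the whole sequence.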

jquast · 2023-10-31T16:12:17Z

This sequence is in https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt, which doesn't seem to hint at this width modification, but maybe it could be used to test for and build a "narrow to wide variations" table of sorts.
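A rough sketch of how such a table might be derived from that file follows. It assumes each data line has the form `BASE FE0F ; emoji style; # comment`; the embedded `SAMPLE` is a small illustrative excerpt rather than the full file, and the function names are hypothetical. Narrowness is approximated with `unicodedata.east_asian_width`, which is not necessarily how wcwidth decides it.

```python
import unicodedata

# Illustrative excerpt in the style of emoji-variation-sequences.txt.
SAMPLE = """\
# base, selector ; style ; # comment
0023 FE0E  ; text style;  # (1.1) NUMBER SIGN
0023 FE0F  ; emoji style; # (1.1) NUMBER SIGN
231A FE0E  ; text style;  # (1.1) WATCH
231A FE0F  ; emoji style; # (1.1) WATCH
2764 FE0E  ; text style;  # (1.1) HEAVY BLACK HEART
2764 FE0F  ; emoji style; # (1.1) HEAVY BLACK HEART
"""


def vs16_bases(lines):
    """Collect base code points that have a U+FE0F (emoji style) sequence."""
    bases = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        codepoints = [int(cp, 16) for cp in data.split(";")[0].split()]
        if len(codepoints) == 2 and codepoints[1] == 0xFE0F:
            bases.add(codepoints[0])
    return bases


def narrow_to_wide(bases):
    """Keep only bases that are not already Wide or Fullwidth on their own."""
    return {
        cp for cp in bases
        if unicodedata.east_asian_width(chr(cp)) not in ("W", "F")
    }


bases = vs16_bases(SAMPLE.splitlines())
print(sorted(hex(cp) for cp in narrow_to_wide(bases)))
```

In this sample, U+231A WATCH is already wide by itself, so only the bases that would actually change width under U+FE0F remain in the result.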

jquast · 2023-11-07T17:06:11Z

I have created a solution in #97 and am now testing it with popular terminals. Thanks again for the bug report.

Closes #96

- Add new table, `VS16_NARROW_TO_WIDE`. It has only one version, "9.0.0". This defines a set of characters that are otherwise Narrow, like '0', that become wide when combined with `U+FE0F`, "VARIATION SELECTOR 16".
- `wcwidth.wcswidth()` function now tracks "last measured character", and, on U+FE0F, checks that character in table VS16_NARROW_TO_WIDE, and, if matching, adds 1 to the measured width.
- Add `verify-table-integrity.py`, this is an unrelated file from previous work in #91 that should have been included there.
- The latest lists of 'emoji-zwj-sequences.txt' and 'emoji-variation-sequences.txt' are fetched by update-tables.py and placed in the 'tests/' folder, and are now used by automatic tests in test_emoji_zwj.py. This is helpful to ensure 100% compatibility with all latest known emoji sequences.

Note: A single "9.0.0" version is used because of ambiguity in legacy releases of the emoji variation sequences files. So ambiguous, that very few terminals get it right! Details are documented in update-tables.py and I will share results from the 'ucs-detect' project shortly. I believe that U+FE0F is something of a "fixup" for early emojis. I don't expect any new U+FE0F sequences to be published; there have been no changes since release 10.0.

Closes #96

- Add new table, `VS16_NARROW_TO_WIDE`. It has only one version, "9.0.0". This defines a set of characters that are otherwise Narrow, like '0', that become wide when combined with `U+FE0F`, "VARIATION SELECTOR 16".
- Change `wcwidth.wcswidth()` function, now tracks "last measured character", and, on U+FE0F, checks that character in table VS16_NARROW_TO_WIDE, and, if matching, adds 1 to the measured width.
- Add `verify-table-integrity.py`, this is an unrelated file from previous work in #91 that should have been included there.
- New tests: the latest lists of 'emoji-zwj-sequences.txt' and 'emoji-variation-sequences.txt' are fetched by update-tables.py and placed in the 'tests/' folder, and are now used by automatic tests in test_emoji_zwj.py. This is helpful to ensure 100% compatibility with all latest known emoji sequences.
- Fix issue with codecov.io token.

Note: A single "9.0.0" version is used because of ambiguity in legacy releases of the emoji variation sequences files. So ambiguous, that very few terminals get it right! See https://ucs-detect.readthedocs.io/results.html for testing results. I believe that U+FE0F is something of a "fixup" for early emojis. I don't expect any new U+FE0F sequences to be published.
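The "last measured character" approach described above can be sketched roughly as follows. This is not the code from #97: `NARROW_TO_WIDE` is a tiny hypothetical stand-in for `VS16_NARROW_TO_WIDE`, and `toy_width` is a simplified per-character width function assumed here so the example runs with the standard library alone.

```python
import unicodedata

VS16 = "\ufe0f"

# Hypothetical stand-in for VS16_NARROW_TO_WIDE; the real table is larger.
NARROW_TO_WIDE = {"\u2764", "0", "#"}


def toy_width(ch):
    """Simplified per-character width: 0 for marks, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def sketch_width(text, char_width=toy_width):
    """Sum cell widths, adding 1 when U+FE0F follows a narrow-to-wide base."""
    total = 0
    last_measured = None
    for ch in text:
        if ch == VS16:
            if last_measured in NARROW_TO_WIDE:
                total += 1
                last_measured = None  # avoid widening the same base twice
            continue
        width = char_width(ch)
        if width > 0:
            last_measured = ch  # zero-width characters do not reset the base
        total += width
    return total


print(sketch_width("\u2764"))        # 1
print(sketch_width("\u2764\ufe0f"))  # 2
print(sketch_width("\U0001F49E"))    # 2
```

A U+FE0F that follows a character outside the table, such as `"a\ufe0f"`, adds nothing, so the width stays at 1.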

jquast · 2023-11-13T21:25:51Z

This fix will soon be released in version 0.2.10. I also tested many terminals for VS16 support; only about 28% of those tested support any kind of VS-16 sequence. See https://ucs-detect.readthedocs.io/results.html

jquast changed the title ~~wcswidth incorrect for heart emoji ("❤️")~~ wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") Oct 31, 2023

jquast added the bug label Oct 31, 2023

jquast mentioned this issue Nov 7, 2023

Bugfix accounting for Variation Selector 16 #97

Merged

jquast closed this as completed in #97 Nov 13, 2023
