wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") #96

Closed
dscrofts opened this issue Oct 31, 2023 · 4 comments · Fixed by #97

dscrofts commented Oct 31, 2023

Hello,

The wcswidth function seems to be incorrectly calculating the width of the heart emoji "❤️" ("\u2764\ufe0f"). An example:

>>> from wcwidth import wcswidth
>>> wcswidth("❤️")
1
>>> wcswidth("💞")
2
>>> wcswidth("💘")
2

The heart emoji occupies 2 cells, so wcswidth should return 2 for it, as it does for the other examples above.

@jquast jquast changed the title wcswidth incorrect for heart emoji ("❤️") wcswidth incorrect for heart emoji, ❤️ ("\u2764\ufe0f") Oct 31, 2023
Owner

jquast commented Oct 31, 2023

In this case, the first character, "\u2764" (❤), has a width of 1, but it is then joined with a second character, the variation selector "\ufe0f", which changes the cell width of the first character to 2.

We don't have any code in wcwidth to detect this special kind of combining. This might be like the Devanagari issue #47, in that a "combiner may sometimes increase the width of the previous cell, depending on its value".

@jquast jquast added the bug label Oct 31, 2023
Owner

jquast commented Oct 31, 2023

This sequence is in https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt, which doesn't seem to hint at this width modification, but maybe it could be used to test for it and to build a "narrow to wide variations" table of sorts.
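A rough sketch of how such a table might be derived from that file, assuming the usual `BASE FE0F ; emoji style; # comment` line layout. The embedded excerpt, the function name `narrow_to_wide_bases`, and the use of `unicodedata.east_asian_width` to decide what counts as "narrow" are all assumptions for illustration; this is not the code from wcwidth's update-tables.py.

```python
import unicodedata

# Hand-copied excerpt in the assumed layout of emoji-variation-sequences.txt.
SAMPLE = """\
# emoji-variation-sequences.txt (excerpt)
0030 FE0E  ; text style;  # (1.1) DIGIT ZERO
0030 FE0F  ; emoji style; # (1.1) DIGIT ZERO
231A FE0E  ; text style;  # (1.1) WATCH
231A FE0F  ; emoji style; # (1.1) WATCH
2764 FE0E  ; text style;  # (1.1) HEAVY BLACK HEART
2764 FE0F  ; emoji style; # (1.1) HEAVY BLACK HEART
"""


def narrow_to_wide_bases(text):
    """Return sorted base code points that are narrow alone but listed with U+FE0F."""
    bases = []
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        codepoints = data.split(";")[0].split()
        # Keep only two-code-point sequences that end in VS16 (U+FE0F).
        if len(codepoints) != 2 or codepoints[1].upper() != "FE0F":
            continue
        base = int(codepoints[0], 16)
        # A base that is already Wide or Fullwidth gains nothing from VS16.
        if unicodedata.east_asian_width(chr(base)) not in ("W", "F"):
            bases.append(base)
    return sorted(bases)


print([hex(cp) for cp in narrow_to_wide_bases(SAMPLE)])
```

In this excerpt, WATCH (U+231A) is already wide on its own and is filtered out, while DIGIT ZERO and HEAVY BLACK HEART remain as candidates.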

Owner

jquast commented Nov 7, 2023

I have created a solution in #97 and am now testing it with popular terminals. Thanks again for the bug report.

jquast added a commit that referenced this issue Nov 7, 2023
Closes #96

- Add new table, `VS16_NARROW_TO_WIDE`. It has only one version,
  "9.0.0". This defines a set of characters that are otherwise Narrow,
  like '0', that become wide when combined with `U+FE0F`, "VARIATION
  SELECTOR 16".

- `wcwidth.wcswidth()` function now tracks "last measured character",
  and, on U+FE0F, checks that character in table VS16_NARROW_TO_WIDE,
  and, if matching, adds 1 to the measured width.

- add `verify-table-integrity.py`, this is an unrelated file from
  previous work in #91 that should have been included there.

- The latest list of 'emoji-zwj-sequences.txt' and
  'emoji-variation-sequences.txt' are fetched by update-tables.py and
  placed in 'tests/' folder, and now used by automatic tests in
  test_emoji_zwj.py, this is helpful to ensure 100% compatibility with
  all latest known emoji sequences

Note: A single "9.0.0" version is used because of ambiguity in legacy
releases of the emoji variation sequences files. So ambiguous, that very
few terminals get it right! Details are documented in update-tables.py
and I will share results from 'ucs-detect' project shortly.

I believe that U+FE0F is something of a "fixup" for early emojis. I
don't expect any new U+FE0F sequences to be published, no changes since
release 10.0
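
The "last measured character" approach described in the commit message above can be sketched roughly as follows. This is a simplified illustration, not the actual wcwidth implementation: `SAMPLE_NARROW_TO_WIDE` is a hypothetical two-entry stand-in for the real `VS16_NARROW_TO_WIDE` table, and `base_width` and `sketch_wcswidth` are made-up helpers that approximate per-character width with the standard `unicodedata` module.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16

# Hypothetical stand-in for VS16_NARROW_TO_WIDE: only DIGIT ZERO and
# HEAVY BLACK HEART, the two examples mentioned in this thread.
SAMPLE_NARROW_TO_WIDE = {0x0030, 0x2764}


def base_width(ch):
    """Approximate cell width: 0 for marks and format characters, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def sketch_wcswidth(text):
    """Sum cell widths, adding 1 when VS16 follows a narrow character in the table."""
    total = 0
    last_measured = None  # last character that contributed a non-zero width
    for ch in text:
        if ch == VS16:
            if last_measured is not None and ord(last_measured) in SAMPLE_NARROW_TO_WIDE:
                total += 1
                last_measured = None  # a repeated VS16 must not widen twice
            continue
        width = base_width(ch)
        total += width
        if width > 0:
            last_measured = ch
    return total


print(sketch_wcswidth("\u2764"), sketch_wcswidth("\u2764\ufe0f"))
```

Under these assumptions the bare heart measures 1 cell and the heart followed by U+FE0F measures 2, while a character outside the sample table, such as "a", is unaffected by a trailing U+FE0F.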
jquast added a commit that referenced this issue Nov 13, 2023
Closes #96 

- Add new table, `VS16_NARROW_TO_WIDE`. It has only one version, "9.0.0". This defines a set of characters that are otherwise Narrow, like '0', that become wide when combined with `U+FE0F`, "VARIATION SELECTOR 16".

- change `wcwidth.wcswidth()` function, now tracks "last measured character", and, on U+FE0F, checks that character in table VS16_NARROW_TO_WIDE, and, if matching, adds 1 to the measured width.

- add `verify-table-integrity.py`, this is an unrelated file from previous work in #91 that should have been included there.

- new tests: The latest list of 'emoji-zwj-sequences.txt' and 'emoji-variation-sequences.txt' are fetched by update-tables.py and placed in 'tests/' folder, and now used by automatic tests in test_emoji_zwj.py, this is helpful to ensure 100% compatibility with all latest known emoji sequences

- fix issue with codecov.io token

Note: A single "9.0.0" version is used because of ambiguity in legacy releases of the emoji variation sequences files. So ambiguous, that very few terminals get it right! See https://ucs-detect.readthedocs.io/results.html for testing results.  I believe that U+FE0F is something of a "fixup" for early emojis. I don't expect any new U+FE0F sequences to be published.
Owner

jquast commented Nov 13, 2023

This fix will soon be released in version 0.2.10. I also tested many terminals for VS16 support: only about 28% of those tested support any kind of VS16 sequence. See https://ucs-detect.readthedocs.io/results.html
