ZWSP (\U200B) is rendered as a space when in 'grapheme clusters' mode #18267
Comments
Repro steps:
Output:
Example of using ZWSP: https://en.wikipedia.org/wiki/Thai_script#Orthography
Thai text with tokenized words using ZWSP:
Your example of Thai script makes me believe that we should indeed treat the ZWSP like any other extender. We can't treat it as a standalone zero-width cell, since those cannot exist in a terminal. And we can't leave it as-is, because it's clearly quite important for languages like Thai.
FYI, I haven't encountered any inconsistencies yet while leaving ZWSP as part of the preceding grapheme cluster. It seems that DirectWrite rasterization is not affected by the presence of ZWSP at the end of the cluster, although there may be some edge cases that I'm not aware of.
You two are saying the same thing. :)
Thanks for the fix. A leading ZWSP still shows up as a space, so the next character lands in column 2. This differs from some other Windows terminals I tried, such as WezTerm, but those terminals often seem to misreport the cursor position after emitting a ZWSP, so I'm not necessarily saying they are role models in this area (e.g. they may line-wrap too early because of this, which is even uglier). It also differs from the default FreeBSD console behaviour.
Re-testing in a different way, it only seems to be an issue if the ZWSP happens to fall at column 1 in the first place. A Thai sequence joined with ZWSP (taken from o-sdn-o's example) seems to behave OK even at the edge of the screen, so I guess if this is a bug it's probably of fairly low importance.
That's an issue we could fix in the future. Personally, I'm not yet convinced that it's an important edge case to fix, because a lone zero-width character in the first column should be a very rare occurrence. The reason it happens is that before inserting anything into our text buffer we check whether it joins with the already existing character and, if so, merge it together with the new input. For any other characters we clamp the width of each grapheme cluster to a value between 1 and 2. Since a zero-width character can't join with anything in the first column, it will be measured as its own cluster, which results in a width of 1. I filed an issue: #18296
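To make the behaviour described above easier to follow, here is a rough Python model of the join-then-clamp logic. It is only a sketch, not the terminal's actual code: the names `is_zero_width`, `write`, `clamp_width` and `total_columns` are hypothetical, and the zero-width test is a crude stand-in for a real grapheme-break table.

```python
import unicodedata

ZWSP = "\u200b"

def is_zero_width(ch: str) -> bool:
    # Assumption: ZWSP and nonspacing/enclosing marks count as "extenders".
    return ch == ZWSP or unicodedata.category(ch) in ("Mn", "Me")

def raw_width(cluster: str) -> int:
    # Unclamped width: 0 for extenders, 2 for East Asian wide/fullwidth, else 1.
    total = 0
    for ch in cluster:
        if is_zero_width(ch):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total

def clamp_width(cluster: str) -> int:
    # Every stored cluster occupies at least 1 and at most 2 cells.
    return max(1, min(2, raw_width(cluster)))

def write(row: list[str], text: str) -> None:
    # Join an extender onto the preceding cluster if one exists;
    # otherwise it has nothing to join with and starts its own cluster.
    for ch in text:
        if row and is_zero_width(ch):
            row[-1] += ch
        else:
            row.append(ch)

def total_columns(text: str) -> int:
    row: list[str] = []
    write(row, text)
    return sum(clamp_width(cluster) for cluster in row)

print(total_columns("a" + ZWSP + "b"))  # ZWSP joins "a"
print(total_columns(ZWSP + "b"))        # ZWSP alone in the first column
```

In this model a ZWSP in the middle of a line adds no columns because it is absorbed into the previous cluster, while a ZWSP at the start of a row becomes a cluster of its own and is clamped up to one cell, which pushes the next character into column 2.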
Originally posted by @juliannoble in #11850