Update character width tables according to Unicode 9 #294
Generated from the updated EastAsianWidth.txt using the following script:

```julia
# Generates the Objective-C statements for iTerm2's width tables
# from the Unicode Character Database file EastAsianWidth.txt.
fullwidth = IOBuffer()
fullwidthsingle = IOBuffer()
ambiguouswidth = IOBuffer()
ambiguouswidthsingle = IOBuffer()

# Single code points go to `singlebuf`, multi-code-point ranges to `rangebuf`.
function print_to_correct_buffer(rangebuf, singlebuf, range, str)
    if length(range) == 1
        println(singlebuf, "[$str addCharactersInRange:NSMakeRange(0x$(string(first(range), base=16)), 1)];")
    else
        f, l = string(first(range), base=16), string(last(range), base=16)
        println(rangebuf, "[$str addCharactersInRange:NSMakeRange(0x$f, 0x$l - 0x$f + 1)];")
    end
end

ranges = Tuple{String,UnitRange{UInt32}}[]
for line in eachline("EastAsianWidth.txt")
    # Skip blank lines and comment lines, then strip trailing comments
    (isempty(line) || startswith(line, '#')) && continue
    precomment = split(line, '#')[1]
    # Parse code point range and width code ("W", "F", "A", ...)
    tokens = split(precomment, ';')
    length(tokens) >= 2 || continue
    charrange = strip(tokens[1])
    width = String(strip(tokens[2]))
    # Parse "XXXX" or "XXXX..YYYY" into a Julia UnitRange
    rangetokens = split(charrange, "..")
    charstart = parse(UInt32, rangetokens[1], base=16)
    charend = parse(UInt32, rangetokens[end], base=16)
    range = charstart:charend
    # Coalesce with the previous range if it has the same width and is contiguous
    if !isempty(ranges) && ranges[end][1] == width && last(ranges[end][2]) == first(range) - 1
        ranges[end] = (width, first(ranges[end][2]):last(range))
    else
        push!(ranges, (width, range))
    end
end

for (width, range) in ranges
    if width == "W" || width == "F"  # wide or fullwidth
        print_to_correct_buffer(fullwidth, fullwidthsingle, range, "sFullWidth")
    elseif width == "A"              # ambiguous
        print_to_correct_buffer(ambiguouswidth, ambiguouswidthsingle, range, "sAmbiguousWidth")
    end
end

# Print the generated statements
for buf in (fullwidth, fullwidthsingle, ambiguouswidth, ambiguouswidthsingle)
    print(String(take!(buf)))
end
```
@gnachman I've updated the tests as much as I knew how to (for some reason, many more tests fail for me locally, so I can't reproduce this failure). I'm not sure what the remaining problem with the emoji test is.
This is awesome! I'm glad to see some of the fixes for emoji. The golden tests are a pain to work with because every machine renders text slightly differently. The big question is when this should be enabled for everyone. Should we support multiple versions of the width table?
Given that working with emoji is pretty broken without this, I don't think activating it immediately would cause much trouble, but it's your call, of course.
Non-interactive use (e.g., …) … Here's a screen recording demonstrating the craziness: https://iterm2.com/misc/Unicode9Bugs.mov

With the Unicode 8 tables the emoji overlap each other, which is ugly, but at least it's possible to edit text. Different programs will be updated at different times, so emoji width is going to be broken for a while. I think it should be an off-by-default option for now, and I'll turn it on by default when there's more adoption. I'll make a note to merge this but put it behind an advanced pref.
Could there be a proprietary escape code for a program to declare that it is Unicode 9 aware?
Yes. It would be awesome if other terminals could standardize on something. Let me reach out to Thomas Dickey and see what he thinks. |
Looks like xterm doesn't support emoji, so scratch that idea. We should just invent something and maybe others will follow. |
Do the bugs you mention also exist for East Asian characters, like …?

I would argue for making the change immediately. It could be tough for some terminal applications that use emoji, but double width is the correct behavior, as sanctioned by the Unicode standard. I don't see the point of a proprietary escape code. If an application is aware of iTerm2, couldn't it just use whatever mechanism reports the iTerm2 version (I forgot how that works) and decide how to render emoji based on that? Or better yet, just print emoji the "correct" way (no extra space after each emoji to keep them from overlapping) and require users to use an up-to-date terminal emulator to get the best formatting.
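For the version check suggested above, a minimal Julia sketch, assuming iTerm2 exports `TERM_PROGRAM=iTerm.app` and `TERM_PROGRAM_VERSION` into the shell environment (as current releases do; verify this for older builds). The function name `iterm2_version` is hypothetical, and it only identifies the terminal: mapping a version to a width table is left to the application.

```julia
# Hypothetical helper: return the iTerm2 version string if we appear to be
# running inside iTerm2 (based on TERM_PROGRAM / TERM_PROGRAM_VERSION),
# otherwise `nothing`. `env` defaults to the process environment.
function iterm2_version(env = ENV)
    get(env, "TERM_PROGRAM", "") == "iTerm.app" || return nothing
    v = get(env, "TERM_PROGRAM_VERSION", "")
    return isempty(v) ? nothing : v
end
```

Passing a `Dict` instead of `ENV` makes the check easy to exercise without launching a terminal, e.g. `iterm2_version(Dict("TERM_PROGRAM" => "iTerm.app", "TERM_PROGRAM_VERSION" => "3.0.20160720-nightly"))`.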
I am unable to insert or paste emoji into bash. It just strips them from the text. How were you able to do that in your video? |
OK, I built this branch and I see the issue now (for some reason I can't insert emoji in iTerm2 on my other machine; maybe I had a broken nightly, or maybe it's because this machine has Sierra?). The issue is the mismatch between the wcwidth (or equivalent) used by the terminal application and what iTerm2 thinks is happening. Sadly, you get equally broken behavior if wcwidth thinks emoji should be double width but iTerm2 doesn't. Here is a video using xonsh (which uses the wcwidth Python library, which as of 0.1.7 uses Unicode 9.0) and 3.0.20160720-nightly. So things are broken either way unless both iTerm2 and the underlying application agree on how wide an emoji is. Hence my argument: allow the behavior to be changed (maybe with options to change it only for certain applications), but use the Unicode 9.0 behavior by default. That way, iTerm2 at least pushes other applications in the right direction (most terminal applications treat the emulator as the source of truth when it comes to ambiguity anyway).
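To make the mismatch concrete, here is a minimal Julia sketch with a hypothetical, deliberately tiny width table: only the Emoticons block (U+1F600..U+1F64F) is modeled, as neutral (width 1) under the Unicode 8 table and wide (width 2) under Unicode 9, and every other character is assumed to be width 1. `charwidth` and `cursor_column` are illustrative names, not iTerm2 or wcwidth APIs.

```julia
# Simplified model: the Emoticons block is the only range whose width
# differs between the two tables in this sketch.
const EMOTICONS = 0x1F600:0x1F64F

# Column width of one character under a given table version.
# Everything outside the emoticon block is treated as width 1 (a simplification).
function charwidth(c::Char, unicode_version::Int)
    UInt32(c) in EMOTICONS && unicode_version >= 9 && return 2
    return 1
end

# Column the cursor ends up at after printing `s` starting from column 0.
cursor_column(s::AbstractString, unicode_version::Int) =
    sum(c -> charwidth(c, unicode_version), s; init = 0)
```

With these tables, `cursor_column("😀a", 9)` is 3 while `cursor_column("😀a", 8)` is 2. If the application uses one table and the terminal the other, they disagree about where the cursor is after the emoji, so every later cursor-relative redraw on that line lands one column off, in whichever direction the mismatch goes.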
I like pushing things forward, but I also hate having a giant queue of bugs filed by confused users :) What I'd like to do is define a new escape sequence to set the Unicode version. At least this way it'll be possible to support both, and people who know what they're doing can configure their apps to switch versions.
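One form such a sequence can take, as a minimal Julia sketch: it assumes the `OSC 1337 ; UnicodeVersion=n` form (terminated by BEL) that iTerm2's escape-code documentation describes; check the wiki page linked in this thread for the authoritative syntax. The helper names are hypothetical, and restricting `n` to 8 or 9 only mirrors the two tables discussed here.

```julia
# Build the (assumed) iTerm2 sequence that selects the Unicode width table:
# ESC ] 1337 ; UnicodeVersion=n BEL
function unicode_version_sequence(n::Integer)
    # Only the two tables discussed in this thread are accepted in this sketch.
    n in (8, 9) || throw(ArgumentError("unsupported Unicode version: $n"))
    return "\e]1337;UnicodeVersion=$(n)\a"
end

# Write the sequence to the terminal (or any IO) and flush it immediately.
function set_unicode_version(io::IO, n::Integer)
    print(io, unicode_version_sequence(n))
    flush(io)
    return nothing
end
```

An application whose own width calculations follow Unicode 9 would call `set_unicode_version(stdout, 9)` at startup, so that it and iTerm2 agree on emoji width.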
Merged (mostly). Please see this page for info: https://gitlab.com/gnachman/iterm2/wikis/unicodeversionswitching Thanks for getting the ball rolling on this, @Keno, and for being on the right side of history, @asmeurer.