Update character width tables according to Unicode 9 #294

Keno · 2016-06-24T04:34:38Z


Generated from the updated EastAsianWidth.txt using the following script:

```julia
# Generate Objective-C calls that populate iTerm2's full-width and
# ambiguous-width character sets from EastAsianWidth.txt (Julia 1.x).
fullwidth = IOBuffer()
fullwidthsingle = IOBuffer()
ambiguouswidth = IOBuffer()
ambiguouswidthsingle = IOBuffer()

# `hex` was removed in Julia 1.0; format code points in base 16 instead
hexstr(x) = string(x, base = 16)

# Multi-code-point ranges go to `rangebuf`, single code points to `singlebuf`
function print_to_correct_buffer(rangebuf, singlebuf, cprange, str)
    if length(cprange) == 1
        println(singlebuf, "[$str addCharactersInRange:NSMakeRange(0x$(hexstr(first(cprange))), 1)];")
    else
        f, l = hexstr(first(cprange)), hexstr(last(cprange))
        println(rangebuf, "[$str addCharactersInRange:NSMakeRange(0x$f, 0x$l - 0x$f + 1)];")
    end
end

ranges = Any[]
for line in eachline("EastAsianWidth.txt")
    # Skip blank lines and full-line comments (line[1] would fail on an empty line)
    (isempty(line) || line[1] == '#') && continue
    precomment = split(line, '#')[1]
    # Parse code point range and width code
    tokens = split(precomment, ';')
    length(tokens) >= 2 || continue
    charrange = strip(tokens[1])
    width = strip(tokens[2])
    # Parse "XXXX" or "XXXX..YYYY" into a UnitRange{UInt32}
    rangetokens = split(charrange, "..")
    charstart = parse(UInt32, rangetokens[1], base = 16)
    charend = parse(UInt32, rangetokens[end], base = 16)
    cprange = charstart:charend
    # Coalesce with the previous range if it is contiguous and has the same width
    if !isempty(ranges) && ranges[end][1] == width && last(ranges[end][2]) == first(cprange) - 1
        ranges[end] = (width, first(ranges[end][2]):last(cprange))
    else
        push!(ranges, (width, cprange))
    end
end

for (width, cprange) in ranges
    if width == "W" || width == "F"  # wide or full
        print_to_correct_buffer(fullwidth, fullwidthsingle, cprange, "sFullWidth")
    elseif width == "A"  # ambiguous
        print_to_correct_buffer(ambiguouswidth, ambiguouswidthsingle, cprange, "sAmbiguousWidth")
    end
end

# Emit the generated lines, ranges before single code points
for buf in (fullwidth, fullwidthsingle, ambiguouswidth, ambiguouswidthsingle)
    print(String(take!(buf)))
end
```
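The coalescing step is the least obvious part of the script. Pulled out on its own, it might look like the sketch below; the function name `coalesce_ranges`, the helper `u32r`, and the sample entries are hypothetical and chosen only for illustration, not copied from EastAsianWidth.txt. An entry is merged into its predecessor only when both share a width class and the second range starts right after the first one ends.

```julia
# Build a UnitRange{UInt32}; plain hex literals like 0x2329 would be UInt16
u32r(a, b) = UInt32(a):UInt32(b)

# Merge adjacent (width, range) entries that share a width class and are contiguous
function coalesce_ranges(entries)
    merged = Tuple{String,UnitRange{UInt32}}[]
    for (width, r) in entries
        if !isempty(merged) && merged[end][1] == width && last(merged[end][2]) + 1 == first(r)
            # Extend the previous range instead of starting a new one
            merged[end] = (width, first(merged[end][2]):last(r))
        else
            push!(merged, (width, r))
        end
    end
    return merged
end

# Illustrative input: the first two "W" entries touch, so they collapse into one
entries = [("W", u32r(0x2329, 0x2329)), ("W", u32r(0x232A, 0x232A)),
           ("N", u32r(0x232B, 0x23E8)), ("W", u32r(0x23E9, 0x23EC))]
merged = coalesce_ranges(entries)
```

Coalescing keeps the generated Objective-C short: each run of same-width code points becomes one `NSMakeRange` call instead of one call per line of the input file.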

Keno · 2016-06-25T14:51:37Z

@gnachman I've updated the tests as much as I knew how to (for some reason it seems like a much larger number of tests fails for me locally, so I can't really reproduce this failure). Not sure what the remaining complication in the emoji test is.

gnachman · 2016-06-29T17:17:19Z

This is awesome! I'm glad to see some of the fixes for Emoji.

The Golden tests are a pain to work with because every machine renders text slightly differently.

The big question is when this should be enabled for the world. Should we support multiple versions of the width table?

Keno · 2016-06-29T17:21:13Z

Given that working with emoji is pretty broken without this, I don't think there should be much of a problem with just activating it immediately, but it's your call, of course.

gnachman · 2016-06-29T22:21:08Z

Non-interactive use (e.g., `cat emoji.txt`) is much better with Unicode 9, but both emacs and bash get totally confused (the cursor position does not correspond with where edits will actually occur).

Here's a screen recording demonstrating the craziness: https://iterm2.com/misc/Unicode9Bugs.mov

With the Unicode 8 tables the emoji overlap each other and it's ugly but at least it's possible to edit.

Different programs will get updated at different times, meaning everything's going to be broken for a while as far as emoji width goes.

I think it should be an off-by-default option for now and I'll flip it on by default when there's more adoption.

I'll make a note to merge this but put it behind an advanced pref.

Keno · 2016-06-29T22:24:54Z

Could there be a proprietary escape code for a program to declare that it is Unicode 9-aware?

gnachman · 2016-06-29T22:35:39Z

Yes. It would be awesome if other terminals could standardize on something. Let me reach out to Thomas Dickey and see what he thinks.

gnachman · 2016-06-29T22:45:03Z

Looks like xterm doesn't support emoji, so scratch that idea. We should just invent something and maybe others will follow.

asmeurer · 2016-07-28T04:52:34Z

Do the bugs you mention also exist for East Asian characters, like コンニチハ? For me, in iTerm2, both in bash and in emacs, it seems to work just fine, except for an issue where if I select a character the right half is not inverse-videoed correctly.

I would argue to make the change immediately. It could be tough for some terminal applications that use emoji, but being double width is the correct behavior, as sanctioned by the Unicode standards.

I don't see the point of a proprietary escape code. If an application is aware of iTerm2, couldn't it just use whatever mechanism gets the iTerm2 version (I forgot how that works) and decide how to render emoji based on that. Or better yet, just print emoji the "correct" way (no extra space after each emoji to keep them from overlapping), and require users to use an up-to-date terminal emulator to get the best formatting.

asmeurer · 2016-07-28T15:50:24Z

I am unable to insert or paste emoji into bash. It just strips them from the text. How were you able to do that in your video?

asmeurer · 2016-07-28T21:34:08Z

OK, I built this branch and I see the issue now (for some reason, I can't insert emoji in iTerm2 on my other machine; maybe I had a broken nightly, or maybe it's because this machine has Sierra?).

The issue is the mismatch between the wcwidth (or equivalent) used by the terminal application and the widths iTerm2 assumes.

Sadly, you get equally broken behavior if wcwidth thinks that emoji should be double width but iTerm2 doesn't. Here is a video using xonsh (which uses the wcwidth Python library; as of 0.1.7, that library uses Unicode 9.0) and iTerm2 3.0.20160720-nightly.

So things are broken either way, unless both iTerm2 and the underlying application agree on how wide an emoji is.
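To make the failure mode concrete, here is a toy model of the disagreement. The two width functions are hypothetical stand-ins, not real wcwidth tables: they agree on every character except U+1F355 (SLICE OF PIZZA), which one treats as narrow and the other as wide. A line editor positions the cursor using its own table while the terminal advances it using another, so every emoji whose width differs adds one column of drift.

```julia
# Toy width tables (hypothetical): identical except for U+1F355 (SLICE OF PIZZA)
old_width(c::Char) = 1                      # emoji treated as narrow
new_width(c::Char) = c == '🍕' ? 2 : 1      # emoji treated as wide

# Column the cursor reaches after printing `s` from column 0 under a given table
cursor_column(s::AbstractString, width::Function) = mapreduce(width, +, s; init = 0)

s = "a🍕b"
app_column = cursor_column(s, new_width)   # where an updated line editor thinks the cursor is
term_column = cursor_column(s, old_width)  # where a terminal using the old table moves it
drift = app_column - term_column           # one column of error per affected emoji
```

Swapping which side uses which table flips the sign of `drift` but not the problem, which is why both the terminal and the application have to agree.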

Hence, my argument is: allow changing the behavior (maybe with options to change it only for certain applications), but use the Unicode 9.0 behavior by default. That way, at least iTerm2 is pushing other applications in the right direction (most terminal applications treat the emulator as the source of truth when it comes to ambiguity anyway).

gnachman · 2016-08-04T04:54:37Z

I like pushing things forward but I also hate having a giant queue of bugs filed by confused users :)

What I'd like to do is define a new escape sequence to set the Unicode version. That way it'll at least be possible to support both, and people who know what they're doing can configure their apps to switch versions.
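A minimal sketch of how an application could opt in, assuming the sequence takes the form `OSC 1337 ; UnicodeVersion=n ST`, which is how iTerm2 documents its Unicode version switch; the wiki page linked from the merge comment is the authoritative reference for the syntax. The function names are hypothetical, and the `TERM_PROGRAM == "iTerm.app"` check is an assumption about how to detect iTerm2 so that other terminals never receive the sequence.

```julia
# Build OSC 1337 ; UnicodeVersion=n ST, using BEL ('\a') as the terminator.
# Assumption: iTerm2 accepts BEL here, as it does for its other OSC 1337 codes.
unicode_version_sequence(n::Integer) = "\e]1337;UnicodeVersion=$(n)\a"

# Hypothetical helper: request Unicode `n` widths only when running under iTerm2.
# Assumption: iTerm2 sets TERM_PROGRAM=iTerm.app in the child environment.
function request_unicode_version(io::IO, n::Integer)
    if get(ENV, "TERM_PROGRAM", "") == "iTerm.app"
        print(io, unicode_version_sequence(n))
        return true
    end
    return false
end
```

For example, `request_unicode_version(stdout, 9)` at startup would ask for Unicode 9 widths, and the same call with `8` would restore the old behavior when the application exits.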

gnachman · 2016-08-08T00:39:13Z

Merged (mostly). Please see this page for info: https://gitlab.com/gnachman/iterm2/wikis/unicodeversionswitching

Thanks for getting the ball rolling on this, @Keno, and for being on the right side of history, @asmeurer.

Keno mentioned this pull request Jun 24, 2016

Julia doesn't like Pizza JuliaLang/julia#3721 (closed)

Update tests for new character widths (c8e3bb7)

fornwall mentioned this pull request Aug 1, 2016

Emoji UI bug neovim/neovim#5149 (closed)

gnachman closed this Aug 8, 2016

senorprogrammer mentioned this pull request May 24, 2018

Emoji display in titles is wonky wtfutil/wtf#15 (closed)
