-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
revamp wcwidth #1467
Comments
While playing around with the custom setting in #1470 which addresses 2. from the list above it became clear that the emulator should primarily follow the wcwidth behavior of the backend system (where the pty is running) to avoid all kinds of weird cursor and eol behavior. This is due intelligent backend libs like libreadline that anticipate the terminal behavior and are used in almost every terminal program nowadays. It can be solved with another node package that collects the wcwidth output from the server side. For not nodejs based server components it might need a small C helper to get those data. |
Kinda offtopic here but related to recent issues - it is kinda hard to get wcwidth to respect the backend situation in a platform independent fashion because of the following:
|
5. When using xterm.js as an SSH frontend, there is no way to reliably determine the wcwith of the target system. @egmontkob Do you know how VTE solves this, especially for the SSH case? |
Here is a quite elaborated view on this problem: mobile-shell/mosh#234 (comment) |
@jerch What a mess 😢Seems unsolvable without having a living global specification for wcwidth that everybody can follow. |
VTE simply uses whatever's returned by g_unichar_iswide(), which is already sub-ideal, we have a pending bug 772890 to perhaps use glibc's wcwidth() instead. In case these methods are locale-dependent (are they really? even among the UTF-8 ones?), simply the process's locale is used (in case of gnome-terminal it's the locale set up by systemd --user). Plus there's an API option in VTE (compatibility preference in gnome-terminal) for ambiguous-width characters to be treated as wide. This changes the width of several characters. The set is somewhat arbitrary, and is subject to accidental as well as planned further changes, see bug 791452. Alas there's no locale definition to back this up, I vaguely recall there's a long-standing open (or rejected?) glibc bug on sourceware, as far as I recall it had a four-digit bug number, but the last time I tried I couldn't find it. This means that apps can't really use this behavior of the terminal emulator. Ideally I guess there'd be no such API in VTE, rather you could switch (both VTE and the apps running inside) into a different locale with a different wcwidth. That still wouldn't solve the problem of ssh, though. And wouldn't solve how the locale used by the terminal emulator would match up with the locale of the app. Maybe be need to introduce an escape sequence for that?? And then juggle with multiple locales inside a single terminal emulator process (e.g. in case of gnome-terminal), using newlocale/uselocale/whatever_l and friends.
Even worse, a specification couldn't solve it either, see the mess caused by Unicode 9.0 changing the width of certain already existing codepoints (e.g. VTE 772812 and tons of duplicates out there). We'd either need a specification that's guaranteed not to change in backwards incompatible ways (Unicode seems not to make such a guarantee), or introduce some kind of versioning. I guess versioning is required anyways, an app should be able to tell if the terminal emulator doesn't yet recognize a new codepoint. And of course, design all of these in a way that can even work across ssh, across different OSes with different notations for locales. Sigh. Maybe, maybe just ignoring this whole problem set and living with the tiny breakages every now and then is better than overengineering it?? |
@egmontkob Thanks for your deep insights, very helpful!
I'm about to agree 😅 @jerch Maybe it is a good foundation to have a pluggable wcwidth implementation, ideally one that can be hot-plugged (so it re-evaluates the CHAR_WIDTH stored in our buffer once it gets swapped out, then triggers a renderer refresh). This would at least allow us to ship different implementations and have the user switch between them (live) to see which one works best. It would also provide a way for a backend to determine its wcwidth characteristics and then choose a matching implementation in the frontend. |
Lol what a nightmare, seems to be unfixable without the versioning thing. This is also an issue with the other unicode algos like the grapheme splitting and maybe bidi (have not yet read deeper into the latter). I like the idea to have some escape sequence asking for unicode support XY, but thats a different story, it wont help here unless specified and supported by a critical mass of cmdline apps. @egmontkob Are there any efforts drafted yet into this direction? @mofux Yeah gonna stick to the pluggable concept, not sure yet if it makes sense to deliver preconfigured unicode version sets, not sure how closely duplexers like tmux, screen and such follow the unicode spec or do their "own thing". About live patching the widths: also not sure yet how to get this done in a sane way - we dont record the input data atm and the buffer is likely to change all over if the first char is misaligned (with all weirdness like changing the cursor position and the line wraps). |
No, not that I'm aware of. Note that a specification without a reasonably easy way of implementing it by emulators and apps is unlikely to get adopted. And here for the apps we'd at least need a way to query the Unicode version (I'm not aware of such method), and for the emulators we'd need to even support more of them at the same time (there are much fewer emulators than apps, so maybe inlining the databases is okay here). |
Yes I think we have there all needed bits that need to be wrapped up. Closing this issue. |
wcwidth
needs several enhancements:This would make it possible to overload the default width set with some custom values. Those values could be based on backend or frontend or elseone's needs.
wcwidth
with future grapheme handling. Most of the wcwidth rules are a subset of the grapheme clustering. Once we have working grapheme handling merging them might save a lot of runtime. This is somewhat doubtful because grapheme handling might not work at an early stage with current cell based buffer model and should be added at a later stage (closer to the renderer). See also Grapheme support #1468.The text was updated successfully, but these errors were encountered: