Multibyte characters get corrupted when KEYBD trap is set #307

posguy99 · 2021-05-14T18:08:40Z

See:

https://apple.stackexchange.com/questions/420570/i-am-typing-into-my-terminal-and-the-stdin-is-changing-and-adding-characters-on

The stackexchange posting is about zsh, but I tested all the shells I have installed on macOS Catalina. Only ksh93u+m (and 93u+) mangles the input line.

Doesn't matter in emacs mode, in vi mode, both of them off.

McDutchie · 2021-05-14T19:33:07Z

It works for me, nothing gets mangled. What is the output of your locale command when run from ksh?

posguy99 · 2021-05-14T19:40:05Z

[51] mbp13 $ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Same as for all the other shells. :)

So if you copy that line, and paste it into the shell, you don't get a bell and this:

$ 
$ $ questions/21/

Don't have to try to execute it.

For bash:

$ echo $BASH_VERSION
3.2.57(1)-release
$ curl -d " 印度是最好的 The weather is good" -X POST http://localhost:8000/questions/21/

Again, don't have to try to execute it, just paste it.

posguy99 · 2021-05-14T19:46:19Z

It happens with ksh93u+ too, it’s not a u+m bug.

McDutchie · 2021-05-14T20:24:04Z

So if you copy that line, and paste it into the shell, you don't get a bell and this:
$ 
$ $ questions/21/
`

These are the results I get:

So, it works fine on my end on both 93u+m and 93u+. There is something different about our setups and we need to find out what it is, or I'm not going to be able to try to debug this problem.

@JohnoKing, @hyenias, can either of you reproduce this?

hyenias · 2021-05-14T20:34:52Z

I am unable to reproduce locally using any of my three terminal clients (iTERM, Terminal, and Alacritty). I tested with what I had on my mac at the time (Version AJMv 93u+m/1.0.0-alpha+92f7ca54 2021-05-07) and default macOS ksh. But I do not have macOS Catalina, I have 10.13.6. I am recompiling now with current master. I pasted without issue and even changed the curl command to a function call of echo of "$@" without issue.

posguy99 · 2021-05-14T20:42:49Z

It's my keyboard setup script that runs during .kshrc...

keybd_trap () {
  case ${.sh.edchar} in
    $'\e[1~') .sh.edchar=$'\001';;      # Home = beginning-of-line
    $'\e[4~') .sh.edchar=$'\005';;      # End = end-of-line
    $'\e[5~') .sh.edchar=$'\e>';;       # PgUp = history-previous
    $'\e[6~') .sh.edchar=$'\e<';;       # PgDn = history-next
    $'\e[3~') .sh.edchar=$'\004';;      # Delete = delete-char
  esac
}
trap keybd_trap KEYBD

It's enough that the keyboard trap is set. The case/esac in the function can be completely empty, but setting the trap is enough to cause the failure.

And all you actually have to paste is the 印度是最好的

JohnoKing · 2021-05-14T20:53:31Z

I think the bug @posguy99 is experiencing may be related to (or is the same as) att#197. After launching ksh with ENV=/dev/null ksh and setting up the keyboard trap in #307 (comment) I get very broken behavior (on Linux x86_64):

# Repeated attempts to paste `curl -d " 印度是最好的 The weather is good" -X POST http://localhost:8000/questions/21/`
# result in `curl -d " p^` instead.
$ curl -d " p^
$ questions/21/
ksh: questions/21/: not found
$ curl -d " p^
$ questions/21/
ksh: questions/21/: not found

hyenias · 2021-05-14T21:04:47Z

Agreed. I am able to reproduce the behavior meaning any attempt to paste 印度是最好的 fails when a KEYBD trap is enabled. When I turn KEYBD trap off, I can paste into the command line.

$ exec ksh +E --emacs
$
$ keybd_trap () { : ; }
$ trap keybd_trap KEYBD
$ $
$ trap - KEYBD
$ 印度是最好的
ksh: 印度是最好的: not found
$

McDutchie · 2021-05-14T21:33:06Z

Confirmed. After setting that KEYBD trap, I not only cannot paste multibyte characters, I can't type them either. Ugh.

McDutchie · 2021-05-14T21:35:18Z

In fact, the keyboard trap doesn't even have to do anything. Just trap : KEYBD is enough to mess up all multibyte processing on the command line. At least that should make the fix fairly straightforward, if we can locate the code block where multibyte support is missing.

posguy99 · 2021-05-14T21:53:33Z

The RH proposed patch in att#83 fixes it, and doesn’t cause any regression test failures.

I'm not sophisticated enough to understand what @kdudka thought was wrong with the patch, though. The trap only pulls single bytes out of the input queue and that causes the bug?

(sorry for the email noise... proofread comment before pressing green button)

McDutchie · 2021-05-14T22:18:53Z

I don't know what they're on about in the att#83 discussion re: compilers supporting signed chars or not. As far as I can tell, it's irrelevant, because the data type of the variable that may contain a negated character is int, not char. If anyone has any thoughts on that, I'm all ears; I may be missing something.

A cleaned-up version of that patch is:

--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -1150,12 +1150,9 @@ int ed_getchar(register Edit_t *ep,int mode)
 							break;
 					}
 				}
-				if(n=keytrap(ep,readin,n,LOOKAHEAD-n,mode))
-				{
-					putstack(ep,readin,n,0);
-					c = ep->e_lbuf[--ep->e_lookahead];
-				}
 				else
+					c = -c;
+				if(!keytrap(ep,readin,n,LOOKAHEAD-n,mode))
 					c = ed_getchar(ep,mode);
 				ep->e_keytrap = 0;
 			}

It does fix the problem, but introduces a new one: the KEYBD trap function in this comment no longer works correctly. After loading that, the Home, End, PgUp, PgDn and Del keys all stop working, and insert the last character of their escape sequence into the command line instead. And so do all the other VT220 keys like the arrow keys.

Still, the correct fix is probably somewhere in this general direction.

McDutchie · 2021-05-14T23:04:22Z

I played around some with that patch and I think I may have fixed it without breaking the KEYBD trap. Please try to break this patch:

--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -1150,10 +1150,15 @@ int ed_getchar(register Edit_t *ep,int mode)
 							break;
 					}
 				}
+				else
+					c = -c;
 				if(n=keytrap(ep,readin,n,LOOKAHEAD-n,mode))
 				{
+					int maybe_c;
 					putstack(ep,readin,n,0);
-					c = ep->e_lbuf[--ep->e_lookahead];
+					maybe_c = ep->e_lbuf[--ep->e_lookahead];
+					if(maybe_c > 0)
+						c = maybe_c;
 				}
 				else
 					c = ed_getchar(ep,mode);

McDutchie · 2021-05-14T23:23:37Z

Hmm. I can type multibyte characters (like é or ü) just fine with that patch, but pasting the Chinese characters is still broken. :/

hyenias · 2021-05-14T23:46:23Z

Ahhh. As for some near-and-dear Japanese characters, try コーンシェル aka Kornshell. They work for me but the 印度是最好的 (India is the best) does not work correctly. All the Japanese characters for Kornshell (コーンシェル ) are ~~two bytes~~ three bytes each whereas perhaps 印度是最好 might have some 3 or 4 bytes characters.

Edit: No idea as to why the 3 byte Japanese UTF8 characters work and the 3 byte Chinese UTF8 characters do not when using your #307 (comment) [maybe_c version].

arch/*/bin/ksh +E --emacs
$ keybd_trap () { : ; }
$ trap keybd_trap KEYBD
$ p度/
$
$ コーンシェル
arch/linux.arm/bin/ksh: コーンシェル: not found
$ trap - KEYBD
$ 印度是最好
arch/linux.arm/bin/ksh: 印度是最好: not found
$ コーンシェル
arch/linux.arm/bin/ksh: コーンシェル: not found
$

posguy99 · 2021-05-14T23:48:35Z

In UTF-8 they're all three bytes.

posguy99 · 2021-05-14T23:53:35Z

@hyenias , so are the sample ones you posted.

hyenias · 2021-05-15T00:04:07Z

Well, I certainly stand corrected. You just helped me out on #189! Thank you so much as I thought each were two bytes in length.

The cleaned up version of att#83 at #307 (comment) seems to work for me on my limited tests.

posguy99 · 2021-05-15T00:12:52Z

With the patch in #307 (comment) , the example Japanese text from @hyenias works without issue, but the Chinese text does not. If you paste the Chinese characters individually, the second and last characters work, but the others have various misbehaviors.

posguy99 · 2021-05-15T18:17:30Z

With the patch in #307 (comment) there is no change for the reproducer in the RH bugreport https://bugzilla.redhat.com/show_bug.cgi?id=1503922 either.

The characters are spaced correctly, though, like there's some masking of high-order bits going on. They're not random.

posguy99 · 2021-07-24T18:44:49Z

Here's more weirdness, if you ^V and then try to paste the first Chinese character, you get the Chinese character 印 like you should.

If you just paste the first Chinese character, you get a p.

KEYBD trap present for both tests.

Edit: Does ^V turn off the trap?

posguy99 · 2021-07-24T20:06:22Z

Even more weirdness...

#!/usr/bin/env ksh
while read -rsn1 k; do
 echo $k
done

And you get the Chinese characters when you paste them.

JohnoKing · 2022-01-04T12:03:31Z

I decided to check how old versions of ksh behaved and it appears this bug is a regression introduced in 93t+ 2009-10-07. The patch below (when applied on top of 93t+ 2009-09-17) is what introduced the bug:

--- a/src/cmd/ksh93/RELEASE
+++ b/src/cmd/ksh93/RELEASE
@@ -1,4 +1,9 @@
-09-09-17  --- Release ksh93t+  ---
+09-10-07  --- Release ksh93t+  ---
+09-10-07  $PATH processing has been changed to delay dir stat() and .paths
+	  lookup until the directory is needed in the path search.
+09-09-28  Call the ast setlocale() intercept on unset too.
+09-09-24  A bug in which LANG=foo; LC_ALL=foo; unset LC_ALL; did not revert
+	  LC_CTYPE etc. to the LANG value has been fixed.
 09-09-17  A bug in which unsetting SVLVL could cause a script invoked by
 	  name without #! to core dump has been fixed.
 09-09-16  A bug in which a pipeline in a here-document could hang when the
--- a/src/cmd/ksh93/sh/init.c
+++ b/src/cmd/ksh93/sh/init.c
@@ -355,13 +355,17 @@ static void put_cdpath(register Namval_t* np,const char *val,int flags,Namfun_t
     }
 #endif
 
-    /* Trap for LC_ALL, LC_TYPE, LC_MESSAGES, LC_COLLATE and LANG */
+    /* Trap for LC_ALL, LC_CTYPE, LC_MESSAGES, LC_COLLATE and LANG */
     static void put_lang(Namval_t* np,const char *val,int flags,Namfun_t *fp)
     {
 	Shell_t *shp = nv_shell(np);
 	int type;
 	char *lc_all = nv_getval(LCALLNOD);
 	char *name = nv_name(np);
+	if((shp->test&1) && !val && !nv_getval(np))
+		return;
+	if(shp->test&2)
+		nv_putv(np, val, flags, fp);
 	if(name==(LCALLNOD)->nvname)
 		type = LC_ALL;
 	else if(name==(LCTYPENOD)->nvname)
@@ -372,22 +376,30 @@ static void put_cdpath(register Namval_t* np,const char *val,int flags,Namfun_t
 		type = LC_COLLATE;
 	else if(name==(LCNUMNOD)->nvname)
 		type = LC_NUMERIC;
-	else if(name==(LANGNOD)->nvname && (!lc_all || *lc_all==0))
-		type = LC_ALL;
+#ifdef LC_LANG
+	else if(name==(LANGNOD)->nvname)
+		type = LC_LANG;
+#else
+#define LC_LANG		LC_ALL
+	else if(name==(LANGNOD)->nvname && (!lc_all || !*lc_all))
+		type = LC_LANG;
+#endif
 	else
 		type= -1;
 	if(sh_isstate(SH_INIT) && type>=0 && type!=LC_ALL && lc_all && *lc_all)
 		type= -1;
-	if(type>=0 || type==LC_ALL)
+	if(type>=0 || type==LC_ALL || type==LC_LANG)
 	{
-		if(!setlocale(type,val?val:""))
+		if(!setlocale(type,val?val:"-") && val)
 		{
 			if(!sh_isstate(SH_INIT) || shp->login_sh==0)
 				errormsg(SH_DICT,0,e_badlocale,val);
 			return;
 		}
 	}
-	if(CC_NATIVE==CC_ASCII && (type==LC_ALL || type==LC_CTYPE))
+	if(!(shp->test&2))
+		nv_putv(np, val, flags, fp);
+	if(CC_NATIVE==CC_ASCII && (type==LC_ALL || type==LC_LANG || type==LC_CTYPE))
 	{
 		if(sh_lexstates[ST_BEGIN]!=sh_lexrstates[ST_BEGIN])
 			free((void*)sh_lexstates[ST_BEGIN]);
@@ -438,7 +450,6 @@ static void put_cdpath(register Namval_t* np,const char *val,int flags,Namfun_t
 	if(type==LC_ALL || type==LC_MESSAGES)
 		error_info.translate = msg_translate;
 #endif
-	nv_putv(np, val, flags, fp);
     }
 #endif /* _hdr_locale */
 
--- a/src/lib/libast/RELEASE
+++ b/src/lib/libast/RELEASE
@@ -1,3 +1,7 @@
+09-10-05 _sfopen.c: add but ignore 'F' flags for stdio compatibility
+09-09-28 fts.h,ftwalk.h,fts.c: promote { namelen pathlen level } to (s)size_t
+09-09-28 locales: add AST_LC_LANG for $LANG
+09-09-28 setlocale.c: fix logic for dynamic { LANG LC_ALL LC_* } changes
 09-09-17 include/sfio.h,sfio/sfwalk.c: add sfwalk()
 09-09-09 sfio/sfputr.c: add SIGPIPE hang fix
 09-08-24 sfio/sfreserve.c: fix SF_UNBOUND logic with pushed streams
--- a/src/lib/libast/comp/setlocale.c
+++ b/src/lib/libast/comp/setlocale.c
@@ -570,7 +570,7 @@ static const signed char	utf8tab[256] =
 	4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6,-1,-1,
 };
 
-int
+static int
 utf8_mbtowc(wchar_t* wp, const char* str, size_t n)
 {
 	register unsigned char*	sp = (unsigned char*)str;
@@ -616,7 +616,7 @@ utf8_mbtowc(wchar_t* wp, const char* str, size_t n)
 	return -1;
 }
 
-int
+static int
 utf8_mblen(const char* str, size_t n)
 {
 	wchar_t		w;
@@ -749,6 +749,9 @@ Lc_category_t		lc_categories[] =
 { "LC_PAPER",         LC_PAPER,         AST_LC_PAPER,         0               },
 };
 
+static Lc_t*		lang;
+static Lc_t*		lc_all;
+
 typedef struct Unamval_s
 {
 	char*		name;
@@ -816,7 +819,7 @@ single(int category, Lc_t* lc)
 	const char*	sys;
 	int		i;
 
-	if (!lc && !(lc = lc_categories[category].prev))
+	if (!lc && !(lc = lc_all) && !(lc = lc_categories[category].prev) && !(lc = lang))
 		lc = lcmake(NiL);
 	if (locales[category] != lc)
 	{
@@ -982,6 +985,12 @@ composite(register const char* s, int initialize)
 
 /*
  * setlocale() intercept
+ *
+ * locale:
+ *	0	query
+ *	""	initialize from environment
+ *	"-"	unset
+ *	*	set
  */
 
 char*
@@ -1008,7 +1017,7 @@ _ast_setlocale(int category, const char* locale)
 		 */
 
 	compose:
-		if (category != AST_LC_ALL)
+		if (category != AST_LC_ALL && category != AST_LC_LANG)
 			return (char*)locales[category]->name;
 		if (!sp && !(sp = sfstropen()))
 			return 0;
@@ -1043,9 +1052,12 @@ _ast_setlocale(int category, const char* locale)
 		return sfstruse(sp);
 	}
 	if (!ast.locale.serial++)
+	{
 		stropt(getenv("LC_OPTIONS"), options, sizeof(*options), setopt, NiL);
+		initialized = 0;
+	}
 	if (*locale)
-		p = lcmake(locale);
+		p = streq(locale, "-") ? (Lc_t*)0 : lcmake(locale);
 	else if (!initialized)
 	{
 		char*	u;
@@ -1057,63 +1069,75 @@ _ast_setlocale(int category, const char* locale)
 		 */
 
 		u = 0;
-		if (!(a = getenv("LC_ALL")) || !*a)
+		if ((s = getenv("LANG")) && *s)
 		{
-			for (i = 1; i < AST_LC_COUNT; i++)
-				if ((s = getenv(lc_categories[i].name)) && *s)
-				{
-					if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
-						s = u;
-					lc_categories[i].prev = lcmake(s);
-				}
-			a = getenv("LANG");
+			if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
+				s = u;
+			lang = lcmake(s);
 		}
-		if (a)
+		else
+			lang = 0;
+		if ((s = getenv("LC_ALL")) && *s)
 		{
-			if (streq(a, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
-				a = u;
-			if (composite(a, 1))
-				a = 0;
+			if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
+				s = u;
+			lc_all = lcmake(s);
 		}
-		p = 0;
+		else
+			lc_all = 0;
 		for (i = 1; i < AST_LC_COUNT; i++)
-		{
-			if (!lc_categories[i].prev)
+			if ((s = getenv(lc_categories[i].name)) && *s)
 			{
-				if (!p && !(p = lcmake(a)))
-					break;
-				lc_categories[i].prev = p;
+				if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
+					s = u;
+				lc_categories[i].prev = lcmake(s);
 			}
-			if (!single(i, lc_categories[i].prev))
+			else
+				lc_categories[i].prev = 0;
+		for (i = 1; i < AST_LC_COUNT; i++)
+			if (!single(i, lc_all ? lc_all : lc_categories[i].prev))
 			{
 				while (i--)
 					single(i, NiL);
 				return 0;
 			}
-		}
 		if (ast.locale.set & AST_LC_debug)
 			for (i = 1; i < AST_LC_COUNT; i++)
-				sfprintf(sfstderr, "locale env  %17s %s\n", lc_categories[i].name, locales[i]->name);
+				sfprintf(sfstderr, "locale env  %17s %16s %16s\n", lc_categories[i].name, locales[i]->name, lc_categories[i].prev ? lc_categories[i].prev->name : (char*)0);
 		initialized = 1;
 		goto compose;
 	}
-	else if (!(p = lc_categories[category].prev))
+	else if (category == AST_LC_LANG || !(p = lc_categories[category].prev))
 		p = lcmake("C");
-	if (category != AST_LC_ALL)
+	if (category == AST_LC_LANG)
+	{
+		if (lang != p)
+		{
+			lang = p;
+			if (!lc_all)
+				for (i = 1; i < AST_LC_COUNT; i++)
+					if (!single(i, lc_categories[i].prev))
+					{
+						while (i--)
+							single(i, NiL);
+						return 0;
+					}
+		}
+	}
+	else if (category != AST_LC_ALL)
 		return single(category, p);
-	else if (!(i = composite(locale, 0)))
+	else if ((i = composite(locale, 0)) < 0)
+		return 0;
+	else if (lc_all != p)
 	{
-		if (!p)
-			return 0;
+		lc_all = p;
 		for (i = 1; i < AST_LC_COUNT; i++)
-			if (!single(i, p))
+			if (!single(i, lc_all ? lc_all : lc_categories[i].prev))
 			{
 				while (i--)
 					single(i, NiL);
 				return 0;
 			}
 	}
-	else if (i < 0)
-		return 0;
 	goto compose;
 }
--- a/src/lib/libast/include/ast_std.h
+++ b/src/lib/libast/include/ast_std.h
@@ -154,6 +154,7 @@ extern char*		strerror(int);
 #define AST_LC_MEASUREMENT	12
 #define AST_LC_PAPER		13
 #define AST_LC_COUNT		14
+#define AST_LC_LANG		255
 
 #define AST_LC_find		(1L<<28)
 #define AST_LC_debug		(1L<<29)
@@ -202,6 +203,9 @@ extern char*		strerror(int);
 #ifndef LC_PAPER
 #define LC_PAPER		(-AST_LC_PAPER)
 #endif
+#ifndef LC_LANG
+#define LC_LANG			(-AST_LC_LANG)
+#endif
 
 #undef	extern
 
--- a/src/lib/libast/port/lc.c
+++ b/src/lib/libast/port/lc.c
@@ -125,6 +125,7 @@ lcindex(int category, int min)
 	case LC_COLLATE:	return AST_LC_COLLATE;
 	case LC_CTYPE:		return AST_LC_CTYPE;
 	case LC_IDENTIFICATION:	return AST_LC_IDENTIFICATION;
+	case LC_LANG:		return AST_LC_LANG;
 	case LC_MEASUREMENT:	return AST_LC_MEASUREMENT;
 	case LC_MESSAGES:	return AST_LC_MESSAGES;
 	case LC_MONETARY:	return AST_LC_MONETARY;

JohnoKing · 2022-01-04T12:25:46Z

Unsetting the $LANG variable can be used as a workaround for the bug:

$ trap : KEYBD
$ unset LANG
$ print 印度是最好的
印度是最好的

posguy99 · 2022-01-05T05:18:11Z

Unsetting $LANG isn't a perfect workaround on macOS here. It will let you input the characters in question... but now try to backspace. It doesn't seem to realize how wide they are. Each character takes two backspaces to actually remove.

$ unset LANG
$ print 印度是最好的
印度是最好的
$ 
$ print 印度是最好  <-- one backspace here removed the character, but left a space
印度是最好? <- question mark output
$

McDutchie · 2022-01-05T06:55:29Z

Yes, that is the logical consequence of unsetting LANG – because that is how ksh (and any other program) knows it's in a locale that uses multi-byte characters.

posguy99 · 2022-01-10T16:42:25Z

Yes, that is the logical consequence of unsetting LANG – because that is how ksh (and any other program) knows it's in a locale that uses multi-byte characters.

Just for fun...

ksh93u+m has the odd backspace behavior when you unset LANG, but bash and mksh do not.

zsh has an even weirder behavior:

$ zsh
mwilson@mbp13 ~ % unset LANG
mwilson@mbp13 ~ % ?<008d>?度?<0098>??<009c><0080>好?<009a><0084>

oksh has the odd backspace behavior whether LANG is set or not. 😄

McDutchie · 2022-02-12T20:54:31Z

A fix remains elusive, but we've got to do something with this bug before releasing 1.0. Currently, if a KEYBD trap is set in a multibyte locale (which almost everyone uses these days), not just the trap but the entire shell essentially becomes unusable until the trap is unset. That's not an acceptable situation.

The patch below simply refuses to execute the KEYBD trap if a multibyte locale is active (mbwide()) and the character is greater than 128. For some reason that I don't understand yet, the value of the character turned negative is stored in the integer variable c, so we check for c > -128.

That effectively fixes this bug, leaving a much more benign bug or limitation in its place; in multibyte locales, the KEYBD trap is only triggered for ASCII characters.

diff --git a/src/cmd/ksh93/edit/edit.c b/src/cmd/ksh93/edit/edit.c
index d9933b63b..be31ec7cb 100644
--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -1122,7 +1122,9 @@ int ed_getchar(register Edit_t *ep,int mode)
 				killpg(getpgrp(),SIGINT);
 				siglongjmp(ep->e_env, UINTR);
 			}
-			if(mode<=0 && sh.st.trap[SH_KEYTRAP])
+			if(mode<=0 && sh.st.trap[SH_KEYTRAP]
+			/* workaround for https://github.com/ksh93/ksh/issues/307 -- ignore non-ASCII in multibyte locale */
+			&& (!mbwide() || c > -128))
 			{
 				ep->e_keytrap = 1;
 				n=1;

posguy99 · 2022-02-12T22:59:48Z

Github needs a WONTFIXRIGHTNOW. 😏

In UTF-8 locales, ksh breaks when a KEYBD trap is active, even a dummy no-op one like 'trap : KEYBD'. Entering multi-byte characters fails (the input is interrupted and a new prompt is displayed) and pasting content with multi-byte characters produces corrupted results. The cause is that the KEYBD trap code is not multibyte-ready. Unfortunately nobody yet understands the edit.c code well enough to implement a proper fix. Pending that, this commit implements a workaround that at least avoids breaking the shell. src/cmd/ksh93/edit/edit.c: ed_getchar(): - When a multi-byte locale is active, do not trigger the the KEYBD trap except for ASCII characters (1-127). Resolves: #307

The KEYBD trap should now be fully functional for UTF-8 and other multibyte locales. Thanks to Johnothan King for finding this fix! Analysis: The KEYBD trap code processes character code points stored in e_lbuf by ed_read(). But shell variables store bytes, not characters. So, in UTF-8 locales for example, the Unicode code points need to be converted to multibyte UTF-8 encoding. This is needed to calculate the length of each encoded character in bytes (which fixes the corruption issue) and for keytrap() to store its UTF-8 representation in ${.sh.edchar}. src/cmd/ksh93/edit/edit.c: ed_getchar(): - Remove the workaround from the referenced commit. - Use mbconv to convert innput codepoints to bytes before adding them to inbuff, a char array that is passed on to keytrap(). Related: https://bugzilla.redhat.com/show_bug.cgi?id=1503922 Related: att#197 Related: #307 Resolves: #460 Co-authored-by: Johnothan King <[email protected]>

McDutchie added the bug Something is not working label May 14, 2021

McDutchie changed the title ~~KSH and Chinese input characters~~ Multibyte characters get corrupted when KEYBD trap is set May 14, 2021

McDutchie closed this as completed in 6b77da2 Feb 12, 2022

McDutchie mentioned this issue Feb 12, 2022

Make KEYBD trap handling code multibyte-aware #460

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multibyte characters get corrupted when KEYBD trap is set #307

Multibyte characters get corrupted when KEYBD trap is set #307

posguy99 commented May 14, 2021

McDutchie commented May 14, 2021

posguy99 commented May 14, 2021

posguy99 commented May 14, 2021

McDutchie commented May 14, 2021

hyenias commented May 14, 2021

posguy99 commented May 14, 2021

JohnoKing commented May 14, 2021

hyenias commented May 14, 2021

McDutchie commented May 14, 2021

McDutchie commented May 14, 2021

posguy99 commented May 14, 2021

McDutchie commented May 14, 2021 •

edited

Loading

McDutchie commented May 14, 2021 •

edited

Loading

McDutchie commented May 14, 2021 •

edited

Loading

hyenias commented May 14, 2021 •

edited

Loading

posguy99 commented May 14, 2021

posguy99 commented May 14, 2021

hyenias commented May 15, 2021

posguy99 commented May 15, 2021

posguy99 commented May 15, 2021

posguy99 commented Jul 24, 2021 •

edited

Loading

posguy99 commented Jul 24, 2021

JohnoKing commented Jan 4, 2022 •

edited

Loading

JohnoKing commented Jan 4, 2022

posguy99 commented Jan 5, 2022 •

edited

Loading

McDutchie commented Jan 5, 2022

posguy99 commented Jan 10, 2022

McDutchie commented Feb 12, 2022

posguy99 commented Feb 12, 2022

Multibyte characters get corrupted when KEYBD trap is set #307

Multibyte characters get corrupted when KEYBD trap is set #307

Comments

posguy99 commented May 14, 2021

McDutchie commented May 14, 2021

posguy99 commented May 14, 2021

posguy99 commented May 14, 2021

McDutchie commented May 14, 2021

hyenias commented May 14, 2021

posguy99 commented May 14, 2021

JohnoKing commented May 14, 2021

hyenias commented May 14, 2021

McDutchie commented May 14, 2021

McDutchie commented May 14, 2021

posguy99 commented May 14, 2021

McDutchie commented May 14, 2021 • edited Loading

McDutchie commented May 14, 2021 • edited Loading

McDutchie commented May 14, 2021 • edited Loading

hyenias commented May 14, 2021 • edited Loading

posguy99 commented May 14, 2021

posguy99 commented May 14, 2021

hyenias commented May 15, 2021

posguy99 commented May 15, 2021

posguy99 commented May 15, 2021

posguy99 commented Jul 24, 2021 • edited Loading

posguy99 commented Jul 24, 2021

JohnoKing commented Jan 4, 2022 • edited Loading

JohnoKing commented Jan 4, 2022

posguy99 commented Jan 5, 2022 • edited Loading

McDutchie commented Jan 5, 2022

posguy99 commented Jan 10, 2022

McDutchie commented Feb 12, 2022

posguy99 commented Feb 12, 2022

McDutchie commented May 14, 2021 •

edited

Loading

McDutchie commented May 14, 2021 •

edited

Loading

McDutchie commented May 14, 2021 •

edited

Loading

hyenias commented May 14, 2021 •

edited

Loading

posguy99 commented Jul 24, 2021 •

edited

Loading

JohnoKing commented Jan 4, 2022 •

edited

Loading

posguy99 commented Jan 5, 2022 •

edited

Loading