Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multibyte characters get corrupted when KEYBD trap is set #307

Closed
posguy99 opened this issue May 14, 2021 · 29 comments
Closed

Multibyte characters get corrupted when KEYBD trap is set #307

posguy99 opened this issue May 14, 2021 · 29 comments
Labels
bug Something is not working

Comments

@posguy99
Copy link

See:

https://apple.stackexchange.com/questions/420570/i-am-typing-into-my-terminal-and-the-stdin-is-changing-and-adding-characters-on

The stackexchange posting is about zsh, but I tested all the shells I have installed on macOS Catalina. Only ksh93u+m (and 93u+) mangles the input line.

Doesn't matter in emacs mode, in vi mode, both of them off.

@McDutchie
Copy link

It works for me, nothing gets mangled. What is the output of your locale command when run from ksh?

@posguy99
Copy link
Author

[51] mbp13 $ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Same as for all the other shells. :)

So if you copy that line, and paste it into the shell, you don't get a bell and this:

$ 
$ $ questions/21/

Don't have to try to execute it.

For bash:

$ echo $BASH_VERSION
3.2.57(1)-release
$ curl -d " 印度是最好的 The weather is good" -X POST http://localhost:8000/questions/21/

Again, don't have to try to execute it, just paste it.

@posguy99
Copy link
Author

It happens with ksh93u+ too, it’s not a u+m bug.

@McDutchie
Copy link

So if you copy that line, and paste it into the shell, you don't get a bell and this:

$ 
$ $ questions/21/
`

These are the results I get:

Schermafbeelding 2021-05-14 om 22 22 45

So, it works fine on my end on both 93u+m and 93u+. There is something different about our setups and we need to find out what it is, or I'm not going to be able to try to debug this problem.

@JohnoKing, @hyenias, can either of you reproduce this?

@hyenias
Copy link

hyenias commented May 14, 2021

I am unable to reproduce locally using any of my three terminal clients (iTERM, Terminal, and Alacritty). I tested with what I had on my mac at the time (Version AJMv 93u+m/1.0.0-alpha+92f7ca54 2021-05-07) and default macOS ksh. But I do not have macOS Catalina, I have 10.13.6. I am recompiling now with current master. I pasted without issue and even changed the curl command to a function call of echo of "$@" without issue.

@posguy99
Copy link
Author

It's my keyboard setup script that runs during .kshrc...

keybd_trap () {
  case ${.sh.edchar} in
    $'\e[1~') .sh.edchar=$'\001';;      # Home = beginning-of-line
    $'\e[4~') .sh.edchar=$'\005';;      # End = end-of-line
    $'\e[5~') .sh.edchar=$'\e>';;       # PgUp = history-previous
    $'\e[6~') .sh.edchar=$'\e<';;       # PgDn = history-next
    $'\e[3~') .sh.edchar=$'\004';;      # Delete = delete-char
  esac
}
trap keybd_trap KEYBD

It's enough that the keyboard trap is set. The case/esac in the function can be completely empty, but setting the trap is enough to cause the failure.

And all you actually have to paste is the 印度是最好的

@JohnoKing
Copy link

I think the bug @posguy99 is experiencing may be related to (or is the same as) att#197. After launching ksh with ENV=/dev/null ksh and setting up the keyboard trap in #307 (comment) I get very broken behavior (on Linux x86_64):

# Repeated attempts to paste `curl -d " 印度是最好的 The weather is good" -X POST http://localhost:8000/questions/21/`
# result in `curl -d " p^` instead.
$ curl -d " p^
$ questions/21/
ksh: questions/21/: not found
$ curl -d " p^
$ questions/21/
ksh: questions/21/: not found

@hyenias
Copy link

hyenias commented May 14, 2021

Agreed. I am able to reproduce the behavior meaning any attempt to paste 印度是最好的 fails when a KEYBD trap is enabled. When I turn KEYBD trap off, I can paste into the command line.

$ exec ksh +E --emacs
$
$ keybd_trap () { : ; }
$ trap keybd_trap KEYBD
$ $
$ trap - KEYBD
$ 印度是最好的
ksh: 印度是最好的: not found
$

@McDutchie McDutchie added the bug Something is not working label May 14, 2021
@McDutchie McDutchie changed the title KSH and Chinese input characters Multibyte characters get corrupted when KEYBD trap is set May 14, 2021
@McDutchie
Copy link

Confirmed. After setting that KEYBD trap, I not only cannot paste multibyte characters, I can't type them either. Ugh.

@McDutchie
Copy link

In fact, the keyboard trap doesn't even have to do anything. Just trap : KEYBD is enough to mess up all multibyte processing on the command line. At least that should make the fix fairly straightforward, if we can locate the code block where multibyte support is missing.

@posguy99
Copy link
Author

The RH proposed patch in att#83 fixes it, and doesn’t cause any regression test failures.

I'm not sophisticated enough to understand what @kdudka thought was wrong with the patch, though. The trap only pulls single bytes out of the input queue and that causes the bug?

(sorry for the email noise... proofread comment before pressing green button)

@McDutchie
Copy link

McDutchie commented May 14, 2021

I don't know what they're on about in the att#83 discussion re: compilers supporting signed chars or not. As far as I can tell, it's irrelevant, because the data type of the variable that may contain a negated character is int, not char. If anyone has any thoughts on that, I'm all ears; I may be missing something.

A cleaned-up version of that patch is:

--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -1150,12 +1150,9 @@ int ed_getchar(register Edit_t *ep,int mode)
 							break;
 					}
 				}
-				if(n=keytrap(ep,readin,n,LOOKAHEAD-n,mode))
-				{
-					putstack(ep,readin,n,0);
-					c = ep->e_lbuf[--ep->e_lookahead];
-				}
 				else
+					c = -c;
+				if(!keytrap(ep,readin,n,LOOKAHEAD-n,mode))
 					c = ed_getchar(ep,mode);
 				ep->e_keytrap = 0;
 			}

It does fix the problem, but introduces a new one: the KEYBD trap function in this comment no longer works correctly. After loading that, the Home, End, PgUp, PgDn and Del keys all stop working, and insert the last character of their escape sequence into the command line instead. And so do all the other VT220 keys like the arrow keys.

Still, the correct fix is probably somewhere in this general direction.

@McDutchie
Copy link

McDutchie commented May 14, 2021

I played around some with that patch and I think I may have fixed it without breaking the KEYBD trap. Please try to break this patch:

--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -1150,10 +1150,15 @@ int ed_getchar(register Edit_t *ep,int mode)
 							break;
 					}
 				}
+				else
+					c = -c;
 				if(n=keytrap(ep,readin,n,LOOKAHEAD-n,mode))
 				{
+					int maybe_c;
 					putstack(ep,readin,n,0);
-					c = ep->e_lbuf[--ep->e_lookahead];
+					maybe_c = ep->e_lbuf[--ep->e_lookahead];
+					if(maybe_c > 0)
+						c = maybe_c;
 				}
 				else
 					c = ed_getchar(ep,mode);

@McDutchie
Copy link

McDutchie commented May 14, 2021

Hmm. I can type multibyte characters (like é or ü) just fine with that patch, but pasting the Chinese characters is still broken. :/

@hyenias
Copy link

hyenias commented May 14, 2021

Ahhh. As for some near-and-dear Japanese characters, try コーンシェル aka Kornshell. They work for me but the 印度是最好的 (India is the best) does not work correctly. All the Japanese characters for Kornshell (コーンシェル ) are two bytes three bytes each whereas perhaps 印度是最好 might have some 3 or 4 bytes characters.

Edit: No idea as to why the 3 byte Japanese UTF8 characters work and the 3 byte Chinese UTF8 characters do not when using your #307 (comment) [maybe_c version].

arch/*/bin/ksh +E --emacs
$ keybd_trap () { : ; }
$ trap keybd_trap KEYBD
$ p度/
$
$ コーンシェル
arch/linux.arm/bin/ksh: コーンシェル: not found
$ trap - KEYBD
$ 印度是最好
arch/linux.arm/bin/ksh: 印度是最好: not found
$ コーンシェル
arch/linux.arm/bin/ksh: コーンシェル: not found
$

@posguy99
Copy link
Author

In UTF-8 they're all three bytes.

@posguy99
Copy link
Author

@hyenias , so are the sample ones you posted.

2021-05-14-16-53-06 image

@hyenias
Copy link

hyenias commented May 15, 2021

Well, I certainly stand corrected. You just helped me out on #189! Thank you so much as I thought each were two bytes in length.

The cleaned up version of att#83 at #307 (comment) seems to work for me on my limited tests.

@posguy99
Copy link
Author

With the patch in #307 (comment) , the example Japanese text from @hyenias works without issue, but the Chinese text does not. If you paste the Chinese characters individually, the second and last characters work, but the others have various misbehaviors.

@posguy99
Copy link
Author

With the patch in #307 (comment) there is no change for the reproducer in the RH bugreport https://bugzilla.redhat.com/show_bug.cgi?id=1503922 either.

The characters are spaced correctly, though, like there's some masking of high-order bits going on. They're not random.

@posguy99
Copy link
Author

posguy99 commented Jul 24, 2021

Here's more weirdness, if you ^V and then try to paste the first Chinese character, you get the Chinese character like you should.

If you just paste the first Chinese character, you get a p.

KEYBD trap present for both tests.

Edit: Does ^V turn off the trap?

@posguy99
Copy link
Author

Even more weirdness...

#!/usr/bin/env ksh
while read -rsn1 k; do
 echo $k
done

And you get the Chinese characters when you paste them.

@JohnoKing
Copy link

JohnoKing commented Jan 4, 2022

I decided to check how old versions of ksh behaved and it appears this bug is a regression introduced in 93t+ 2009-10-07. The patch below (when applied on top of 93t+ 2009-09-17) is what introduced the bug:

--- a/src/cmd/ksh93/RELEASE
+++ b/src/cmd/ksh93/RELEASE
@@ -1,4 +1,9 @@
-09-09-17  --- Release ksh93t+  ---
+09-10-07  --- Release ksh93t+  ---
+09-10-07  $PATH processing has been changed to delay dir stat() and .paths
+	  lookup until the directory is needed in the path search.
+09-09-28  Call the ast setlocale() intercept on unset too.
+09-09-24  A bug in which LANG=foo; LC_ALL=foo; unset LC_ALL; did not revert
+	  LC_CTYPE etc. to the LANG value has been fixed.
 09-09-17  A bug in which unsetting SVLVL could cause a script invoked by
 	  name without #! to core dump has been fixed.
 09-09-16  A bug in which a pipeline in a here-document could hang when the
--- a/src/cmd/ksh93/sh/init.c
+++ b/src/cmd/ksh93/sh/init.c
@@ -355,13 +355,17 @@ static void put_cdpath(register Namval_t* np,const char *val,int flags,Namfun_t
     }
 #endif
 
-    /* Trap for LC_ALL, LC_TYPE, LC_MESSAGES, LC_COLLATE and LANG */
+    /* Trap for LC_ALL, LC_CTYPE, LC_MESSAGES, LC_COLLATE and LANG */
     static void put_lang(Namval_t* np,const char *val,int flags,Namfun_t *fp)
     {
 	Shell_t *shp = nv_shell(np);
 	int type;
 	char *lc_all = nv_getval(LCALLNOD);
 	char *name = nv_name(np);
+	if((shp->test&1) && !val && !nv_getval(np))
+		return;
+	if(shp->test&2)
+		nv_putv(np, val, flags, fp);
 	if(name==(LCALLNOD)->nvname)
 		type = LC_ALL;
 	else if(name==(LCTYPENOD)->nvname)
@@ -372,22 +376,30 @@ static void put_cdpath(register Namval_t* np,const char *val,int flags,Namfun_t
 		type = LC_COLLATE;
 	else if(name==(LCNUMNOD)->nvname)
 		type = LC_NUMERIC;
-	else if(name==(LANGNOD)->nvname && (!lc_all || *lc_all==0))
-		type = LC_ALL;
+#ifdef LC_LANG
+	else if(name==(LANGNOD)->nvname)
+		type = LC_LANG;
+#else
+#define LC_LANG		LC_ALL
+	else if(name==(LANGNOD)->nvname && (!lc_all || !*lc_all))
+		type = LC_LANG;
+#endif
 	else
 		type= -1;
 	if(sh_isstate(SH_INIT) && type>=0 && type!=LC_ALL && lc_all && *lc_all)
 		type= -1;
-	if(type>=0 || type==LC_ALL)
+	if(type>=0 || type==LC_ALL || type==LC_LANG)
 	{
-		if(!setlocale(type,val?val:""))
+		if(!setlocale(type,val?val:"-") && val)
 		{
 			if(!sh_isstate(SH_INIT) || shp->login_sh==0)
 				errormsg(SH_DICT,0,e_badlocale,val);
 			return;
 		}
 	}
-	if(CC_NATIVE==CC_ASCII && (type==LC_ALL || type==LC_CTYPE))
+	if(!(shp->test&2))
+		nv_putv(np, val, flags, fp);
+	if(CC_NATIVE==CC_ASCII && (type==LC_ALL || type==LC_LANG || type==LC_CTYPE))
 	{
 		if(sh_lexstates[ST_BEGIN]!=sh_lexrstates[ST_BEGIN])
 			free((void*)sh_lexstates[ST_BEGIN]);
@@ -438,7 +450,6 @@ static void put_cdpath(register Namval_t* np,const char *val,int flags,Namfun_t
 	if(type==LC_ALL || type==LC_MESSAGES)
 		error_info.translate = msg_translate;
 #endif
-	nv_putv(np, val, flags, fp);
     }
 #endif /* _hdr_locale */
 
--- a/src/lib/libast/RELEASE
+++ b/src/lib/libast/RELEASE
@@ -1,3 +1,7 @@
+09-10-05 _sfopen.c: add but ignore 'F' flags for stdio compatibility
+09-09-28 fts.h,ftwalk.h,fts.c: promote { namelen pathlen level } to (s)size_t
+09-09-28 locales: add AST_LC_LANG for $LANG
+09-09-28 setlocale.c: fix logic for dynamic { LANG LC_ALL LC_* } changes
 09-09-17 include/sfio.h,sfio/sfwalk.c: add sfwalk()
 09-09-09 sfio/sfputr.c: add SIGPIPE hang fix
 09-08-24 sfio/sfreserve.c: fix SF_UNBOUND logic with pushed streams
--- a/src/lib/libast/comp/setlocale.c
+++ b/src/lib/libast/comp/setlocale.c
@@ -570,7 +570,7 @@ static const signed char	utf8tab[256] =
 	4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6,-1,-1,
 };
 
-int
+static int
 utf8_mbtowc(wchar_t* wp, const char* str, size_t n)
 {
 	register unsigned char*	sp = (unsigned char*)str;
@@ -616,7 +616,7 @@ utf8_mbtowc(wchar_t* wp, const char* str, size_t n)
 	return -1;
 }
 
-int
+static int
 utf8_mblen(const char* str, size_t n)
 {
 	wchar_t		w;
@@ -749,6 +749,9 @@ Lc_category_t		lc_categories[] =
 { "LC_PAPER",         LC_PAPER,         AST_LC_PAPER,         0               },
 };
 
+static Lc_t*		lang;
+static Lc_t*		lc_all;
+
 typedef struct Unamval_s
 {
 	char*		name;
@@ -816,7 +819,7 @@ single(int category, Lc_t* lc)
 	const char*	sys;
 	int		i;
 
-	if (!lc && !(lc = lc_categories[category].prev))
+	if (!lc && !(lc = lc_all) && !(lc = lc_categories[category].prev) && !(lc = lang))
 		lc = lcmake(NiL);
 	if (locales[category] != lc)
 	{
@@ -982,6 +985,12 @@ composite(register const char* s, int initialize)
 
 /*
  * setlocale() intercept
+ *
+ * locale:
+ *	0	query
+ *	""	initialize from environment
+ *	"-"	unset
+ *	*	set
  */
 
 char*
@@ -1008,7 +1017,7 @@ _ast_setlocale(int category, const char* locale)
 		 */
 
 	compose:
-		if (category != AST_LC_ALL)
+		if (category != AST_LC_ALL && category != AST_LC_LANG)
 			return (char*)locales[category]->name;
 		if (!sp && !(sp = sfstropen()))
 			return 0;
@@ -1043,9 +1052,12 @@ _ast_setlocale(int category, const char* locale)
 		return sfstruse(sp);
 	}
 	if (!ast.locale.serial++)
+	{
 		stropt(getenv("LC_OPTIONS"), options, sizeof(*options), setopt, NiL);
+		initialized = 0;
+	}
 	if (*locale)
-		p = lcmake(locale);
+		p = streq(locale, "-") ? (Lc_t*)0 : lcmake(locale);
 	else if (!initialized)
 	{
 		char*	u;
@@ -1057,63 +1069,75 @@ _ast_setlocale(int category, const char* locale)
 		 */
 
 		u = 0;
-		if (!(a = getenv("LC_ALL")) || !*a)
+		if ((s = getenv("LANG")) && *s)
 		{
-			for (i = 1; i < AST_LC_COUNT; i++)
-				if ((s = getenv(lc_categories[i].name)) && *s)
-				{
-					if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
-						s = u;
-					lc_categories[i].prev = lcmake(s);
-				}
-			a = getenv("LANG");
+			if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
+				s = u;
+			lang = lcmake(s);
 		}
-		if (a)
+		else
+			lang = 0;
+		if ((s = getenv("LC_ALL")) && *s)
 		{
-			if (streq(a, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
-				a = u;
-			if (composite(a, 1))
-				a = 0;
+			if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
+				s = u;
+			lc_all = lcmake(s);
 		}
-		p = 0;
+		else
+			lc_all = 0;
 		for (i = 1; i < AST_LC_COUNT; i++)
-		{
-			if (!lc_categories[i].prev)
+			if ((s = getenv(lc_categories[i].name)) && *s)
 			{
-				if (!p && !(p = lcmake(a)))
-					break;
-				lc_categories[i].prev = p;
+				if (streq(s, local) && (u || (u = native_locale(locale, tmp, sizeof(tmp)))))
+					s = u;
+				lc_categories[i].prev = lcmake(s);
 			}
-			if (!single(i, lc_categories[i].prev))
+			else
+				lc_categories[i].prev = 0;
+		for (i = 1; i < AST_LC_COUNT; i++)
+			if (!single(i, lc_all ? lc_all : lc_categories[i].prev))
 			{
 				while (i--)
 					single(i, NiL);
 				return 0;
 			}
-		}
 		if (ast.locale.set & AST_LC_debug)
 			for (i = 1; i < AST_LC_COUNT; i++)
-				sfprintf(sfstderr, "locale env  %17s %s\n", lc_categories[i].name, locales[i]->name);
+				sfprintf(sfstderr, "locale env  %17s %16s %16s\n", lc_categories[i].name, locales[i]->name, lc_categories[i].prev ? lc_categories[i].prev->name : (char*)0);
 		initialized = 1;
 		goto compose;
 	}
-	else if (!(p = lc_categories[category].prev))
+	else if (category == AST_LC_LANG || !(p = lc_categories[category].prev))
 		p = lcmake("C");
-	if (category != AST_LC_ALL)
+	if (category == AST_LC_LANG)
+	{
+		if (lang != p)
+		{
+			lang = p;
+			if (!lc_all)
+				for (i = 1; i < AST_LC_COUNT; i++)
+					if (!single(i, lc_categories[i].prev))
+					{
+						while (i--)
+							single(i, NiL);
+						return 0;
+					}
+		}
+	}
+	else if (category != AST_LC_ALL)
 		return single(category, p);
-	else if (!(i = composite(locale, 0)))
+	else if ((i = composite(locale, 0)) < 0)
+		return 0;
+	else if (lc_all != p)
 	{
-		if (!p)
-			return 0;
+		lc_all = p;
 		for (i = 1; i < AST_LC_COUNT; i++)
-			if (!single(i, p))
+			if (!single(i, lc_all ? lc_all : lc_categories[i].prev))
 			{
 				while (i--)
 					single(i, NiL);
 				return 0;
 			}
 	}
-	else if (i < 0)
-		return 0;
 	goto compose;
 }
--- a/src/lib/libast/include/ast_std.h
+++ b/src/lib/libast/include/ast_std.h
@@ -154,6 +154,7 @@ extern char*		strerror(int);
 #define AST_LC_MEASUREMENT	12
 #define AST_LC_PAPER		13
 #define AST_LC_COUNT		14
+#define AST_LC_LANG		255
 
 #define AST_LC_find		(1L<<28)
 #define AST_LC_debug		(1L<<29)
@@ -202,6 +203,9 @@ extern char*		strerror(int);
 #ifndef LC_PAPER
 #define LC_PAPER		(-AST_LC_PAPER)
 #endif
+#ifndef LC_LANG
+#define LC_LANG			(-AST_LC_LANG)
+#endif
 
 #undef	extern
 
--- a/src/lib/libast/port/lc.c
+++ b/src/lib/libast/port/lc.c
@@ -125,6 +125,7 @@ lcindex(int category, int min)
 	case LC_COLLATE:	return AST_LC_COLLATE;
 	case LC_CTYPE:		return AST_LC_CTYPE;
 	case LC_IDENTIFICATION:	return AST_LC_IDENTIFICATION;
+	case LC_LANG:		return AST_LC_LANG;
 	case LC_MEASUREMENT:	return AST_LC_MEASUREMENT;
 	case LC_MESSAGES:	return AST_LC_MESSAGES;
 	case LC_MONETARY:	return AST_LC_MONETARY;

@JohnoKing
Copy link

Unsetting the $LANG variable can be used as a workaround for the bug:

$ trap : KEYBD
$ unset LANG
$ print 印度是最好的
印度是最好的

@posguy99
Copy link
Author

posguy99 commented Jan 5, 2022

Unsetting $LANG isn't a perfect workaround on macOS here. It will let you input the characters in question... but now try to backspace. It doesn't seem to realize how wide they are. Each character takes two backspaces to actually remove.

$ unset LANG
$ print 印度是最好的
印度是最好的
$ 
$ print 印度是最好  <-- one backspace here removed the character, but left a space
印度是最好? <- question mark output
$ 

@McDutchie
Copy link

Yes, that is the logical consequence of unsetting LANG – because that is how ksh (and any other program) knows it's in a locale that uses multi-byte characters.

@posguy99
Copy link
Author

Yes, that is the logical consequence of unsetting LANG – because that is how ksh (and any other program) knows it's in a locale that uses multi-byte characters.

Just for fun...

ksh93u+m has the odd backspace behavior when you unset LANG, but bash and mksh do not.

zsh has an even weirder behavior:

$ zsh
mwilson@mbp13 ~ % unset LANG
mwilson@mbp13 ~ % ?<008d>?度?<0098>??<009c><0080>好?<009a><0084>

oksh has the odd backspace behavior whether LANG is set or not. 😄

@McDutchie
Copy link

A fix remains elusive, but we've got to do something with this bug before releasing 1.0. Currently, if a KEYBD trap is set in a multibyte locale (which almost everyone uses these days), not just the trap but the entire shell essentially becomes unusable until the trap is unset. That's not an acceptable situation.

The patch below simply refuses to execute the KEYBD trap if a multibyte locale is active (mbwide()) and the character is greater than 128. For some reason that I don't understand yet, the value of the character turned negative is stored in the integer variable c, so we check for c > -128.

That effectively fixes this bug, leaving a much more benign bug or limitation in its place; in multibyte locales, the KEYBD trap is only triggered for ASCII characters.

diff --git a/src/cmd/ksh93/edit/edit.c b/src/cmd/ksh93/edit/edit.c
index d9933b63b..be31ec7cb 100644
--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -1122,7 +1122,9 @@ int ed_getchar(register Edit_t *ep,int mode)
 				killpg(getpgrp(),SIGINT);
 				siglongjmp(ep->e_env, UINTR);
 			}
-			if(mode<=0 && sh.st.trap[SH_KEYTRAP])
+			if(mode<=0 && sh.st.trap[SH_KEYTRAP]
+			/* workaround for https://github.com/ksh93/ksh/issues/307 -- ignore non-ASCII in multibyte locale */
+			&& (!mbwide() || c > -128))
 			{
 				ep->e_keytrap = 1;
 				n=1;

@posguy99
Copy link
Author

Github needs a WONTFIXRIGHTNOW. 😏

McDutchie added a commit that referenced this issue Feb 17, 2022
In UTF-8 locales, ksh breaks when a KEYBD trap is active, even a
dummy no-op one like 'trap : KEYBD'. Entering multi-byte characters
fails (the input is interrupted and a new prompt is displayed) and
pasting content with multi-byte characters produces corrupted
results.

The cause is that the KEYBD trap code is not multibyte-ready.
Unfortunately nobody yet understands the edit.c code well enough
to implement a proper fix. Pending that, this commit implements
a workaround that at least avoids breaking the shell.

src/cmd/ksh93/edit/edit.c: ed_getchar():
- When a multi-byte locale is active, do not trigger the the KEYBD
  trap except for ASCII characters (1-127).

Resolves: #307
McDutchie added a commit that referenced this issue Dec 30, 2024
The KEYBD trap should now be fully functional for UTF-8 and other
multibyte locales. Thanks to Johnothan King for finding this fix!

Analysis: The KEYBD trap code processes character code points
stored in e_lbuf by ed_read(). But shell variables store bytes, not
characters. So, in UTF-8 locales for example, the Unicode code
points need to be converted to multibyte UTF-8 encoding. This is
needed to calculate the length of each encoded character in bytes
(which fixes the corruption issue) and for keytrap() to store its
UTF-8 representation in ${.sh.edchar}.

src/cmd/ksh93/edit/edit.c: ed_getchar():
- Remove the workaround from the referenced commit.
- Use mbconv to convert innput codepoints to bytes before adding
  them to inbuff, a char array that is passed on to keytrap().

Related:  https://bugzilla.redhat.com/show_bug.cgi?id=1503922
Related:  att#197
Related:  #307
Resolves: #460
Co-authored-by: Johnothan King <[email protected]>
McDutchie added a commit that referenced this issue Dec 30, 2024
The KEYBD trap should now be fully functional for UTF-8 and other
multibyte locales. Thanks to Johnothan King for finding this fix!

Analysis: The KEYBD trap code processes character code points
stored in e_lbuf by ed_read(). But shell variables store bytes, not
characters. So, in UTF-8 locales for example, the Unicode code
points need to be converted to multibyte UTF-8 encoding. This is
needed to calculate the length of each encoded character in bytes
(which fixes the corruption issue) and for keytrap() to store its
UTF-8 representation in ${.sh.edchar}.

src/cmd/ksh93/edit/edit.c: ed_getchar():
- Remove the workaround from the referenced commit.
- Use mbconv to convert innput codepoints to bytes before adding
  them to inbuff, a char array that is passed on to keytrap().

Related:  https://bugzilla.redhat.com/show_bug.cgi?id=1503922
Related:  att#197
Related:  #307
Resolves: #460
Co-authored-by: Johnothan King <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

4 participants