Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make KEYBD trap handling code multibyte-aware #460

Closed
McDutchie opened this issue Feb 12, 2022 · 9 comments
Closed

Make KEYBD trap handling code multibyte-aware #460

McDutchie opened this issue Feb 12, 2022 · 9 comments
Labels
TODO Things to be done before releasing

Comments

@McDutchie
Copy link

For background, see #307 and 6b77da2.

To do:

  1. Figure out how the edit.c code works
  2. Remove workaround applied in 6b77da2
  3. Rewrite what needs to be rewritten to make keytrap() and the code that calls it in ed_getchar() multibyte aware
@McDutchie McDutchie added the TODO Things to be done before releasing label Feb 12, 2022
@hckiang
Copy link

hckiang commented Dec 6, 2022

If you have figured out how edit.c works, it may also be good to extend the buffer size of .sh.edchar by the way. Currently if you set .sh.edchar to a longer string (say more than 100 chararacters) in the KEYBD trap the sequence will just be truncated. I have my own programmable autocompletion using the .sh.edchar mechanism but my completion is sometimes truncated because of this limitation.

@posguy99
Copy link

posguy99 commented Dec 7, 2022

I have my own programmable autocompletion...

Is that something you can share? Sounds very useful.

@hckiang
Copy link

hckiang commented Dec 8, 2022

very useful

I use it myself daily but it maybe not so useful for others yet because it's kind of hacky and half-baked. Nonetheless I've just uploaded it here. See the comments in autocmpt.sh to see how to use.

@JohnoKing
Copy link

JohnoKing commented Dec 30, 2024

I might be a bit too hopeful, but I managed to get it to work by simply using mbconv. The patch works AFAICT:

diff --git a/src/cmd/ksh93/edit/edit.c b/src/cmd/ksh93/edit/edit.c
index cbd39ee4c..4ad71b65a 100644
--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -831,14 +831,11 @@ int ed_getchar(Edit_t *ep,int mode)
 		{
 			if(mode<=0 && -c == ep->e_intr)
 				killpg(getpgrp(),SIGINT);
-			if(mode<=0 && sh.st.trap[SH_KEYTRAP]
-			/* workaround for <https://github.com/ksh93/ksh/issues/307>:
-			 * do not trigger KEYBD for non-ASCII in multibyte locale */
-			&& (!mbwide() || c > -128))
+			if(mode<=0 && sh.st.trap[SH_KEYTRAP])
 			{
 				ep->e_keytrap = 1;
-				n=1;
-				if((readin[0]= -c) == ESC)
+				n=mbconv(readin,-c);
+				if(readin[0] == ESC)
 				{
 					while(1)
 					{

Test kshrc for $ENV (pilfered from here, but with 。and 、 added to test multibyte characters in the trap itself.

keybd_trap () {
  case ${.sh.edchar} in
    $'\e[1~') .sh.edchar=$'\001';;      # Home = beginning-of-line
    $'\e[4~') .sh.edchar=$'\005';;      # End = end-of-line
    $'\e[5~') .sh.edchar=$'\e>';;       # PgUp = history-previous
    $'\e[6~') .sh.edchar=$'\e<';;       # PgDn = history-next
    $'\e[3~') .sh.edchar=$'\004';;      # Delete = delete-char
    '。') .sh.edchar=j ;;
    '、') .sh.edchar=k ;;
  esac
}
trap keybd_trap KEYBD

@McDutchie
Copy link
Author

Yes, it works! Thanks, nice find! That was much simpler than I expected. I have not been able to break it so far.

Just one minor thing: to be fully correct in testing for the ASCII ESC control character, we need to check for a byte length of 1 as well. In UTF-8 this will make no difference because all bytes in multibyte characters have the 8th bit set, but some encodings (at least SJIS) have multibyte characters whose individual bytes may be in the 7-bit ASCII range.

diff --git a/src/cmd/ksh93/edit/edit.c b/src/cmd/ksh93/edit/edit.c
index cbd39ee4c..fab11b17c 100644
--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -831,14 +831,11 @@ int ed_getchar(Edit_t *ep,int mode)
 		{
 			if(mode<=0 && -c == ep->e_intr)
 				killpg(getpgrp(),SIGINT);
-			if(mode<=0 && sh.st.trap[SH_KEYTRAP]
-			/* workaround for <https://github.com/ksh93/ksh/issues/307>:
-			 * do not trigger KEYBD for non-ASCII in multibyte locale */
-			&& (!mbwide() || c > -128))
+			if(mode<=0 && sh.st.trap[SH_KEYTRAP])
 			{
 				ep->e_keytrap = 1;
-				n=1;
-				if((readin[0]= -c) == ESC)
+				n = mbconv(readin, -c);
+				if(n==1 && readin[0]==ESC)
 				{
 					while(1)
 					{

@posguy99
Copy link

Works for me too.

Fixes the UTF string originally reported, fixes @hyenias Japanese, fixes the RH bugreport,

@McDutchie
Copy link
Author

Thanks for testing it, @posguy99!

McDutchie added a commit that referenced this issue Dec 30, 2024
The KEYBD trap should now be fully functional for UTF-8 and other
multibyte locales. Thanks to Johnothan King for finding this fix!

Analysis: The KEYBD trap code processes character code points
stored in e_lbuf by ed_read(). But shell variables store bytes, not
characters. So, in UTF-8 locales for example, the Unicode code
points need to be converted to multibyte UTF-8 encoding. This is
needed to calculate the length of each encoded character in bytes
(which fixes the corruption issue) and for keytrap() to store its
UTF-8 representation in ${.sh.edchar}.

src/cmd/ksh93/edit/edit.c: ed_getchar():
- Remove the workaround from the referenced commit.
- Use mbconv to convert innput codepoints to bytes before adding
  them to inbuff, a char array that is passed on to keytrap().

Related:  https://bugzilla.redhat.com/show_bug.cgi?id=1503922
Related:  att#197
Related:  #307
Resolves: #460
Co-authored-by: Johnothan King <[email protected]>
@McDutchie
Copy link
Author

@hckiang wrote:

If you have figured out how edit.c works, it may also be good to extend the buffer size of .sh.edchar by the way. Currently if you set .sh.edchar to a longer string (say more than 100 chararacters) in the KEYBD trap the sequence will just be truncated.

That limit is here:

#define LOOKAHEAD 80
It seems to be enough simply to increase that to, say, 400.

It has been defined as 80 since the earliest ksh sources I have, which are from 1986.

@JohnoKing
Copy link

It seems to be enough simply to increase that to, say, 400.

I don't see why the stack couldn't be resized on the fly to whatever it needs to be, using something like e.g. realloc or the stack mechanism (stkopen and stkalloc).
Below is a (leaky) proof of concept using realloc.

sloppy-extemporaneous-proof-of-concept.patch
diff --git a/src/cmd/ksh93/edit/completion.c b/src/cmd/ksh93/edit/completion.c
index e1c57cc91..71ede541c 100644
--- a/src/cmd/ksh93/edit/completion.c
+++ b/src/cmd/ksh93/edit/completion.c
@@ -598,7 +598,10 @@ int ed_macro(Edit_t *ep, int i)
 {
 	char *out;
 	Namval_t *np;
-	genchar buff[LOOKAHEAD+1];
+	size_t out_len, buff_len;
+	genchar *buff;
+	buff_len = ep->e_lbuflen+1;
+	buff = sh_calloc(1, buff_len);
 	if(i != '@')
 		ep->e_macro[1] = i;
 	/* macros of the form <ESC>[c evoke alias __c */
@@ -610,19 +613,16 @@ int ed_macro(Edit_t *ep, int i)
 	{
 #if SHOPT_MULTIBYTE
 		/* copy to buff in internal representation */
-		int c = 0;
-		if( strlen(out) > LOOKAHEAD )
+		out_len = strlen(out);
+		if( out_len > ep->e_lbuflen )
 		{
-			c = out[LOOKAHEAD];
-			out[LOOKAHEAD] = 0;
+			buff = sh_realloc(buff,out_len);
+			buff_len = out_len;
 		}
 		i = ed_internal(out,buff);
-		if(c)
-			out[LOOKAHEAD] = c;
 #else
-		strncpy((char*)buff,out,LOOKAHEAD);
-		buff[LOOKAHEAD] = 0;
-		i = strlen((char*)buff);
+		strcpy((char*)buff,out,buf_len-1);
+		i = (int)strlen((char*)buff);
 #endif /* SHOPT_MULTIBYTE */
 		while(i-- > 0)
 			ed_ungetchar(ep,buff[i]);
diff --git a/src/cmd/ksh93/edit/edit.c b/src/cmd/ksh93/edit/edit.c
index fab11b17c..841cd9c38 100644
--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -288,6 +288,24 @@ void ed_ringbell(void)
 	write(ERRIO,bellchr,1);
 }
 
+/*
+ * (Pre)allocate the lookahead and search buffers with a length of 1/2 a kibibyte.
+ */
+void ed_allocate_buffers(Edit_t *ep)
+{
+	size_t default_len = 512;
+	if(!ep->e_lbuf)
+	{
+		ep->e_lbuf = sh_calloc(1, default_len);
+		ep->e_lbuflen = default_len;
+	}
+	if(!ep->e_search)
+	{
+		ep->e_search = sh_calloc(1, default_len);
+		ep->e_sbuflen = default_len;
+	}
+}
+
 #if SHOPT_ESH || SHOPT_VSH
 
 /*
@@ -549,8 +567,11 @@ void	ed_setup(Edit_t *ep, int fd, int reedit)
 	if(ep->e_default && (pp = nv_getval(ep->e_default)))
 	{
 		n = strlen(pp);
-		if(n > LOOKAHEAD)
-			n = LOOKAHEAD;
+		if(n > ep->e_lbuflen)
+		{
+			ep->e_lbuflen += n;
+			ep->e_lbuf = sh_realloc(ep->e_lbuf, ep->e_lbuflen);
+		}
 		ep->e_lookahead = n;
 		while(n-- > 0)
 			ep->e_lbuf[n] = *pp++;
@@ -600,6 +621,7 @@ int ed_read(void *context, int fd, char *buff, int size, int reedit)
 	sh_onstate(SH_TTYWAIT);
 	errno = EINTR;
 	sh.waitevent = 0;
+	ed_allocate_buffers(ep);
 	while(rv<0 && errno==EINTR)
 	{
 		if(sh.trapnote&(SH_SIGSET|SH_SIGTRAP))
@@ -725,6 +747,11 @@ static int putstack(Edit_t *ep,char string[], int nbyte, int type)
 	int size, offset = ep->e_lookahead + nbyte;
 	*(endp = &p[nbyte]) = 0;
 	endp = &p[nbyte];
+	//if(ep->e_lbuflen < offset)  /* FIXME: Switch to stk(3) and fix the memory leak here */
+	{
+		ep->e_lbuflen += offset;
+		ep->e_lbuf = sh_realloc(ep->e_lbuf, ep->e_lbuflen);
+	}
 	do
 	{
 		c = (int)((*p) & STRIP);
@@ -814,13 +841,13 @@ static int putstack(Edit_t *ep,char string[], int nbyte, int type)
 int ed_getchar(Edit_t *ep,int mode)
 {
 	int n = 0, c;
-	char readin[LOOKAHEAD+1];
+	char readin[ep->e_lbuflen+1];
 	if(!ep->e_lookahead)
 	{
 		ed_flush(ep);
 		ep->e_inmacro = 0;
 		*ep->e_vi_insert = (mode==-2);
-		if((n=ed_read(ep,ep->e_fd,readin,-LOOKAHEAD,0)) > 0)
+		if((n=ed_read(ep,ep->e_fd,readin,-ep->e_lbuflen,0)) > 0)
 			n = putstack(ep,readin,n,1);
 		*ep->e_vi_insert = 0;
 	}
@@ -859,7 +886,7 @@ int ed_getchar(Edit_t *ep,int mode)
 							break;
 					}
 				}
-				if(n=keytrap(ep,readin,n,LOOKAHEAD-n,mode))
+				if(n=keytrap(ep,readin,n,ep->e_lbuflen-n,mode))
 				{
 					putstack(ep,readin,n,0);
 					c = ep->e_lbuf[--ep->e_lookahead];
@@ -885,7 +912,7 @@ int ed_getchar(Edit_t *ep,int mode)
 #if SHOPT_ESH || SHOPT_VSH
 void ed_ungetchar(Edit_t *ep,int c)
 {
-	if (ep->e_lookahead < LOOKAHEAD)
+	if (ep->e_lookahead < ep->e_lbuflen)
 		ep->e_lbuf[ep->e_lookahead++] = c;
 	return;
 }
@@ -1297,8 +1324,7 @@ static int keytrap(Edit_t *ep,char *inbuff,int insize, int bufsize, int mode)
 		nv_unset(ED_CHRNOD);
 	else if(bufsize>0)
 	{
-		strncopy(inbuff,cp,bufsize);
-		inbuff[bufsize-1]='\0';
+		strlcpy(inbuff,cp,bufsize);
 		insize = strlen(inbuff);
 	}
 	else
diff --git a/src/cmd/ksh93/edit/emacs.c b/src/cmd/ksh93/edit/emacs.c
index 56fac2d70..b3d51fd91 100644
--- a/src/cmd/ksh93/edit/emacs.c
+++ b/src/cmd/ksh93/edit/emacs.c
@@ -185,6 +185,7 @@ int ed_emacsread(void *context, int fd,char *buff,int scend, int reedit)
 		ep->prevdirection =  1;
 		location.hist_command =  -5;
 	}
+	ed_allocate_buffers(ep->ed);
 	Prompt = prompt;
 	ep->screen = Screen;
 	ep->lastdraw = FINAL;
@@ -1238,12 +1239,19 @@ static void xcommands(Emacs_t *ep,int count)
 
 static int dosearch(Emacs_t *ep, genchar *out, int direction)
 {
-	if(cur>0 && eol==cur && (cur<(SEARCHSIZE-2) || ep->prevdirection == -2))
+	if(cur>0 && eol==cur && (cur<(ep->ed->e_sbuflen-2) || ep->prevdirection == -2))
 	{
 		if(ep->lastdraw==APPEND)
 		{
+			size_t len;
 			/* Fetch the current command line and save it for later searches */
 			out[cur] = 0;
+			len = genlen(out) + 1;
+			if(len > ep->ed->e_sbuflen)
+			{
+				lstring = sh_realloc(lstring, len);
+				ep->ed->e_sbuflen = len;
+			}
 			gencpy((genchar*)lstring+1,out);
 #if SHOPT_MULTIBYTE
 			ed_external((genchar*)lstring+1,lstring+1);
@@ -1351,11 +1359,17 @@ static void search(Emacs_t* ep,genchar *out,int direction)
 		direction = -1;
 	if (i != 4)
 	{
+		size_t len;
 #if SHOPT_MULTIBYTE
 		ed_external(string,(char*)string);
 #endif /* SHOPT_MULTIBYTE */
-		strncopy(lstring,((char*)string)+4,SEARCHSIZE-1);
-		lstring[SEARCHSIZE-1] = 0;
+		len = strlen((char*)string) - 4;
+		if (len > ep->ed->e_sbuflen)
+		{
+			lstring = sh_realloc(lstring, len);
+			ep->ed->e_sbuflen = len;
+		}
+		strcpy(lstring,((char*)string)+4);
 		ep->prevdirection = direction;
 	}
 	else
diff --git a/src/cmd/ksh93/edit/vi.c b/src/cmd/ksh93/edit/vi.c
index e24dfd3b2..fdbc9c175 100644
--- a/src/cmd/ksh93/edit/vi.c
+++ b/src/cmd/ksh93/edit/vi.c
@@ -216,6 +216,7 @@ int ed_viread(void *context, int fd, char *shbuf, int nchar, int reedit)
 		vp->direction = -1;
 		vp->ed = ed;
 	}
+	ed_allocate_buffers(vp->ed);
 
 	/*** setup prompt ***/
 
@@ -2011,6 +2012,7 @@ static int search(Vi_t* vp,int mode)
 	int oldcurhline;
 	int i;
 	Histloc_t  location;
+	size_t len;
 
 	if( vp->direction == -2 && mode != 'n' && mode != 'N' )
 		vp->direction = -1;
@@ -2069,8 +2071,13 @@ static int search(Vi_t* vp,int mode)
 		location = hist_find(sh.hist_ptr,((char*)virtual)+1, curhline, 1, new_direction);
 	}
 	cur_virt = i;
-	strncopy(lsearch, ((char*)virtual)+1, SEARCHSIZE-1);
-	lsearch[SEARCHSIZE-1] = 0;
+	len = strlen((char*)virtual) - 1;
+	if(len > vp->ed->e_sbuflen)
+	{
+		lsearch = sh_realloc(lsearch, len);
+		vp->ed->e_sbuflen = len;
+	}
+	strcpy(lsearch, ((char*)virtual)+1);
 	if( (curhline=location.hist_command) >=0 )
 	{
 		vp->ocur_virt = INVALID;
@@ -2095,15 +2102,22 @@ static int search(Vi_t* vp,int mode)
 static int dosearch(Vi_t *vp, int direction)
 {
 	int mode;
+	size_t len;
 
 	if(direction)
 		mode = 'n';
 	else
 		mode = 'N';
 
-	if(cur_virt>=0 && cur_virt<(SEARCHSIZE-2) && cur_virt == last_virt)
+	if(cur_virt>=0 && cur_virt<(vp->ed->e_sbuflen-2) && cur_virt == last_virt)
 	{
 		virtual[last_virt + 1] = '\0';
+		len = strlen((char*)virtual);
+		if(len > vp->ed->e_sbuflen)
+		{
+			lsearch = sh_realloc(lsearch, len);
+			vp->ed->e_sbuflen = len;
+		}
 #if SHOPT_MULTIBYTE
 		ed_external(virtual,lsearch+1);
 #else
diff --git a/src/cmd/ksh93/include/edit.h b/src/cmd/ksh93/include/edit.h
index b39cc7bb8..c2fc83635 100644
--- a/src/cmd/ksh93/include/edit.h
+++ b/src/cmd/ksh93/include/edit.h
@@ -25,14 +25,11 @@
  *
  */
 
-#define SEARCHSIZE	80
-
 #include	"FEATURE/cmds"
 #include        "FEATURE/locale"
 #include	"terminal.h"
 
 #define STRIP		0377
-#define LOOKAHEAD	80
 
 #if SHOPT_MULTIBYTE
 #   include	"national.h"
@@ -88,9 +85,11 @@ typedef struct edit
 	genchar	*e_inbuf;	/* pointer to input buffer */
 	char	*e_prompt;	/* pointer to trimmed final line of PS1 prompt, used when redrawing command line */
 	genchar	*e_killbuf;	/* pointer to delete buffer */
-	char	e_search[SEARCHSIZE];	/* search string */
+	char	*e_search;	/* search string */
 	genchar	*e_physbuf;	/* temporary workspace buffer */
-	int	e_lbuf[LOOKAHEAD];/* pointer to look-ahead buffer */
+	int	*e_lbuf;	/* pointer to look-ahead buffer */
+	size_t	e_lbuflen;	/* current size of the lookahead buffer */
+	size_t	e_sbuflen;	/* current size of the search buffer */
 	int	e_fd;		/* file descriptor */
 	int	e_ttyspeed;	/* line speed, also indicates tty parameters are valid */
 	int	e_tabcount;
@@ -153,6 +152,7 @@ typedef struct edit
 extern void	ed_putchar(Edit_t*, int);
 extern void	ed_putstring(Edit_t*, const char*);
 extern void	ed_ringbell(void);
+extern void	ed_allocate_buffers(Edit_t *ep);
 extern void	ed_setup(Edit_t*,int, int);
 extern void	ed_flush(Edit_t*);
 extern int	ed_getchar(Edit_t*,int);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO Things to be done before releasing
Projects
None yet
Development

No branches or pull requests

4 participants