Skip to content

Commit

Permalink
Fix backslash and ~(...) processing of patterns from split fields
Browse files Browse the repository at this point in the history
*** Problem 1:

The following bug shows up in pathname expansion, in every ksh93
version:

    $ mkdir testdir
    $ cd testdir
    $ touch a b c ./-   # GNU touch won't accept - as file name
    $ p='[a\-c]'
    $ echo $p
    a b c

As the - in the bracket expression is backslash-escaped, it should
not be treated as a bracket expression range operator, so the
expected output is:

    - a c

Analysis: For compatibility with the Bourne/POSIX/ksh88, extended
patterns (with a syntax involving parentheses) are not expanded in
fields resulting from field splitting, e.g., from unquoted variable
expansions. The shell adds an extra level of backslash-escaping to
fields resulting from field splitting, so that characters that
indicate extended pattern syntax can be disabled by escaping them.
Of course, the backslash itself also acquires an extra backslash,
and that is the cause of the bug above.

The fix is to remove one layer of backslash-escaping from patterns
resulting from field splitting before expanding them. But we cannot
do this unconditionally; we also need a new flag to check if the
extra escaping is disallowing an extended pattern. See below.

*** Problem 2:

The fix for unwanted brace expansion of ~(E)... patterns and the
like was incomplete; it only checked for ~( at the beginning of the
pattern, but it may occur anywhere in the pattern.

src/cmd/ksh93/sh/expand.c:
- path_expand():
  - Add a new 'musttrim' paraneter.
  - If that parameter is nonzero, trim the pattern before expanding
    by copying it into sh.strbuf and calling sh_trim() on it.
  - Replace the unexpanded result by the original untrimmed pattern
    if it wasn't expanded; this avoids regressions with unquoted
    variables and comsubs.
- must_disallow_bracepat():
  - Add new 'withbackslash' parameter to handle escaped patterns.
  - Don't bother checking for initial ~ as this will now be called
    by path_generate() on encountering a ~ anywhere in the pattern.
  - Return -1 if there is no ( or if the option's don't indicate a
    change in the brace expansion compatibility state.
- path_generate():
  - Add 'musttrim' parameter to pass on to path_expand() and
    must_disallow_bracepat().
  - While scanning the pattern, when encountering '~' and we're not
    already in a brace expansion, call must_disallow_bracepat()
    there instead of at the start, so that the 'nobracepat' state
    is kept up to date throughout the scan. This fixes problem 2.

src/cmd/ksh93/sh/macro.c:
- Add two new fields to the Mac_t struct:
  1. noextpat: set to disallow extended patterns; this is set in
     addition to backslash-escaping & | ( ).
  2. wasexpan: set when varsub() is called (except in the 'nosub:'
     case) or when comsubst() is called; this should cover all the
     expansions that may result in field splitting.
- mac_copy(): When escaping to disallow extended patterns, also set
  noextpat.
- end_field(): Introduce a 'musttrim' flag that is set if wasexpan
  is set (it was a variable or comsub expansion), there is no
  disallowed extended pattern, and the string contains a backslash.
  Pass this flag on to path_expand() or path_generate(). Along with
  the changes in expand.c, this fixes Problem 1.

Resolves: #549
  • Loading branch information
McDutchie committed May 27, 2023
1 parent 2026829 commit 4361b6a
Show file tree
Hide file tree
Showing 6 changed files with 110 additions and 29 deletions.
16 changes: 12 additions & 4 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,24 @@ This documents significant changes in the 1.0 branch of ksh 93u+m.
For full details, see the git log at: https://github.com/ksh93/ksh/tree/1.0
Uppercase BUG_* IDs are shell bug IDs as used by the Modernish shell library.

2023-05-27:

- Fixed a bug in the handling of backslashes in pathname expansion patterns
resulting from field splitting. For example, the following:
$ mkdir testdir; cd testdir; touch b; p='[a\-c]'; echo $p
now correctly recognises the backslash escape and no longer outputs 'b'.

- The 2023-05-19 fix for brace expansion interfering with extended patterns has
been improved to also recognise and handle extended patterns that are parts
of larger patterns, e.g., 'ls file~(E)[[:digit:]]{3,6}\.txt$' now works.

2023-05-19:

- Fixed a bug in the loop invariants optimizer where a variable for which
a .get or .getn discipline function is set within a loop is incorrectly
treated as invariant.

- Fixed three bugs in pathname expansion:
- Fixed two bugs in pathname expansion:
1. Brace expansion interfered with EREs and other extended pattern
types used with ~(...) pattern option prefixes, e.g. the {3,6} in
'ls ~(E)^file[[:digit:]]{3,6}\.txt$' was incorrectly parsed as a
Expand All @@ -17,9 +28,6 @@ Uppercase BUG_* IDs are shell bug IDs as used by the Modernish shell library.
2. With brace expansion disabled (set +B), any '{' character in a file
name pattern stopped it from being resolved unless a wildcard followed
it. Now, 'set +B; ls *{' will list files with names ending in '{'.
3. Patterns with options such as ~(E)foo failed to match files if the
pattern resulted from an unquoted variable expansion. With this fix,
'var="~(E)foo", ls $var' will list files with names containing 'foo'.

2023-05-18:

Expand Down
4 changes: 2 additions & 2 deletions src/cmd/ksh93/include/path.h
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ extern Namval_t *path_gettrackedalias(const char*);
extern Pathcomp_t *path_absolute(const char*, Pathcomp_t*, int);
extern char *path_basename(const char*);
extern char *path_fullname(const char*);
extern int path_expand(const char*, struct argnod**);
extern int path_expand(const char*, struct argnod**, int);
extern noreturn void path_exec(const char*,char*[],struct argnod*);
extern pid_t path_spawn(const char*,char*[],char*[],Pathcomp_t*,int);
extern int path_open(const char*,Pathcomp_t*);
Expand All @@ -88,7 +88,7 @@ extern int path_search(const char*,Pathcomp_t**,int);
extern char *path_relative(const char*);
extern int path_complete(const char*, const char*,struct argnod**);
#if SHOPT_BRACEPAT
extern int path_generate(struct argnod*,struct argnod**);
extern int path_generate(struct argnod*,struct argnod**, int);
#endif /* SHOPT_BRACEPAT */

#if SHOPT_DYNAMIC
Expand Down
6 changes: 4 additions & 2 deletions src/cmd/ksh93/sh.1
Original file line number Diff line number Diff line change
Expand Up @@ -2615,9 +2615,9 @@ and
This is also known as \f2globbing\fP
or sometimes \f2filename generation\fP.
Pathname expansion is disabled if the
.B -f
.B \-f
a.k.a.
.B --noglob
.B \-\-noglob
shell option is on.
Otherwise, if certain special characters are found in a
.I word
Expand All @@ -2641,10 +2641,12 @@ are scanned only for the characters
.BR ? ,
and
.B \*(OK\^
for compatibility reasons
(in which case the
.B (
character is not special and any pattern syntax described below
that involves parentheses does not apply).
.PP
Each file name component that contains a recognized pattern character
is replaced with a lexicographically sorted set of names
that matches the pattern
Expand Down
69 changes: 54 additions & 15 deletions src/cmd/ksh93/sh/expand.c
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ static char *nextdir(glob_t *gp, char *dir)
return NULL;
}

int path_expand(const char *pattern, struct argnod **arghead)
int path_expand(const char *pattern, struct argnod **arghead, int musttrim)
{
glob_t gdata;
struct argnod *ap;
Expand Down Expand Up @@ -93,9 +93,34 @@ int path_expand(const char *pattern, struct argnod **arghead)
gp->gl_suffix = sufstr;
gp->gl_intr = &sh.trapnote;
suflen = 0;
if(strncmp(pattern,"~(N",3)==0)
flags &= ~GLOB_NOCHECK;
glob(pattern, flags, 0, gp);
/*
* If we expanded an unquoted variable/comsub containing a pattern, that pattern
* will have shell-special characters backslash-escaped for correct expansion of
* the value. However, the pattern must be trimmed of those escapes before
* resolving it, while keeping the escapes if the pattern doesn't resolve.
*/
if(musttrim)
{
char *trimmedpat;
sfputr(sh.strbuf,pattern,-1);
trimmedpat = sfstruse(sh.strbuf);
sh_trim(trimmedpat);
glob(trimmedpat,flags,0,gp);
/*
* If there is only one result and it is identical to the trimmed pattern, then the pattern didn't
* resolve, and we now need to replace it with the untrimmed pattern to avoid regressions with the
* expansion of unquoted variables. (Note: globlist_t (glob.h) is binary compatible with struct
* argnod (argnod.h); thus, gl_path and argval have the same offset (ARGVAL) in the struct.)
*/
if((ap = (struct argnod*)gp->gl_list) && !ap->argnxt.ap && strcmp(ap->argval,trimmedpat)==0)
{
gp->gl_list = (globlist_t*)stkalloc(sh.stk,ARGVAL+strlen(pattern)+1);
memcpy(gp->gl_list,ap,ARGVAL); /* copy fields *before* argval/gl_path */
strcpy(gp->gl_list->gl_path,pattern);
}
}
else
glob(pattern,flags,0,gp);
sh_sigcheck();
for(ap= (struct argnod*)gp->gl_list; ap; ap = ap->argnxt.ap)
{
Expand Down Expand Up @@ -144,7 +169,7 @@ int path_complete(const char *name,const char *suffix, struct argnod **arghead)
{
sufstr = suffix;
suflen = strlen(suffix);
return path_expand(name,arghead);
return path_expand(name,arghead,0);
}

#if SHOPT_BRACEPAT
Expand All @@ -155,18 +180,22 @@ static int checkfmt(Sfio_t* sp, void* vp, Sffmt_t* fp)
}

/*
* Return true if ~(...) pattern options indicate a pattern type that is
* Return 1 if ~(...) pattern options indicate a pattern type that is
* syntactically incompatible with brace expansion because it uses braces
* for its own purposes (e.g., bounds in regular expressions).
* Return 0 if not.
* Return -1 if pat is not a ~(...) pattern or no relevant options are given.
* Do the minimum parsing of the option syntax necessary to determine this.
*
* NOTE: cp is expected to point to the character *after* the '~'.
*/
static int must_disallow_bracepat(char *pat)
static int must_disallow_bracepat(char *cp, int withbackslash)
{
int32_t incompat = 0;
char shellpat = 0, minus = 0, c;
if (*pat++ != '~' || *pat++ != '(')
return 0;
while ((c = *pat++) && c != ':' && c != ')') switch (c)
char change = 0, shellpat = 0, minus = 0, c;
if ((withbackslash && *cp++ != '\\') || *cp++ != '(')
return -1;
while ((c = *cp++) && c != ':' && c != ')') switch (c)
{
case 'E': /* extended regular expression */
case 'F': /* fixed pattern */
Expand All @@ -181,9 +210,11 @@ static int must_disallow_bracepat(char *pat)
}
else
incompat &= ~(1 << c - 'A');
change = 1;
break;
case 'K': /* shell pattern */
shellpat = !minus;
change = 1;
break;
case '-': /* disable the following options */
minus = 1;
Expand All @@ -192,18 +223,18 @@ static int must_disallow_bracepat(char *pat)
minus = 0;
break;
}
return c && incompat && !shellpat;
return change ? (c && incompat && !shellpat) : -1;
}

int path_generate(struct argnod *todo, struct argnod **arghead)
int path_generate(struct argnod *todo, struct argnod **arghead, int musttrim)
/*@
assume todo!=0;
return count satisfying count>=1;
@*/
{
const int nobracepat = must_disallow_bracepat(todo->argval);
char *cp;
int brace;
int nobracepat = 0;
struct argnod *ap;
struct argnod *top = 0;
struct argnod *apin;
Expand Down Expand Up @@ -312,6 +343,14 @@ int path_generate(struct argnod *todo, struct argnod **arghead)
case '\\':
cp++;
break;
case '~':
if(!brace)
{
int r = must_disallow_bracepat(cp,musttrim);
if(r>=0)
nobracepat = r;
}
break;
case 0:
/* insert on stack */
ap->argchn.ap = top;
Expand All @@ -322,7 +361,7 @@ int path_generate(struct argnod *todo, struct argnod **arghead)
{
apin = ap->argchn.ap;
if(!sh_isoption(SH_NOGLOB) || sh_isstate(SH_COMPLETE) || sh_isstate(SH_FCOMPLETE))
brace=path_expand(ap->argval,arghead);
brace = path_expand(ap->argval,arghead,musttrim);
else
{
ap->argchn.ap = *arghead;
Expand Down
19 changes: 14 additions & 5 deletions src/cmd/ksh93/sh/macro.c
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,8 @@ typedef struct _mac_
char split; /* set when word splitting is possible */
char pattern; /* set when glob pattern expansion or matching follows */
char patfound; /* set if pattern character found */
char noextpat; /* set to disallow extended patterns */
char wasexpan; /* set when word resulted from a $ or ` expansion */
char assign; /* set for assignments */
char arith; /* set for ((...)) */
char arrayok; /* $x[] ok for arrays */
Expand Down Expand Up @@ -175,7 +177,7 @@ char *sh_mactrim(char *str, int mode)
{
/* expand only if unique */
struct argnod *arglist=0;
if((mode=path_expand(str,&arglist))==1)
if((mode=path_expand(str,&arglist,0))==1)
str = arglist->argval;
else if(mode>1)
{
Expand Down Expand Up @@ -1129,6 +1131,7 @@ static int varsub(Mac_t *mp)
char *idx = 0;
int var=1,addsub=0,oldpat=mp->pattern,idnum=0,flag=0,d;
Stk_t *stkp = sh.stk;
mp->wasexpan = 1;
retry1:
idbuff[0] = 0;
idbuff[1] = 0;
Expand Down Expand Up @@ -2112,6 +2115,7 @@ static int varsub(Mac_t *mp)
if(type)
mac_error();
fcseek(-1);
mp->wasexpan = 0;
return 0;
}

Expand All @@ -2130,7 +2134,7 @@ static void comsubst(Mac_t *mp,Shnode_t* t, int type)
Stk_t *stkp = sh.stk;
Fcin_t save;
struct slnod *saveslp = sh.st.staklist;
struct _mac_ savemac = *mp;
Mac_t savemac = *mp;
int savtop = stktell(stkp);
char lastc=0, *savptr = stkfreeze(stkp,0);
int was_history = sh_isstate(SH_HISTORY);
Expand All @@ -2139,6 +2143,7 @@ static void comsubst(Mac_t *mp,Shnode_t* t, int type)
int newlines,bufsize,nextnewlines;
Sfoff_t foff;
Namval_t *np;
savemac.wasexpan = 1;
sh.argaddr = 0;
sh.st.staklist=0;
if(type)
Expand Down Expand Up @@ -2451,11 +2456,14 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
size -= len;
continue;
}
if(n==S_ESC || n==S_EPAT)
if(n==S_ESC)
sfputc(stkp,ESCAPE);
else if(n==S_EPAT)
{
/* don't allow extended patterns in this case */
mp->patfound = mp->pattern;
sfputc(stkp,ESCAPE);
mp->noextpat = 1;
}
else if(n==S_PAT)
mp->patfound = mp->pattern;
Expand Down Expand Up @@ -2550,14 +2558,15 @@ static void endfield(Mac_t *mp,int split)
mp->atmode = 0;
if(mp->patfound)
{
int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
sh.argaddr = 0;
#if SHOPT_BRACEPAT
/* in POSIX mode, disallow brace expansion for unquoted expansions */
if(sh_isoption(SH_BRACEEXPAND) && !(sh_isoption(SH_POSIX) && mp->pattern==1))
count = path_generate(argp,mp->arghead);
count = path_generate(argp,mp->arghead,musttrim);
else
#endif /* SHOPT_BRACEPAT */
count = path_expand(argp->argval,mp->arghead);
count = path_expand(argp->argval,mp->arghead,musttrim);
if(count)
mp->fields += count;
else if(split) /* pattern is null string */
Expand Down
25 changes: 24 additions & 1 deletion src/cmd/ksh93/tests/glob.sh
Original file line number Diff line number Diff line change
Expand Up @@ -433,11 +433,20 @@ test_glob '<[^N]>' [\^N]
test_glob '<[]-z]>' [\]\-z]
test_glob '<[]-z]>' [']-z']
test_glob '<[]-z]>' ["]-z"]
# check internal escaping of bracket expression in glob pattern resulting from field splitting
# https://github.com/ksh93/ksh/issues/549
unquoted_patvar='[\!N]'; test_glob '<[\!N]>' $unquoted_patvar
unquoted_patvar='[\^N]'; test_glob '<[\^N]>' $unquoted_patvar
unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar
: > -; test_glob '<->' $unquoted_patvar
: > a > c; test_glob '<-> <a> <c>' $unquoted_patvar
: > !; unquoted_patvar='[\!N]'; test_glob '<!>' $unquoted_patvar
: > ^; unquoted_patvar='[\^N]'; test_glob '<^>' $unquoted_patvar
cd ..

# ======
# https://www.reddit.com/r/ksh/comments/13k9af3/globbing_eres/jkjpk5a/
touch 'ef{' 'o1_mf_1_13962_l5p4ts16_.arc'
: > 'ef{' > 'o1_mf_1_13962_l5p4ts16_.arc'
unquoted_patvar='*f{'
((SHOPT_BRACEPAT)) && set +B
test_glob '<ef{>' *f{
Expand All @@ -463,6 +472,14 @@ test_glob '<~(E)^o1_mf_1_\d{1,6}_[[:alnum:]]{8}_.arc$>' $unquoted_patvar
if ((SHOPT_BRACEPAT))
then set -B
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' ~(E)^o1_mf_1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o~(E)1_mf_1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1~(E)_mf_1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1_~(E)mf_1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1_m~(E)f_1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1_mf~(E)_1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1_mf_~(E)1_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1_mf_1~(E)_\d{1,6}_[[:alnum:]]{8}_.arc$
test_glob '<o1_mf_1_13962_l5p4ts16_.arc>' o1_mf_1_~(E)\d{1,6}_[[:alnum:]]{8}_.arc$
# in non-POSIX mode, brace expansion *is* expanded from unquoted var, though extended patterns are not.
# TODO: consider fixing this insanity as of ksh 93u+m/1.1 (incompatible change)
test_glob '<~(E)^o1_mf_1_\d1_[[:alnum:]]{8}_.arc$> <~(E)^o1_mf_1_\d6_[[:alnum:]]{8}_.arc$>' $unquoted_patvar
Expand All @@ -476,5 +493,11 @@ then set -B
test_glob '<~(i:A)N*> <~(i:A)S*>' $unquoted_patvar
fi

# ======
: > 'a\\b.txt'
unquoted_patvar='a\\\\b.*'
test_glob '<a\\b.txt>' a\\\\b.*
test_glob '<a\\b.txt>' $unquoted_patvar

# ======
exit $((Errors<125?Errors:125))

0 comments on commit 4361b6a

Please sign in to comment.