Skip to content

Commit

Permalink
Fix backslash and ~(...) processing of patterns from split fields
Browse files Browse the repository at this point in the history
*** Problem 1:

The following bug shows up in pathname expansion, in every ksh93
version:

    $ mkdir testdir
    $ cd testdir
    $ touch a b c ./-   # GNU touch won't accept - as file name
    $ p='[a\-c]'
    $ echo $p
    a b c

As the - in the bracket expression is backslash-escaped, it should
not be treated as a bracket expression range operator, so the
expected output is:

    - a c

Analysis: For compatibility with the Bourne/POSIX/ksh88, extended
patterns (with a syntax involving parentheses) are not expanded in
fields resulting from field splitting, e.g., from unquoted variable
expansions. The shell adds an extra level of backslash-escaping to
fields resulting from field splitting, so that characters that
indicate extended pattern syntax can be disabled by escaping them.
Of course, the backslash itself also acquires an extra backslash,
and that is the cause of the bug above.

The fix is to remove one layer of backslash-escaping from patterns
resulting from field splitting before expanding them. But we cannot
do this unconditionally; we also need a new flag to check if the
extra escaping is disallowing an extended pattern. See below.

[v1.1] Additionally, this commit adds a --globex shell option that
allows globbing of fields resulting from field splitting to ex-pand
ex-tended patterns, achieving consistent behaviour at the expense
of full backward compatibility. It is off by default.

*** Problem 2:

The fix for unwanted brace expansion of ~(E)... patterns and the
like was incomplete; it only checked for ~( at the beginning of the
pattern, but it may occur anywhere in the pattern.

src/cmd/ksh93/data/options.c,
src/cmd/ksh93/include/shell.h:
- [v1.1] Add new --globex (SH_GLOBEX) shell option.

src/cmd/ksh93/sh/expand.c:
- path_expand():
  - Add a new 'musttrim' paraneter.
  - If that parameter is nonzero, trim the pattern before expanding
    by copying it into sh.strbuf and calling sh_trim() on it.
  - Replace the unexpanded result by the original untrimmed pattern
    if it wasn't expanded; this avoids regressions with unquoted
    variables and comsubs.
- must_disallow_bracepat():
  - Add new 'withbackslash' parameter to handle escaped patterns.
  - Don't bother checking for initial ~ as this will now be called
    by path_generate() on encountering a ~ anywhere in the pattern.
  - Return -1 if there is no ( or if the option's don't indicate a
    change in the brace expansion compatibility state.
- path_generate():
  - Add 'musttrim' parameter to pass on to path_expand() and
    must_disallow_bracepat().
  - While scanning the pattern, when encountering '~' and we're not
    already in a brace expansion, call must_disallow_bracepat()
    there instead of at the start, so that the 'nobracepat' state
    is kept up to date throughout the scan. This fixes problem 2.

src/cmd/ksh93/sh/macro.c:
- Add two new fields to the Mac_t struct:
  1. noextpat: set to disallow extended patterns; this is set in
     addition to backslash-escaping & | ( ).
  2. wasexpan: set when varsub() is called (except in the 'nosub:'
     case) or when comsubst() is called; this should cover all the
     expansions that may result in field splitting.
- mac_copy(): When escaping to disallow extended patterns, also set
  noextpat unless the --globex shell option is on.
- end_field(): Introduce a 'musttrim' flag that is set if wasexpan
  is set (it was a variable or comsub expansion), there is no
  disallowed extended pattern, and the string contains a backslash.
  Pass this flag on to path_expand() or path_generate(). Along with
  the changes in expand.c, this fixes Problem 1.

Resolves: #549
  • Loading branch information
McDutchie committed May 27, 2023
1 parent d195f57 commit 6d76174
Show file tree
Hide file tree
Showing 9 changed files with 170 additions and 30 deletions.
11 changes: 10 additions & 1 deletion ANNOUNCE
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,13 @@
* finished state and some information may be missing, obsolete or wrong. *
**************************************************************************

The announcement text for ksh 93u+m/1.1 is not being developed yet.
The announcement text for ksh 93u+m/1.1 is under development.

### MAIN CHANGES between ksh 93u+m/1.0.x and 93u+m/1.1.0 ###

New features in shell options:

- A new --globex shell option allows pathname expansion of extended patterns
(those with a syntax involving parentheses) from fields resulting from
field splitting (e.g., unquoted variables as command arguments). This
remains disabled by default for compatibility with POSIX and ksh88.
23 changes: 19 additions & 4 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,31 @@ This documents significant changes in the dev branch of ksh 93u+m.
For full details, see the git log at: https://github.com/ksh93/ksh
Uppercase BUG_* IDs are shell bug IDs as used by the Modernish shell library.

2023-05-27:

- Fixed a bug in the handling of backslashes in pathname expansion patterns
resulting from field splitting. For example, the following:
$ mkdir testdir; cd testdir; touch b; p='[a\-c]'; echo $p
now correctly recognises the backslash escape and no longer outputs 'b'.

- The 2023-05-19 fix for brace expansion interfering with extended patterns has
been improved to also recognise and handle extended patterns that are parts
of larger patterns, e.g., 'ls file~(E)[[:digit:]]{3,6}\.txt$' now works.

- [v1.1] Added a new 'globex' shell option that causes extended patterns,
i.e. those with a syntax involving parentheses, to be expanded in fields
resulting from field splitting. These are normally disabled in that context
for compatibility with other shells and old ksh versions. For example,
$ var='~(E)foo'; set --globex; ls $var
will now list files with names containing 'foo'.

2023-05-19:

- Fixed a bug in the loop invariants optimizer where a variable for which
a .get or .getn discipline function is set within a loop is incorrectly
treated as invariant.

- Fixed three bugs in pathname expansion:
- Fixed two bugs in pathname expansion:
1. Brace expansion interfered with EREs and other extended pattern
types used with ~(...) pattern option prefixes, e.g. the {3,6} in
'ls ~(E)^file[[:digit:]]{3,6}\.txt$' was incorrectly parsed as a
Expand All @@ -17,9 +35,6 @@ Uppercase BUG_* IDs are shell bug IDs as used by the Modernish shell library.
2. With brace expansion disabled (set +B), any '{' character in a file
name pattern stopped it from being resolved unless a wildcard followed
it. Now, 'set +B; ls *{' will list files with names ending in '{'.
3. Patterns with options such as ~(E)foo failed to match files if the
pattern resulted from an unquoted variable expansion. With this fix,
'var="~(E)foo", ls $var' will list files with names containing 'foo'.

2023-05-18:

Expand Down
1 change: 1 addition & 0 deletions src/cmd/ksh93/data/options.c
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ const Shtable_t shtab_options[] =
#if SHOPT_GLOBCASEDET || !defined(SHOPT_GLOBCASEDET)
"globcasedetect", SH_GLOBCASEDET,
#endif
"globex", SH_GLOBEX,
"globstar", SH_GLOBSTARS,
#if SHOPT_ESH
"gmacs", SH_GMACS,
Expand Down
4 changes: 2 additions & 2 deletions src/cmd/ksh93/include/path.h
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ extern Namval_t *path_gettrackedalias(const char*);
extern Pathcomp_t *path_absolute(const char*, Pathcomp_t*, int);
extern char *path_basename(const char*);
extern char *path_fullname(const char*);
extern int path_expand(const char*, struct argnod**);
extern int path_expand(const char*, struct argnod**, int);
extern noreturn void path_exec(const char*,char*[],struct argnod*);
extern pid_t path_spawn(const char*,char*[],char*[],Pathcomp_t*,int);
extern int path_open(const char*,Pathcomp_t*);
Expand All @@ -88,7 +88,7 @@ extern int path_search(const char*,Pathcomp_t**,int);
extern char *path_relative(const char*);
extern int path_complete(const char*, const char*,struct argnod**);
#if SHOPT_BRACEPAT
extern int path_generate(struct argnod*,struct argnod**);
extern int path_generate(struct argnod*,struct argnod**, int);
#endif /* SHOPT_BRACEPAT */

#if SHOPT_DYNAMIC
Expand Down
1 change: 1 addition & 0 deletions src/cmd/ksh93/include/shell.h
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ typedef union Shnode_u Shnode_t;
#define SH_NOLOG 28
#define SH_NOTIFY 29
#define SH_DICTIONARY 30
#define SH_GLOBEX 31
#define SH_PIPEFAIL 32
#define SH_GLOBSTARS 33
#if SHOPT_GLOBCASEDET || !defined(SHOPT_GLOBCASEDET)
Expand Down
21 changes: 19 additions & 2 deletions src/cmd/ksh93/sh.1
Original file line number Diff line number Diff line change
Expand Up @@ -2618,9 +2618,9 @@ and
This is also known as \f2globbing\fP
or sometimes \f2filename generation\fP.
Pathname expansion is disabled if the
.B -f
.B \-f
a.k.a.
.B --noglob
.B \-\-noglob
shell option is on.
Otherwise, if certain special characters are found in a
.I word
Expand All @@ -2644,10 +2644,16 @@ are scanned only for the characters
.BR ? ,
and
.B \*(OK\^
for compatibility reasons
(in which case the
.B (
character is not special and any pattern syntax described below
that involves parentheses does not apply).
The
.B \-\-globex
option eliminates that difference by enabling this extended pattern
syntax with parentheses also for fields resulting from field splitting.
.PP
Each file name component that contains a recognized pattern character
is replaced with a lexicographically sorted set of names
that matches the pattern
Expand Down Expand Up @@ -7800,6 +7806,17 @@ if its parent directory exists on a case-insensitive file system.
This option is only present on operating systems that support
case-insensitive file systems.
.TP 8
.B globex
When this is enabled, extended patterns,
i.e. those that use a syntax involving parentheses (see
.I "Pathname Expansion"
above), are expanded in fields that result from field splitting
(see
.I "Field Splitting"
above).
These pattern types are disabled by default in field splitting contexts
for compatibility reasons.
.TP 8
.B globstar
Same as
.BR \-G .
Expand Down
69 changes: 54 additions & 15 deletions src/cmd/ksh93/sh/expand.c
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ static char *nextdir(glob_t *gp, char *dir)
return NULL;
}

int path_expand(const char *pattern, struct argnod **arghead)
int path_expand(const char *pattern, struct argnod **arghead, int musttrim)
{
glob_t gdata;
struct argnod *ap;
Expand Down Expand Up @@ -93,9 +93,34 @@ int path_expand(const char *pattern, struct argnod **arghead)
gp->gl_suffix = sufstr;
gp->gl_intr = &sh.trapnote;
suflen = 0;
if(strncmp(pattern,"~(N",3)==0)
flags &= ~GLOB_NOCHECK;
glob(pattern, flags, 0, gp);
/*
* If we expanded an unquoted variable/comsub containing a pattern, that pattern
* will have shell-special characters backslash-escaped for correct expansion of
* the value. However, the pattern must be trimmed of those escapes before
* resolving it, while keeping the escapes if the pattern doesn't resolve.
*/
if(musttrim)
{
char *trimmedpat;
sfputr(sh.strbuf,pattern,-1);
trimmedpat = sfstruse(sh.strbuf);
sh_trim(trimmedpat);
glob(trimmedpat,flags,0,gp);
/*
* If there is only one result and it is identical to the trimmed pattern, then the pattern didn't
* resolve, and we now need to replace it with the untrimmed pattern to avoid regressions with the
* expansion of unquoted variables. (Note: globlist_t (glob.h) is binary compatible with struct
* argnod (argnod.h); thus, gl_path and argval have the same offset (ARGVAL) in the struct.)
*/
if((ap = (struct argnod*)gp->gl_list) && !ap->argnxt.ap && strcmp(ap->argval,trimmedpat)==0)
{
gp->gl_list = (globlist_t*)stkalloc(sh.stk,ARGVAL+strlen(pattern)+1);
memcpy(gp->gl_list,ap,ARGVAL); /* copy fields *before* argval/gl_path */
strcpy(gp->gl_list->gl_path,pattern);
}
}
else
glob(pattern,flags,0,gp);
sh_sigcheck();
for(ap= (struct argnod*)gp->gl_list; ap; ap = ap->argnxt.ap)
{
Expand Down Expand Up @@ -144,7 +169,7 @@ int path_complete(const char *name,const char *suffix, struct argnod **arghead)
{
sufstr = suffix;
suflen = strlen(suffix);
return path_expand(name,arghead);
return path_expand(name,arghead,0);
}

#if SHOPT_BRACEPAT
Expand All @@ -155,18 +180,22 @@ static int checkfmt(Sfio_t* sp, void* vp, Sffmt_t* fp)
}

/*
* Return true if ~(...) pattern options indicate a pattern type that is
* Return 1 if ~(...) pattern options indicate a pattern type that is
* syntactically incompatible with brace expansion because it uses braces
* for its own purposes (e.g., bounds in regular expressions).
* Return 0 if not.
* Return -1 if pat is not a ~(...) pattern or no relevant options are given.
* Do the minimum parsing of the option syntax necessary to determine this.
*
* NOTE: cp is expected to point to the character *after* the '~'.
*/
static int must_disallow_bracepat(char *pat)
static int must_disallow_bracepat(char *cp, int withbackslash)
{
int32_t incompat = 0;
char shellpat = 0, minus = 0, c;
if (*pat++ != '~' || *pat++ != '(')
return 0;
while ((c = *pat++) && c != ':' && c != ')') switch (c)
char change = 0, shellpat = 0, minus = 0, c;
if ((withbackslash && *cp++ != '\\') || *cp++ != '(')
return -1;
while ((c = *cp++) && c != ':' && c != ')') switch (c)
{
case 'E': /* extended regular expression */
case 'F': /* fixed pattern */
Expand All @@ -181,9 +210,11 @@ static int must_disallow_bracepat(char *pat)
}
else
incompat &= ~(1 << c - 'A');
change = 1;
break;
case 'K': /* shell pattern */
shellpat = !minus;
change = 1;
break;
case '-': /* disable the following options */
minus = 1;
Expand All @@ -192,18 +223,18 @@ static int must_disallow_bracepat(char *pat)
minus = 0;
break;
}
return c && incompat && !shellpat;
return change ? (c && incompat && !shellpat) : -1;
}

int path_generate(struct argnod *todo, struct argnod **arghead)
int path_generate(struct argnod *todo, struct argnod **arghead, int musttrim)
/*@
assume todo!=0;
return count satisfying count>=1;
@*/
{
const int nobracepat = must_disallow_bracepat(todo->argval);
char *cp;
int brace;
int nobracepat = 0;
struct argnod *ap;
struct argnod *top = 0;
struct argnod *apin;
Expand Down Expand Up @@ -312,6 +343,14 @@ int path_generate(struct argnod *todo, struct argnod **arghead)
case '\\':
cp++;
break;
case '~':
if(!brace)
{
int r = must_disallow_bracepat(cp,musttrim);
if(r>=0)
nobracepat = r;
}
break;
case 0:
/* insert on stack */
ap->argchn.ap = top;
Expand All @@ -322,7 +361,7 @@ int path_generate(struct argnod *todo, struct argnod **arghead)
{
apin = ap->argchn.ap;
if(!sh_isoption(SH_NOGLOB) || sh_isstate(SH_COMPLETE) || sh_isstate(SH_FCOMPLETE))
brace=path_expand(ap->argval,arghead);
brace = path_expand(ap->argval,arghead,musttrim);
else
{
ap->argchn.ap = *arghead;
Expand Down
20 changes: 15 additions & 5 deletions src/cmd/ksh93/sh/macro.c
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,8 @@ typedef struct _mac_
char split; /* set when word splitting is possible */
char pattern; /* set when glob pattern expansion or matching follows */
char patfound; /* set if pattern character found */
char noextpat; /* set to disallow extended patterns */
char wasexpan; /* set when word resulted from a $ or ` expansion */
char assign; /* set for assignments */
char arith; /* set for ((...)) */
char arrayok; /* $x[] ok for arrays */
Expand Down Expand Up @@ -175,7 +177,7 @@ char *sh_mactrim(char *str, int mode)
{
/* expand only if unique */
struct argnod *arglist=0;
if((mode=path_expand(str,&arglist))==1)
if((mode=path_expand(str,&arglist,0))==1)
str = arglist->argval;
else if(mode>1)
{
Expand Down Expand Up @@ -1129,6 +1131,7 @@ static int varsub(Mac_t *mp)
char *idx = 0;
int var=1,addsub=0,oldpat=mp->pattern,idnum=0,flag=0,d;
Stk_t *stkp = sh.stk;
mp->wasexpan = 1;
retry1:
idbuff[0] = 0;
idbuff[1] = 0;
Expand Down Expand Up @@ -2112,6 +2115,7 @@ static int varsub(Mac_t *mp)
if(type)
mac_error();
fcseek(-1);
mp->wasexpan = 0;
return 0;
}

Expand All @@ -2130,7 +2134,7 @@ static void comsubst(Mac_t *mp,Shnode_t* t, int type)
Stk_t *stkp = sh.stk;
Fcin_t save;
struct slnod *saveslp = sh.st.staklist;
struct _mac_ savemac = *mp;
Mac_t savemac = *mp;
int savtop = stktell(stkp);
char lastc=0, *savptr = stkfreeze(stkp,0);
int was_history = sh_isstate(SH_HISTORY);
Expand All @@ -2139,6 +2143,7 @@ static void comsubst(Mac_t *mp,Shnode_t* t, int type)
int newlines,bufsize,nextnewlines;
Sfoff_t foff;
Namval_t *np;
savemac.wasexpan = 1;
sh.argaddr = 0;
sh.st.staklist=0;
if(type)
Expand Down Expand Up @@ -2451,11 +2456,15 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
size -= len;
continue;
}
if(n==S_ESC || n==S_EPAT)
if(n==S_ESC)
sfputc(stkp,ESCAPE);
else if(n==S_EPAT)
{
/* don't allow extended patterns in this case */
mp->patfound = mp->pattern;
sfputc(stkp,ESCAPE);
if(!sh_isoption(SH_GLOBEX))
mp->noextpat = 1;
}
else if(n==S_PAT)
mp->patfound = mp->pattern;
Expand Down Expand Up @@ -2550,14 +2559,15 @@ static void endfield(Mac_t *mp,int split)
mp->atmode = 0;
if(mp->patfound)
{
int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
sh.argaddr = 0;
#if SHOPT_BRACEPAT
/* in POSIX mode, disallow brace expansion for unquoted expansions */
if(sh_isoption(SH_BRACEEXPAND) && !(sh_isoption(SH_POSIX) && mp->pattern==1))
count = path_generate(argp,mp->arghead);
count = path_generate(argp,mp->arghead,musttrim);
else
#endif /* SHOPT_BRACEPAT */
count = path_expand(argp->argval,mp->arghead);
count = path_expand(argp->argval,mp->arghead,musttrim);
if(count)
mp->fields += count;
else if(split) /* pattern is null string */
Expand Down
Loading

0 comments on commit 6d76174

Please sign in to comment.