Incorrect internal escaping of glob pattern resulting from unquoted variable expansion #549
I've managed to trace where the incorrect escaping happens. It is the Lines 2423 to 2428 in d84067a
(note that ESCAPE is defined as a backslash char value on line 78 in macro.c)
If we delete that
Patch with regression tests
diff --git a/src/cmd/ksh93/tests/glob.sh b/src/cmd/ksh93/tests/glob.sh
index 1ea2e46ad..607734402 100755
--- a/src/cmd/ksh93/tests/glob.sh
+++ b/src/cmd/ksh93/tests/glob.sh
@@ -429,6 +429,14 @@ test_glob '<[^N]>' ["^"N]
test_glob '<[a-c]>' [a\-c]
test_glob '<[!N]>' [\!N]
test_glob '<[^N]>' [\^N]
+# Incorrect internal escaping of glob pattern resulting from unquoted variable expansion
+# https://github.com/ksh93/ksh/issues/549
+p='[a\-c]'; test_glob '<[a\-c]>' $p
+p='[\!N]'; test_glob '<[\!N]>' $p
+p='[\^N]'; test_glob '<[\^N]>' $p
+: > -; p='[a\-c]'; test_glob '<->' $p
+: > !; p='[\!N]'; test_glob '<!>' $p
+: > ^; p='[\^N]'; test_glob '<^>' $p
cd ..
# ====== Currently failing:
This appears to be a regression introduced in ksh93n- 2002-06-28 by the following diff:
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 1069f05f..132f76c8 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -1637,19 +1643,7 @@ static void mac_copy(register Mac_t *mp,register const char *str, register int s
}
while(size-->0)
{
- if((n=state[c= *(unsigned char*)cp++])==S_ESC)
- {
- mp->patfound = mp->pattern;
- stakputc(ESCAPE);
- if((n=sh_lexstates[ST_MACRO][*(unsigned char*)cp])==S_PAT || n==S_DIG || *cp=='-' || *cp=='\\')
- {
- stakputc(ESCAPE);
- stakputc(ESCAPE);
- size--;
- c = *cp++;
- }
- }
- else if(n==S_EPAT)
+ if((n=state[c= *(unsigned char*)cp++])==S_ESC || n==S_EPAT)
{
/* don't allow extended patterns in this case */
mp->patfound = mp->pattern;
Reverting that change allows the reproducer to work correctly. Unfortunately, reverting it in 93u+m causes a few regression test failures:
93u+m revert patch
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index dfe087ad4..fc487ea39 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2441,7 +2441,19 @@ static void mac_copy(register Mac_t *mp,register const char *str, register int s
size -= len;
continue;
}
- if(n==S_ESC || n==S_EPAT)
+ if(n==S_ESC)
+ {
+ mp->patfound = mp->pattern;
+ stakputc(ESCAPE);
+ if((n=sh_lexstates[ST_MACRO][*(unsigned char*)cp])==S_PAT || n==S_DIG || *cp=='-' || *cp=='\\')
+ {
+ stakputc(ESCAPE);
+ stakputc(ESCAPE);
+ size--;
+ c = *cp++;
+ }
+ }
+ else if(n==S_EPAT)
{
/* don't allow extended patterns in this case */
mp->patfound = mp->pattern;
When I apply the revert patch and change
Actually, it doesn't fix it -- it just masks it. After applying the debug patch again, we get:
So now there's a spurious escaped backslash included in the pattern as well as the escaped dash. Which means...
Yup, it matches a file called
Back to the drawing board.
Regression tests (two currently failing):
diff --git a/src/cmd/ksh93/tests/glob.sh b/src/cmd/ksh93/tests/glob.sh
index aa06b3d2c..e82438dc9 100755
--- a/src/cmd/ksh93/tests/glob.sh
+++ b/src/cmd/ksh93/tests/glob.sh
@@ -435,5 +435,15 @@ test_glob '<[]-z]>' [']-z']
test_glob '<[]-z]>' ["]-z"]
cd ..
+# ======
+# Incorrect internal escaping of glob pattern resulting from unquoted variable expansion
+# https://github.com/ksh93/ksh/issues/549
+p='[a\-c]'; test_glob '<[a\-c]>' $p
+p='[\!N]'; test_glob '<[\!N]>' $p
+p='[\^N]'; test_glob '<[\^N]>' $p
+: > -; p='[a\-c]'; test_glob '<->' $p
+: > !; p='[\!N]'; test_glob '<!>' $p
+: > ^; p='[\^N]'; test_glob '<^>' $p
+
# ======
exit $((Errors<125?Errors:125))
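The expected-output strings in these tests wrap each resulting argument in angle brackets. The sketch below illustrates that convention with a hypothetical helper named check_glob. It is not the real test_glob from glob.sh, which does more, and the space separator between fields is an assumption. Arguments are passed quoted so the demo does not depend on the files in the current directory.

```shell
# Hypothetical stand-in for the test suite's test_glob helper.
# It joins its remaining arguments as "<arg1> <arg2> ..." and compares
# the result with the expected string given as the first argument.
check_glob() {
	expect=$1; shift
	got= sep=
	for arg
	do	got="$got$sep<$arg>"
		sep=' '
	done
	if [ "$got" = "$expect" ]
	then	echo ok
	else	printf '%s\n' "FAIL: expected '$expect', got '$got'"
	fi
}

# What a correct expansion of [a\-c] should yield with files a, b, c and - present:
check_glob '<-> <a> <c>' - a c
# What it should yield when nothing matches (the pattern stays as written):
check_glob '<[a\-c]>' '[a\-c]'
```

In the real tests the arguments come from an unquoted $p, so the shell's own pathname expansion produces them.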
*** Problem 1: The following bug shows up in pathname expansion, in every ksh93 version:

    $ mkdir testdir
    $ cd testdir
    $ touch a b c ./-    # GNU touch won't accept - as file name
    $ p='[a\-c]'
    $ echo $p
    a b c

As the - in the bracket expression is backslash-escaped, it should not be treated as a bracket expression range operator, so the expected output is:

    - a c

Analysis: For compatibility with Bourne/POSIX/ksh88, extended patterns (with a syntax involving parentheses) are not expanded in fields resulting from field splitting, e.g., from unquoted variable expansions. The shell adds an extra level of backslash-escaping to fields resulting from field splitting, so that characters that indicate extended pattern syntax can be disabled by escaping them. Of course, the backslash itself also acquires an extra backslash, and that is the cause of the bug above.

The fix is to remove one layer of backslash-escaping from patterns resulting from field splitting before expanding them. But we cannot do this unconditionally; we also need a new flag to check if the extra escaping is disallowing an extended pattern. See below.

*** Problem 2: The fix for unwanted brace expansion of ~(E)... patterns and the like was incomplete; it only checked for ~( at the beginning of the pattern, but it may occur anywhere in the pattern.

src/cmd/ksh93/sh/expand.c:
- path_expand():
  - Add a new 'musttrim' parameter.
  - If that parameter is nonzero, trim the pattern before expanding by copying it into sh.strbuf and calling sh_trim() on it.
  - Replace the unexpanded result by the original untrimmed pattern if it wasn't expanded; this avoids regressions with unquoted variables and comsubs.
- must_disallow_bracepat():
  - Add new 'withbackslash' parameter to handle escaped patterns.
  - Don't bother checking for initial ~ as this will now be called by path_generate() on encountering a ~ anywhere in the pattern.
  - Return -1 if there is no ( or if the options don't indicate a change in the brace expansion compatibility state.
- path_generate():
  - Add 'musttrim' parameter to pass on to path_expand() and must_disallow_bracepat().
  - While scanning the pattern, when encountering '~' and we're not already in a brace expansion, call must_disallow_bracepat() there instead of at the start, so that the 'nobracepat' state is kept up to date throughout the scan. This fixes problem 2.

src/cmd/ksh93/sh/macro.c:
- Add two new fields to the Mac_t struct:
  1. noextpat: set to disallow extended patterns; this is set in addition to backslash-escaping & | ( ).
  2. wasexpan: set when varsub() is called (except in the 'nosub:' case) or when comsubst() is called; this should cover all the expansions that may result in field splitting.
- mac_copy(): When escaping to disallow extended patterns, also set noextpat.
- endfield(): Introduce a 'musttrim' flag that is set if wasexpan is set (it was a variable or comsub expansion), there is no disallowed extended pattern, and the string contains a backslash. Pass this flag on to path_expand() or path_generate(). Along with the changes in expand.c, this fixes Problem 1.

Resolves: #549
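The two steps the commit message describes for Problem 1, deciding whether to trim and then removing one layer of backslash-escaping, can be modelled in portable sh. This is only a sketch: trim_one_layer and must_trim are hypothetical names, the real work is done in C by sh_trim() and endfield(), and the over-escaped form [a\\-c] is an assumption based on the analysis above.

```shell
# Hypothetical model of "remove one layer of backslash-escaping".
# Not the shell's sh_trim(); it only illustrates the idea.
trim_one_layer() (
	in=$1 out=
	while [ -n "$in" ]
	do	rest=${in#?}		# everything after the first character
		ch=${in%"$rest"}	# the first character
		in=$rest
		if [ "$ch" = \\ ] && [ -n "$in" ]
		then	rest=${in#?}
			ch=${in%"$rest"}	# keep the escaped character, drop the backslash
			in=$rest
		fi
		out=$out$ch
	done
	printf '%s\n' "$out"
)

# Hypothetical model of the 'musttrim' decision: trim only if the word came
# from an expansion (wasexpan), no extended pattern was disallowed
# (noextpat), and the word contains a backslash.
must_trim() {	# usage: must_trim WASEXPAN NOEXTPAT WORD
	[ "$1" -ne 0 ] && [ "$2" -eq 0 ] || return 1
	case $3 in
	*\\*)	return 0 ;;
	*)	return 1 ;;
	esac
}

word='[a\\-c]'		# pattern after the extra escaping layer (assumed form)
if must_trim 1 0 "$word"
then	word=$(trim_one_layer "$word")
fi
printf '%s\n' "$word"
```

With the extra layer removed, the backslash once again escapes the - instead of being escaped itself, which is the behaviour the regression tests expect.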
@dicktyr (Richard Taityr) reports:

> globs are not expanded with the following:
>
> % echo "/"{bin,sbin}"/*"
> /bin/* /sbin/*
>
> % v=/; echo "$v"bin"/*"
> /bin/*
>
> but globbing is unexpectedly performed if both parameter
> substitution and brace expansion are present:
>
> % v=/; echo "$v"{bin,sbin}"/*"
> [output omitted for the sake of brevity]

So, quoted pattern characters are expanded. Not good.

The closest I've been able to come to fixing this so far is:

src/cmd/ksh93/sh/macro.c: endfield():
- Do not set musttrim if mp->quoted is set, i.e., if the argument node contains any quoted parts.

This fixes the major bug. But it also partially reintroduces the previously fixed bug <#549> for cases where any part of the word is quoted -- even if the quoted part is empty. For example:

    $ mkdir testdir
    $ cd testdir
    $ touch a b c ./-
    $ p='[a\-c]'
    $ echo $p      # OK
    - a c
    $ echo ""$p    # BUG
    a b c
    $ echo $p""    # BUG
    a b c

The fundamental problem is that, by the time endfield() is reached, we have a complete word and we can no longer distinguish between the quoted and unquoted parts of it. The Mac_t flags, such as mp->quoted, apply to the entire word.

This is a fundamental flaw in the current design, where quotes are internally translated to backslash escapes for certain characters. To the best of my knowledge (and I've tried hard), it is not possible to fix this without introducing other regressions.

A radical redesign of this entire mechanism is needed where this internal backslash escaping is replaced by a way for every 'struct argnod' argument node to track for each individual character whether it is quoted or not, without modifying the argument string itself. Because it is incorrect to modify the argument string.

But, for the foreseeable future, that is a pipe dream, because there is no one who fully understands all this code with all its myriad kludges and workarounds going decades back.

Resolves: #660
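The trade-off can be modelled with a small sh sketch in which a single word-level "quoted" flag vetoes trimming. The functions must_trim and describe are hypothetical; only the flag names (wasexpan, noextpat, quoted) come from the commit messages, and the real decision is made in C inside endfield().

```shell
# Hypothetical model of the musttrim decision with the word-level veto:
# any quoted part in the word, even an empty one, disables trimming.
must_trim() {	# usage: must_trim WASEXPAN NOEXTPAT QUOTED WORD
	[ "$1" -ne 0 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ] || return 1
	case $4 in
	*\\*)	return 0 ;;
	*)	return 1 ;;
	esac
}

describe() {	# usage: describe LABEL QUOTED
	if must_trim 1 0 "$2" '[a\\-c]'
	then	printf '%s: trimmed, so - is literal\n' "$1"
	else	printf '%s: not trimmed, so the #549 behaviour returns\n' "$1"
	fi
}

describe '$p' 0		# no quoted part in the word
describe '""$p' 1	# an empty quoted part still sets the word-level flag
describe '$p""' 1
```

Because the flag describes the whole word, the model cannot tell an empty "" from quoted pattern characters, which is the limitation the message attributes to the current design.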
This is a continuation of #488 (comment) as it is a separate bug.
The following bug shows up in pathname expansion, in every ksh93 version:
That should not output b, as the - operator in the bracket expression is escaped (after the shell's backslash removal).

When we apply the following debug patch to libast glob():
…we can see what actually gets passed to glob():
So, the backslash is itself incorrectly escaped somewhere along the way, disabling its escaping of -.
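The effect of the extra backslash can be illustrated without relying on how any particular shell expands $p, by writing both forms as literal case patterns in portable sh. This is a sketch under one assumption: that the over-escaped pattern reaching glob() has the form [a\\-c], as the description above suggests. The function names are hypothetical.

```shell
# Use byte-order ranges so that the range from \ to c is predictable.
LC_ALL=C; export LC_ALL

# Intended pattern [a\-c]: the backslash quotes '-', leaving the set {a, -, c}.
intended_match() {
	case $1 in
	[a\-c])	echo yes ;;
	*)	echo no ;;
	esac
}

# Over-escaped pattern [a\\-c] (assumed form): the first backslash quotes the
# second, so '-' is a range operator again: 'a' plus the range from \ to c.
overescaped_match() {
	case $1 in
	[a\\-c])	echo yes ;;
	*)	echo no ;;
	esac
}

for name in a b c -
do	printf '%s: intended=%s overescaped=%s\n' "$name" \
		"$(intended_match "$name")" "$(overescaped_match "$name")"
done
```

Under these assumptions, b should match only the over-escaped form and - only the intended form, which mirrors the "a b c" output reported in the issue.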