Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrays leak out of non-forked subshells #88

Closed
McDutchie opened this issue Jul 23, 2020 · 17 comments
Closed

Arrays leak out of non-forked subshells #88

McDutchie opened this issue Jul 23, 2020 · 17 comments
Labels
bug Something is not working

Comments

@McDutchie
Copy link

McDutchie commented Jul 23, 2020

Previous discussion and reproducer: att#416

A straightforward workaround/fix would be to fork when an associative array is declared in a subshell:

diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index 906d0a7..0ae7d8c 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -259,6 +259,8 @@ int    b_typeset(int argc,register char *argv[],Shbltin_t *context)
 				tdata.tname = opt_info.arg;
 				break;
 			case 'A':
+				if(tdata.sh->subshell && !tdata.sh->subshare)
+					sh_subfork();
 				flag |= NV_ARRAY;
 				break;
 			case 'C':

Before applying that fix/workaround I'd like to try to figure out if it's possible to implement a fix for the non-forking subshell mechanism. I don't really have time to dive deeply into this right now, so… any ideas, anyone?

@McDutchie McDutchie added bug Something is not working help wanted Extra attention is needed labels Jul 23, 2020
@McDutchie
Copy link
Author

Ping @rdebath (original reporter at att#416)

@rdebath
Copy link

rdebath commented Jul 23, 2020

That would be fine with me.

  1. The suppression of "real" subshells is a performance improvement.
  2. I try to avoid using subshells, because they are a performance impact; especially on other shells.
  3. Because of this, the only times I've ever used arrays in a subshell are as error trapping when detecting the shell even has arrays.

@JohnoKing
Copy link

JohnoKing commented Jul 23, 2020

This bug does not occur in ksh93u+ 2012-02-29. It is a regression that was first introduced in version 93u+ 2012-05-18.
edit: I was incorrect; see below comment.

@McDutchie
Copy link
Author

McDutchie commented Jul 24, 2020

Through a process of systematic elimination of differences, I was able to determine what between 2012-02-29 and 2012-05-18 made the difference. It is the addition of these two lines:

else if(iarray && ap && ap->fun)
errormsg(SH_DICT,ERROR_exit(1),"cannot change associative array %s to index array",nv_name(np));

In other words, the original reproducer, which was basically

unset qq; (typeset -A qq); (typeset -a qq)

"succeeded" because ksh 2012-02-29 and older failed to fail; subshells were just as leaky. A better reproducer is:

unset qq; (typeset -A qq); typeset -p qq

If the output is typeset -A qq=(), it's got the bug.

@McDutchie
Copy link
Author

Adding an assignment value, even an empty array (()), appears to be an effective workaround. The following does not trigger the bug:

unset qq; (typeset -A qq=()); typeset -p qq

(empty output)

@McDutchie
Copy link
Author

The command typeset -p qq doesn't actually work on the oldest ksh version in the multishell repo, 2002-06-28. A new reproducer:

unset qq; (typeset -A qq); typeset | grep -Fx "associative array qq"

With that, the bug is confirmed present on 2002-06-28, and we can give up on hoping there being a bugfix to backport.

@McDutchie McDutchie changed the title Associative arrays leak out of non-forked subshells Arrays leak out of non-forked subshells Jul 24, 2020
@McDutchie
Copy link
Author

McDutchie commented Jul 24, 2020

Indexed arrays are leaky in the same way:

$ arch/*/bin/ksh -c 'unset qq; (typeset -a qq); typeset -p qq'
typeset -a qq

…and again, using typeset -a qq=() is an effective workaround.

@McDutchie
Copy link
Author

I think I found the problem, using that workaround as a clue. For array declarations without an assignment, this call to _nv_unset was not executed for arrays; this is necessary to make unset variables local to a virtual subshell.

if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
{
_nv_unset(np,0);
}

The following diff fixes the reproducer for me; please test.

diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index 906d0a7..7c03078 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -693,10 +693,8 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 			}
 			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
 			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
+				if(comvar || iarray || (flag&NV_ARRAY) || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
 					_nv_unset(np,0);
-}
 			}
 			if(troot==shp->var_tree)
 			{

Unfortunately, this diff causes one regression test failure:

test arrays begins at 2020-07-24+05:36:29
	arrays.sh[185]: initial value not preserved when typecast to associative
test arrays failed at 2020-07-24+05:36:29 with exit code 1 [ 146 tests 1 error ]

…but that will have to wait as the daystar is rising. Goodnight.

@JohnoKing
Copy link

The regression test failure can be fixed by checking if the array is empty with nv_isnull(np):

--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -693,10 +693,9 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 			}
 			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
 			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
+				if(comvar || ((iarray || (nvflags&NV_ARRAY)) && nv_isnull(np)) || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC))
+					|| (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
 					_nv_unset(np,0);
-}
 			}
 			if(troot==shp->var_tree)
 			{

I'll note that the regression test failure for associative arrays also applies to indexed arrays, so nv_isnull must be used for both kinds of arrays. Reproducer:

# This should be added to arrays2.sh, since the regression test failure for
# associative arrays was also occurring with indexed arrays after `typeset -a`.
typeset foo=bar
typeset -a foo
if  [[ ${foo[0]} != bar ]]; then
    err_exit $'initial value of $foo not preserved after \'typeset -a\''
fi

@McDutchie
Copy link
Author

McDutchie commented Jul 24, 2020

Thanks, that works.

This is also a nice chance to make that whole tangled mess of parentheses more readable. Sometimes it's just counterproductive to make the precedence of && over || explicit; we can make things more readable by removing superfluous parentheses, combining those two ifs, and adding some unorthodox indentation.

diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index 906d0a7..cc1a799 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -691,13 +691,28 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 				if(!comvar && !iarray)
 					continue;
 			}
-			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
-			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
-					_nv_unset(np,0);
-}
-			}
+
+			/* Determine if we need to create a null value in the current scope using _nv_unset(). */
+			if(!nv_isarray(np) &&
+				!strchr(name,'=') &&
+				!(shp->envlist && nv_onlist(shp->envlist,name)) &&
+				(
+					/* compound variable? */
+					comvar ||
+					/* indexed or associative array that is initialized as null? */
+					(iarray || (nvflags&NV_ARRAY)) && nv_isnull(np) ||
+					/* variable exists in parent of virtual subshell? [EDIT: this comment is wrong] */
+					shp->last_root==shp->var_tree &&
+					(
+						tp->tp ||
+						!shp->st.real_fun && (nvflags&NV_STATIC) ||
+						!(flag&(NV_EXPORT|NV_RDONLY)) &&
+							nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)
+					)
+				)
+			)
+				_nv_unset(np,0);
+
 			if(troot==shp->var_tree)
 			{
 				if(iarray)

I'm not quite sure what shp->last_root==shp->var_tree actually means or what the code following that does exactly, so I'm not sure that comment is correct... any ideas?

edit: Comment is not correct. Inserting a strategically placed debug warning and running the regression tests revealed that the whole block following that comment is for supporting type commands created by typeset -T and enum, when declaring variables of such types without a value, such as foo_t var.

@McDutchie
Copy link
Author

McDutchie commented Jul 24, 2020

I'm experimenting further, learning to understand this code.

It turns out that things appear to work just fine if we replace the entirety of the tests in the old initial if statement

if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))

with a simple test for nv_isnull(np). Then we also don't need to add the nv_isnull(np) test to the check for an array. No regression test failures occur when I test this.

It's also quite logical, as this whole block decides whether a null scope needs to be created, which by definition should not be done for non-null nodes.

(FYI, I figured out what these three deletable tests did: the first checks if it's a pre-existing array, the second checks if it's a normal argument in the form of an assignment, the third checks envlist to see if it's either an assignment preceding the command like x=foo cmd, or an assignment added to a declaration commadn like export foo=bar. All of this appears to be superfluous if we use nv_isnull(np).)

@McDutchie
Copy link
Author

New diff:

diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index 906d0a7..723d5d2 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -691,13 +691,27 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 				if(!comvar && !iarray)
 					continue;
 			}
-			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
-			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
-					_nv_unset(np,0);
-}
-			}
+
+			/* Determine if we need to create a null value in the current scope using _nv_unset() */
+			if (
+				nv_isnull(np) && (
+					/* indexed array, or... */
+					iarray ||
+					/* associative array, or ... */
+					nvflags & NV_ARRAY ||
+					/* compound variable, or... */
+					comvar ||
+					/* 'Foo_t var', where Foo_t is type command generated by 'typeset -T' or 'enum'? */
+					shp->last_root == shp->var_tree && (
+						tp->tp ||
+						!shp->st.real_fun && (nvflags & NV_STATIC) ||
+						!(flag & (NV_EXPORT | NV_RDONLY)) &&
+							nv_isattr(np,(NV_EXPORT | NV_IMPORT)) == (NV_EXPORT | NV_IMPORT)
+					)
+				)
+			)
+				_nv_unset(np,0);
+
 			if(troot==shp->var_tree)
 			{
 				if(iarray)

@McDutchie
Copy link
Author

McDutchie commented Jul 24, 2020

Following new regression test (for attributes.sh) reveals that two more attributes leak out of subshells for unset variables: -l and -u.

# unset readonly exported variables
unset expect
typeset -A expect=(
	[a]='typeset -x -r -a foo'
	[b]='typeset -x -r -b foo'
	[i]='typeset -x -r -i foo'
	[l]='typeset -x -r -l foo'
	[n]='typeset -n -x -r foo'
	[s]='typeset -x -r -s -i 0 foo=0'
	[u]='typeset -x -r -u foo'
	[A]='typeset -x -r -A foo=()'
	[E]='typeset -x -r -E foo'
	[F]='typeset -x -r -F foo'
	[H]='typeset -x -r -H foo'
	[L]='typeset -x -r -L 0 foo'
	[R]='typeset -x -r -R 0 foo'
	[X]='typeset -x -r -X 32 foo'
	[S]='typeset -x -r foo'
	[T]='typeset -x -r foo'
	[Z]='typeset -x -r -Z 0 -R 0 foo'
)
for flag in a b i l n s u A E F H L R X S T Z
do	unset foo
	actual=$(
		(typeset "-$flag" foo; readonly foo; export foo; typeset -p foo)
		typeset -p foo  # check that no attribute leaked out of subshell
	)
	if	[[ $actual != "${expect[$flag]}" ]]
	then	err_exit "unset readonly exported variable of type -$flag:" \
			"expected $(printf %q "${expect[$flag]}"), got $(printf %q "$actual")"
	fi
done
unset expect

@McDutchie
Copy link
Author

McDutchie commented Jul 24, 2020

More experimentation seems to reveal that all of the tests in that jumbled mess of parentheses are redundant if we just check for two things: no value was assigned and there are no pre-existing attributes (such as NV_EXPORT, NV_READONLY, etc).

Which means we can (hopefully) simplify it to the following diff:

diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index 906d0a7..743c2db 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -691,13 +691,11 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 				if(!comvar && !iarray)
 					continue;
 			}
-			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
-			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
-					_nv_unset(np,0);
-}
-			}
+
+			/* No value or attribute assigned? Create null value in current scope */
+			if(nv_isnull(np) && !nv_isattr(np,~0))
+				_nv_unset(np,0);
+
 			if(troot==shp->var_tree)
 			{
 				if(iarray)

…and this also fixes the subshell leaks of both array attributes as well as the -l and -u attributes. All regression tests (old and new, as well as the modernish regression test suite) pass.

@McDutchie
Copy link
Author

Ah, but sadly it is not that simple...

$ arch/*/bin/ksh -c 'export x; (typeset -a x); typeset -p x'
typeset -x -a x

With the above diff, if an attribute exists in the parent shell, the leak returns.

@JohnoKing
Copy link

Ah, but sadly it is not that simple...

$ arch/*/bin/ksh -c 'export x; (typeset -a x); typeset -p x'
typeset -x -a x

With the above diff, if an attribute exists in the parent shell, the leak returns.

I can get that leak to go away by replacing !nv_isattr(np,~0) with !(flag & (NV_EXPORT | NV_RDONLY)):

--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -691,13 +691,11 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 				if(!comvar && !iarray)
 					continue;
 			}
-			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
-			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
-					_nv_unset(np,0);
-}
-			}
+
+			/* No value assigned? Create null value in current scope */
+			if(nv_isnull(np) && !(flag & (NV_EXPORT | NV_RDONLY)))
+				_nv_unset(np,0);
+
 			if(troot==shp->var_tree)
 			{
 				if(iarray)

@McDutchie
Copy link
Author

I can get that leak to go away by replacing !nv_isattr(np,~0) with !(flag & (NV_EXPORT | NV_RDONLY)):

That works to fix the leak. However, a pre-existing export flag is always cleared, which is not good:

$ ksh -c 'export foo; typeset -i foo; typeset -p foo'
typeset -i foo

(output should be typeset -x -i foo)

Of course, clearing pre-existing attributes is expected behaviour for unset. Why are we even calling _nv_unset when we merely want to create a local scope for the current virtual subshell? There is a function for that, sh_assignok(), which you had already fixed for unall() in 7b994b6 and bd3e2a8. _nv_unset() calls sh_assignok() as well.

According to my testing, the following appears to work 100%, fixing all the leaks without clearing any pre-existing attributes:

diff --git a/src/cmd/ksh93/bltins/typeset.c b/src/cmd/ksh93/bltins/typeset.c
index 906d0a7..d917506 100644
--- a/src/cmd/ksh93/bltins/typeset.c
+++ b/src/cmd/ksh93/bltins/typeset.c
@@ -691,13 +691,22 @@ static int     setall(char **argv,register int flag,Dt_t *troot,struct tdata *tp
 				if(!comvar && !iarray)
 					continue;
 			}
-			if(!nv_isarray(np) && !strchr(name,'=') && !(shp->envlist  && nv_onlist(shp->envlist,name)))
+
+			/* Create local scope for virtual subshell */
+			if(shp->subshell)
 			{
-				if(comvar || (shp->last_root==shp->var_tree && (tp->tp || (!shp->st.real_fun && (nvflags&NV_STATIC)) || (!(flag&(NV_EXPORT|NV_RDONLY)) && nv_isattr(np,(NV_EXPORT|NV_IMPORT))==(NV_EXPORT|NV_IMPORT)))))
-{
-					_nv_unset(np,0);
-}
+				if(!nv_isattr(np,NV_NODISC|NV_ARRAY) && !nv_isvtree(np))
+				{
+					/*
+					 * Variables with internal trap/discipline functions (LC_*, LINENO, etc.) need to be
+					 * cloned, as moving them will remove the discipline function.
+					 */
+					np=sh_assignok(np,2);
+				}
+				else
+					np=sh_assignok(np,0);
 			}
+
 			if(troot==shp->var_tree)
 			{
 				if(iarray)

New regression test for attributes.sh (which exposed another minor issue unrelated to this one, see the TODOs):

# unset exported readonly variables, combined with all other possible attributes
typeset -A expect=(
	[a]='typeset -x -r -a foo'
	[b]='typeset -x -r -b foo'
	[i]='typeset -x -r -i foo'
	[i37]='typeset -x -r -i 37 foo'
	[l]='typeset -x -r -l foo'
	[n]='typeset -n -r foo'
	[s]='typeset -x -r -s -i 0 foo=0'
	[u]='typeset -x -r -u foo'
	[A]='typeset -x -r -A foo=()'
	[C]='typeset -x -r foo=()'
	[E]='typeset -x -r -E foo'
	[E12]='typeset -x -r -E 12 foo'
	[F]='typeset -x -r -F foo'
	[F12]='typeset -x -r -F 12 foo'
	[H]='typeset -x -r -H foo'
	[L]='typeset -x -r -L 0 foo'
#	[L17]='typeset -x -r -L 17 foo'		# TODO: outputs 'typeset -x -r -L 0 foo'
	[Mtolower]='typeset -x -r -l foo'
	[Mtoupper]='typeset -x -r -u foo'
	[R]='typeset -x -r -R 0 foo'
#	[R17]='typeset -x -r -R 17 foo'		# TODO: outputs 'typeset -x -r -R 0 foo'
	[X]='typeset -x -r -X 32 foo'
	[X17]='typeset -x -r -X 17 foo'
	[S]='typeset -x -r foo'
	[T]='typeset -x -r foo'
	[Z]='typeset -x -r -Z 0 -R 0 foo'
#	[Z13]='typeset -x -r -Z 13 -R 13 foo'	# TODO: outputs 'typeset -x -r -Z 0 -R 0 foo'
)
for flag in a b i i37 l n s u A C E E12 F F12 H L Mtolower Mtoupper R X X17 S T Z
do	unset foo
	actual=$(
		redirect 2>&1
		export foo
		(typeset "-$flag" foo; readonly foo; typeset -p foo)
		typeset +x foo  # unexport
		leak=${ typeset -p foo; }
		[[ -n $leak ]] && print "SUBSHELL LEAK: $leak"
	)
	if	[[ $actual != "${expect[$flag]}" ]]
	then	err_exit "unset exported readonly with -$flag:" \
			"expected $(printf %q "${expect[$flag]}"), got $(printf %q "$actual")"
	fi
done
unset expect

@McDutchie McDutchie removed the help wanted Extra attention is needed label Jul 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

3 participants