Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subshell job lost when suspended #390

Closed
McDutchie opened this issue Dec 20, 2021 · 15 comments
Closed

Subshell job lost when suspended #390

McDutchie opened this issue Dec 20, 2021 · 15 comments
Labels
blocker This had better be fixed before releasing bug Something is not working

Comments

@McDutchie
Copy link

McDutchie commented Dec 20, 2021

Reproducer: run vi in a subshell:

$ (vi)

vi opens; now press Ctrl+Z to suspend. The output is as expected:

[2] + Stopped                  (vi)

...but the exit status is 18 (SIGTSTP's signal number). That should have the 9th bit set.

Now

$ fg
(vi)

The exit status is 18 again, vi is not resumed, and the job is lost. You have to find vi's pid manually using ps and kill it.

A workaround is to fork the subshell in the first step using ulimit -t unlimited.

At least in 93u+ 2012-08-01 it's even worse – no job ever registers at all. In ksh2020, the behaviour is identical to 93u+m.

@McDutchie McDutchie added bug Something is not working blocker This had better be fixed before releasing labels Dec 20, 2021
@McDutchie
Copy link
Author

Data point: I can reproduce this bug for any external command, but not for built-ins.

@JohnoKing
Copy link

This is the relevant code for how ksh handles SIGTSTP in subshells:

/* Handles ^Z for shell builtins, subshells, and functs */
shp->lastsig = 0;
sh_onstate(SH_MONITOR);
sh_offstate(SH_STOPOK);
shp->trapnote = 0;
shp->forked = 1;
if(!shp->subshell && (sig=sh_fork(shp,0,NIL(int*))))
{
job.curpgid = 0;
job.parent = (pid_t)-1;
job_wait(sig);
shp->forked = 0;
job.parent = 0;
shp->sigflag[SIGTSTP] = 0;
/* wait for child to stop */
shp->exitval = (SH_EXITSIG|SIGTSTP);
/* return to prompt mode */
pp->mode = SH_JMPERREXIT;
}
else
{
if(shp->subshell)
sh_subfork();
/* child process, put to sleep */
sh_offstate(SH_STOPOK);
sh_offstate(SH_MONITOR);
shp->sigflag[SIGTSTP] = 0;
/* stop child job */
killpg(job.curpgid,SIGTSTP);
/* child resumes */
job_clear();
shp->exitval = (xno&SH_EXITMASK);
return;
}
}

I'll note that at lines 565-566 ksh forks subshells when handling ^Z. Perhaps the problem is that ksh isn't forking the subshell in the correct place.

@McDutchie
Copy link
Author

I've tried to fork the subshell in every other remotely plausible place I can think of in fault.c and xec.c, but I can't get anything to work properly. So far, the only functional workaround I've found is to not allow virtual subshells in the interactive main shell environment at all. Subshells of subshells can be virtual again. Since this only affects the interactive shell, performance isn't as much of a concern, and it does fix the problem.

--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -660,6 +660,9 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 					sh_subfork();
 			}
 #endif /* _lib_fchdir */
+			/* Virtual subshells are not safe to suspend (^Z) in the interactive main shell. */
+			if(sh_isstate(SH_INTERACTIVE) && !sh.subshare)
+				sh_subfork();
 			sh_offstate(SH_INTERACTIVE);
 			sh_exec(t,flags);
 		}

Even with that, things still break horribly if you use a ${ subshare; }, and we cannot fork subshares because they'd no longer be subshares. But I've been working on separating subshares from the virtual subshell mechanism altogether (conceptually they should not need to be much more than a redirect plus a sigsetjmp and context push), so that should be solved if I get that to work.

@McDutchie
Copy link
Author

I've noticed that, for command substitutions, other shells (bash, dash, mksh, yash, zsh) simply block SIGTSTP while the command substitution is running. If you do v=$(sleep 10), ^Z does nothing while the sleep command runs. We should do the same in ksh93, which would fix the subshare problem as well, and we wouldn't need to fork comsubs.

Hmm. zsh is also buggy -- if you press ^Z during that, it gets stuck until you press ^C.

@McDutchie
Copy link
Author

McDutchie commented Dec 21, 2021

New patch, please try:

--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -660,6 +660,16 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 					sh_subfork();
 			}
 #endif /* _lib_fchdir */
+#ifdef SIGTSTP
+			/* Virtual subshells are not safe to suspend (^Z, SIGTSTP) in the interactive main shell. */
+			if(sh_isstate(SH_INTERACTIVE))
+			{
+				if(comsub)
+					sigblock(SIGTSTP);
+				else
+					sh_subfork();
+			}
+#endif
 			sh_offstate(SH_INTERACTIVE);
 			sh_exec(t,flags);
 		}
@@ -686,6 +696,10 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 	nv_restore(sp);
 	if(comsub)
 	{
+#ifdef SIGTSTP
+		if(savst.states & sh_state(SH_INTERACTIVE))
+			sigrelease(SIGTSTP);
+#endif
 		/* re-enable job control */
 		if(!sp->nofork)
 			sh_offstate(SH_NOFORK);
--- a/src/cmd/ksh93/sh/fault.c
+++ b/src/cmd/ksh93/sh/fault.c
@@ -547,7 +547,7 @@ void sh_exit(register int xno)
 		sh_offstate(SH_STOPOK);
 		shp->trapnote = 0;
 		shp->forked = 1;
-		if(!shp->subshell && (sig=sh_fork(shp,0,NIL(int*))))
+		if(sh_isstate(SH_INTERACTIVE) && (sig=sh_fork(shp,0,NIL(int*))))
 		{
 			job.curpgid = 0;
 			job.parent = (pid_t)-1;
@@ -564,7 +564,7 @@ void sh_exit(register int xno)
 		{
 			if(shp->subshell)
 				sh_subfork();
-			/* child process, put to sleep */
+			/* script or child process; put to sleep */
 			sh_offstate(SH_STOPOK);
 			sh_offstate(SH_MONITOR);
 			shp->sigflag[SIGTSTP] = 0;

edit: version 2 of patch: also update handling in sh_fault().

edit 2: probably safest to leave that sh_subfork() call in sh_fault() for scripts and child processes; they could still suspend in the middle of a virtual subshell and I don't know what would happen if we allowed that.

@McDutchie
Copy link
Author

Related commit: 48ba696 – the fix above depends on that.

@JohnoKing
Copy link

JohnoKing commented Dec 21, 2021

The patch in #390 (comment) works well on my system. It it unfortunate the top level subshell has to be forked in interactive shells, although it probably will not cause a significant performance loss since it doesn't affect command substitutions or nested subshells.

That said, is it necessary for the fix to be active in ~/.kshrc? Some testing shows kshrc doesn't interact well with SIGTSTP both before and after the patch (although I'll note SIGTSTP is even more broken in Bash's ~/.bashrc):

# With '(sleep 3)' added to ~/.kshrc
ksh $ arch/linux.i386-64/bin/ksh
^Z[2] + Stopped                  <command unknown>
ksh $ fg
<command unknown>
# The fg command above effectively kills the PS1.get discipline function
$

@McDutchie
Copy link
Author

McDutchie commented Dec 21, 2021

That said, is it necessary for the fix to be active in ~/.kshrc? Some testing shows kshrc doesn't interact well with SIGTSTP both before and after the patch (although I'll note SIGTSTP is even more broken in Bash's ~/.bashrc):

Good point. We should exclude sh_isstate(SH_PROFILE) from this and block SIGTSTP while executing profile scripts. (It may have to wait until tomorrow as I need to go and do other things now.)

@McDutchie
Copy link
Author

Profile scripts are executed here but the SH_INTERACTIVE state is not turned on until here – which is good as that avoids the need to check for sh_isstate(SH_PROFILE) in sh_subshell() and sh_exit(). All we need to do is block SIGTSTP during profile script execution.

Also, that beenhere variable could overflow as it's declared as a char, so let's fix that too.

Patch version three, please test:

--- a/src/cmd/ksh93/sh/main.c
+++ b/src/cmd/ksh93/sh/main.c
@@ -160,9 +160,14 @@ int sh_main(int ac, char *av[], Shinit_f userinit)
 	else
 		sh_onoption(SH_BRACEEXPAND);
 #endif
-	if((beenhere++)==0)
+	if(!beenhere)
 	{
+		beenhere++;
 		sh_onstate(SH_PROFILE);
+#ifdef SIGTSTP
+		/* disallow suspending profile scripts */
+		sigblock(SIGTSTP);
+#endif
 		if(shp->gd->ppid==1)
 			shp->login_sh++;
 		if(shp->login_sh >= 2)
@@ -236,6 +241,9 @@ int sh_main(int ac, char *av[], Shinit_f userinit)
 				sh_source(shp, iop, e_suidprofile);
 		}
 		shp->st.cmdname = error_info.id = command;
+#ifdef SIGTSTP
+		sigrelease(SIGTSTP);
+#endif
 		sh_offstate(SH_PROFILE);
 		if(rshflag)
 			sh_onoption(SH_RESTRICTED);
diff --git a/src/cmd/ksh93/sh/subshell.c b/src/cmd/ksh93/sh/subshell.c
index 81ab3ae0..c5af58e8 100644
--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -660,6 +660,16 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 					sh_subfork();
 			}
 #endif /* _lib_fchdir */
+#ifdef SIGTSTP
+			/* Virtual subshells are not safe to suspend (^Z, SIGTSTP) in the interactive main shell. */
+			if(sh_isstate(SH_INTERACTIVE))
+			{
+				if(comsub)
+					sigblock(SIGTSTP);
+				else
+					sh_subfork();
+			}
+#endif
 			sh_offstate(SH_INTERACTIVE);
 			sh_exec(t,flags);
 		}
@@ -686,6 +696,10 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 	nv_restore(sp);
 	if(comsub)
 	{
+#ifdef SIGTSTP
+		if(savst.states & sh_state(SH_INTERACTIVE))
+			sigrelease(SIGTSTP);
+#endif
 		/* re-enable job control */
 		if(!sp->nofork)
 			sh_offstate(SH_NOFORK);
--- a/src/cmd/ksh93/sh/fault.c
+++ b/src/cmd/ksh93/sh/fault.c
@@ -547,7 +547,7 @@ void sh_exit(register int xno)
 		sh_offstate(SH_STOPOK);
 		shp->trapnote = 0;
 		shp->forked = 1;
-		if(!shp->subshell && (sig=sh_fork(shp,0,NIL(int*))))
+		if(sh_isstate(SH_INTERACTIVE) && (sig=sh_fork(shp,0,NIL(int*))))
 		{
 			job.curpgid = 0;
 			job.parent = (pid_t)-1;
@@ -564,7 +564,7 @@ void sh_exit(register int xno)
 		{
 			if(shp->subshell)
 				sh_subfork();
-			/* child process, put to sleep */
+			/* script or child process; put to sleep */
 			sh_offstate(SH_STOPOK);
 			sh_offstate(SH_MONITOR);
 			shp->sigflag[SIGTSTP] = 0;

@JohnoKing
Copy link

JohnoKing commented Dec 21, 2021

The above patch makes the situation with SIGTSTP in kshrc/$ENV worse, as it now crashes after being sent SIGTSTP. Reproducer:

$ cat /tmp/env
set -o | grep interactive
sleep 3

$ ENV=/tmp/env arch/linux.i386-64/bin/ksh
interactive              on
^ZMemory fault(coredump)

EDIT: I should also note that top level subshells fork in the kshrc file after applying either of the patches for this issue:

$ cat /tmp/env
echo "Main shell pid: $$"
(echo "Subshell PID: ${.sh.pid}")
true

$ ENV=/tmp/env ksh   # Unpatched
Main shell pid: 290991
Subshell PID: 290991
$ ENV=/tmp/env arch/*/bin/ksh   # After patch
Main shell pid: 290993
Subshell PID: 290994

@McDutchie
Copy link
Author

McDutchie commented Dec 21, 2021

EDIT: I should also note that top level subshells fork in the kshrc file after applying either of the patches for this issue:

The cause of that particular problem is these two lines in sh_parse():

if(sh_isoption(SH_INTERACTIVE))
sh_onstate(SH_INTERACTIVE);
Those should simply be removed, they don't belong there. The parser may well be invoked in a subshell, e.g. if it runs eval, and the interactive state should never be on in subshells (reminder that this applies to the interactive state, not the interactive option).

Now why it crashes, I've no idea. :(

@JohnoKing
Copy link

JohnoKing commented Dec 21, 2021

Removing those two lines does allow subshells to avoid forking in kshrc, but it introduces another memory fault (not the same as the sigblock fault, but has similar results with the same reproducer). I've made this patch that instead just checks for SH_PROFILE as it seems like a safer approach (apply on top of #390 (comment)):

--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -666,7 +666,7 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 			{
 				if(comsub)
 					sigblock(SIGTSTP);
-				else
+				else if(!sh_isstate(SH_PROFILE))
 					sh_subfork();
 			}
 #endif

@JohnoKing
Copy link

JohnoKing commented Dec 21, 2021

I have a new patch that uses SH_SIGIGNORE instead of sigblock and I haven't gotten it to crash yet. Diff below (edit: updated with ifdefs for SIGTSTP):

diff --git a/src/cmd/ksh93/sh/fault.c b/src/cmd/ksh93/sh/fault.c
index 92146f40..1ba4a38b 100644
--- a/src/cmd/ksh93/sh/fault.c
+++ b/src/cmd/ksh93/sh/fault.c
@@ -547,7 +547,7 @@ void sh_exit(register int xno)
 		sh_offstate(SH_STOPOK);
 		shp->trapnote = 0;
 		shp->forked = 1;
-		if(!shp->subshell && (sig=sh_fork(shp,0,NIL(int*))))
+		if(sh_isstate(SH_INTERACTIVE) && (sig=sh_fork(shp,0,NIL(int*))))
 		{
 			job.curpgid = 0;
 			job.parent = (pid_t)-1;
@@ -564,7 +564,7 @@ void sh_exit(register int xno)
 		{
 			if(shp->subshell)
 				sh_subfork();
-			/* child process, put to sleep */
+			/* script or child process; put to sleep */
 			sh_offstate(SH_STOPOK);
 			sh_offstate(SH_MONITOR);
 			shp->sigflag[SIGTSTP] = 0;
diff --git a/src/cmd/ksh93/sh/main.c b/src/cmd/ksh93/sh/main.c
index 0ce2d81a..1d732629 100644
--- a/src/cmd/ksh93/sh/main.c
+++ b/src/cmd/ksh93/sh/main.c
@@ -163,6 +163,9 @@ int sh_main(int ac, char *av[], Shinit_f userinit)
 	if((beenhere++)==0)
 	{
 		sh_onstate(SH_PROFILE);
+#ifdef SIGTSTP
+		shp->sigflag[SIGTSTP] |= SH_SIGIGNORE;
+#endif
 		if(shp->gd->ppid==1)
 			shp->login_sh++;
 		if(shp->login_sh >= 2)
@@ -236,6 +239,9 @@ int sh_main(int ac, char *av[], Shinit_f userinit)
 				sh_source(shp, iop, e_suidprofile);
 		}
 		shp->st.cmdname = error_info.id = command;
+#ifdef SIGTSTP
+		shp->sigflag[SIGTSTP] &= ~(SH_SIGIGNORE);
+#endif
 		sh_offstate(SH_PROFILE);
 		if(rshflag)
 			sh_onoption(SH_RESTRICTED);
diff --git a/src/cmd/ksh93/sh/subshell.c b/src/cmd/ksh93/sh/subshell.c
index 81ab3ae0..f76a6168 100644
--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -660,6 +660,16 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 					sh_subfork();
 			}
 #endif /* _lib_fchdir */
+#ifdef SIGTSTP
+			/* Virtual subshells are not safe to suspend (^Z, SIGTSTP) in the interactive main shell. */
+			if(sh_isstate(SH_INTERACTIVE))
+			{
+				if(comsub)
+					sigblock(SIGTSTP);
+				else if(!sh_isstate(SH_PROFILE))
+					sh_subfork();
+			}
+#endif
 			sh_offstate(SH_INTERACTIVE);
 			sh_exec(t,flags);
 		}
@@ -686,6 +696,10 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 	nv_restore(sp);
 	if(comsub)
 	{
+#ifdef SIGTSTP
+		if(savst.states & sh_state(SH_INTERACTIVE))
+			sigrelease(SIGTSTP);
+#endif
 		/* re-enable job control */
 		if(!sp->nofork)
 			sh_offstate(SH_NOFORK);

@McDutchie
Copy link
Author

Nice. That works on my end.

What's more, in my testing, using SH_SIGIGNORE also makes it work fine if we remove the lines that inappropriately set the interactive state in the parser.

The parser should have absolutely no reason to care about that, and it may cause the interactive state to be set in subshells which is wrong, so I do want those lines gone. If that is found to cause breakage elsewhere, then that is breakage that AT&T was papering over (as was their usual MO), and we should fix it properly instead.

@McDutchie
Copy link
Author

Turning on the SH_INTERACTIVE state bit in sh_parse() in fact cause at least one bug:

$ (sleep 1& echo done)                
done
$ (eval :; sleep 1& echo done)
[1]	30608
done

No job control output should be printed for a background process invoked from a subshell, not even after eval.

McDutchie added a commit that referenced this issue Dec 22, 2021
Reproducer:

	$ (sleep 1& echo done)
	done
	$ (eval "echo hi"; sleep 1& echo done)
	hi
	[1]	30587
	done

No job control output should be printed for a background process
invoked from a subshell, not even after 'eval'.

The cause: sh_parse() turns on the shell's interactive state bit
(sh_state(SH_INTERACTIVE)) if the interactive shell option is on.

This is incorrect. The parser should have no involvement with shell
interactivity in principle because that's not its domain.

Not only that, the parser may need to run in a subshell, e.g. when
executing traps or 'eval' commands (as above). By definition, a
subshell can never be interactive.

We already fixed many bugs related to job control and the shell's
interactive state. Even if these two lines previously papered over
some breakage, I can't find any now after simply removing them. If
any is found later, then it'll need to be fixed properly instead.

Related: #390
McDutchie added a commit that referenced this issue Dec 22, 2021
Reproducer: run vi in a subshell:

	$ (vi)

vi opens; now press Ctrl+Z to suspend. The output is as expected:

	[2] + Stopped                  (vi)

…but the exit status is 18 (SIGTSTP's signal number) instead of 0.

Now do:

	$ fg
	(vi)
	$

The exit status is 18 again, vi is not resumed, and the job is
lost. You have to find vi's pid manually using ps and kill it.

Forking all non-command substitution subshells invoked from the
interactive main shell is the only reliable and effective fix I've
found. I've tried to fork the subshell conditionally in every other
remotely plausible place I can think of in fault.c and xec.c, but I
can't get anything to work properly. If anyone can get this to work
without forking as much (or at all), please do submit a patch or PR
that supersedes this fix.

At least subshells of subshells don't need to fork, so the
performance impact can be limited. Plus, it's not as if most people
need maximum speed on the interactive command line. Scripts
(including login/profile scripts) are not affected at all.

Command substitutions can be handled differently. My testing shows
that all shells except ksh93 simply block SIGTSTP (the ^Z signal)
while they run. We should do the same, so they don't need to fork.

NOTE for any backporters: the subshell.c and fault.c changes depend
on commits 35b0262 and 48ba696 to work correctly.

src/cmd/ksh93/sh/subshell.c: sh_subshell():
- If the interactive shell state bit is on, then before executing
  the subshell's code:
  - for command substitutions, block SIGTSTP;
  - for other subshells, fork.
- For command substitutions, release SIGTSTP if the interactive
  shell state bit was on upon invoking the subshell.

src/cmd/ksh93/sh/fault.c:
- Instead of checking for a virtual subshell, check the shell's
  interactive state bit to decide whether to handle SIGTSTP, as
  that is only turned on in the interactive main shell.

src/cmd/ksh93/sh/main.c: sh_main():
- To avoid bugs, ignore SIGTSTP while running profile scripts.
  Blocking it doesn't work because delaying it until after
  sigrelease() will cause a crash. Thanks to @JohnoKing for this.
- While we're here, prevent a possible overflow of the 'beenhere'
  static char variable by only incrementing it once.

Co-authored-by: Johnothan King <[email protected]>
Resolves: #390
McDutchie added a commit that referenced this issue Dec 22, 2021
Reproducer:

	$ (sleep 1& echo done)
	done
	$ (eval "echo hi"; sleep 1& echo done)
	hi
	[1]	30587
	done

No job control output should be printed for a background process
invoked from a subshell, not even after 'eval'.

The cause: sh_parse() turns on the shell's interactive state bit
(sh_state(SH_INTERACTIVE)) if the interactive shell option is on.

This is incorrect. The parser should have no involvement with shell
interactivity in principle because that's not its domain.

Not only that, the parser may need to run in a subshell, e.g. when
executing traps or 'eval' commands (as above). By definition, a
subshell can never be interactive.

We already fixed many bugs related to job control and the shell's
interactive state. Even if these two lines previously papered over
some breakage, I can't find any now after simply removing them. If
any is found later, then it'll need to be fixed properly instead.

Related: #390
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This had better be fixed before releasing bug Something is not working
Projects
None yet
Development

No branches or pull requests

2 participants