-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shell crashes when read times out #103
Comments
Crash dump:
|
Slightly different behavior with the shipping ksh93u+ 20120801...
So ksh-20120801 exits cleanly, since I'd set TMOUT, but u+m segfaults. Both of them have the 60 seconds behavior. Leaving aside the bizarre behavior that the interactive shell should exit just because I'd set the timeout, but that's the documented behavior...
(those timestamps look odd because 20120801 doesn't have the %T fix) |
I've got more than one crash file, sometimes it looks like this:
Sometimes |
Very odd. Try as I may, I cannot reproduce this crash at all, either on current 93u+m or 93u+. I'm on macOS 10.14.6. Could you try moving your existing bin/package make CCFLAGS='-O0 -g' This disables compiler optimisations and adds extra debugging information. If you can reproduce the crash with that ksh, then the crash log will show the exact source code line where the crash occurred, which will help in figuring out what happens. |
Ok, I got three test failures when I compiled with that line...
But it still works when I launch it and I can make it crash...
And here's the crash dump for it.
|
I just tried with HEAD, and I can get it to happen on 10.14.6 as well. |
I have If I
|
The following patch might fix the segfault with --- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -451,6 +451,8 @@ int job_reap(register int sig)
}
#endif
}
+ if(!was_ttywait_on)
+ sh_offstate(SH_TTYWAIT);
if(errno==ECHILD)
{
errno = oerrno; |
The "60 second" behaviour is clearly intentional, there is a |
@posguy99, a closer look at your crash dump seems to suggest that your ksh is running a script on exit, as the traceback includes Unless of course @JohnoKing's diff fixes it, in which case never mind. |
The only trap I have is KEYBD. I fiddled briefly with an EXIT trap, but it's not reliable. If you kill Terminal.app with ⌘Q, ksh never gets a chance to run an EXIT trap, so I took it out again. Is it displaying the prompt that makes it look like it's running a subshell? I have dgkorn's PS1.set (well, a hacked up one) doing my prompt. I tried to only use shell constructs but it calls sed and git among other things.
|
No, it still segfaults. Crash dump:
|
Is that another issue to be opened, then? The man page talks re |
MOST interesting! If I reset PS1, I can't make the segfault occur. See here, where first it occurs, then reset PS1, then it doesn't occur. I have
My usual prompt is:
Which ends up being this after the discipline function gets through with it...
And here is prompt.ksh.
|
Ok, this is enough to make it segfault...
That's with letting the discipline function mangle PS1, I'm setting PS1 to If I remove the discipline function and set PS1 to the result of the expansion, I still get the segfault, so it's not
ksh u+ 20120801 does not segfault with the same prompt...
|
And the crash dump from the minimum reproducer segfault...
|
Twice now I have had the segfaulting subshell really mess up the parent's history file, so something's really getting corrupted somewhere. (wasn't that less than helpful?) Ok, at commit 88e8fa6 it still segfaults with the minimum reproducer.
But at commit db72f41, it does not.
|
And I obviously don't understand how this works, I reverted just that commit and it still crashes. |
The presentation of the fault has changed.
Wondering what the [1] represents... |
However, based on your crash backtrace, I came up with something. The backtrace includes diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index 3170276..915258f 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -460,7 +460,7 @@ int job_reap(register int sig)
nochild = 1;
}
shp->gd->waitevent = waitevent;
- if(sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
+ if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
{
outfile = sfstderr;
job_list(pw,JOB_NFLAG|JOB_NLFLAG); |
I can get the crash to occur on Linux with @posguy99's |
Yes of course, I neglected to If @posguy99 can confirm the fix as well, then I'll commit it. The reproducer is oddly specific (changing just about anything will make it fail to crash), but that's what you sometimes get with undefined behaviour. |
I can confirm that with this patch, I no longer get the crash with either my complex |
That would be |
I spoke too soon. I have not been able to reproduce it on 10.15.6 since the patch (except cancel that, because now I just did)... but now I have access to the 10.14.6 machine again, and I can on 10.14.6. I'll be damned if I can figure out a minimal reproducer this time, or even a reliable way to reproduce it. It's much harder to make it happen, while before the patch it was every time. Same TMOUT + read trick, but rather than every time, it's try like 10 times before it happens. It's easier to make it happen if you immediately launch the parent, launch the subshell, then try to make it fault. Here's one crash....
And here's another crash...
Here's one compiled with debugging...
|
Finally got a dump from 10.15.6.
|
Going by the backtraces, each of your crashes seem to end up in Let's try another shot in the dark. Simply refuse to list jobs if we're in any kind of subshell. Let me know if this makes a difference. diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index 31b5d46..5164d32 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -460,7 +460,7 @@ int job_reap(register int sig)
nochild = 1;
}
shp->gd->waitevent = waitevent;
- if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
+ if(!shp->subshell && !shp->comsub && job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
{
outfile = sfstderr;
job_list(pw,JOB_NFLAG|JOB_NLFLAG); |
Hmmm. When we enter the 60 second grace period, this happens: Lines 1869 to 1872 in 9de6521
So it prints the 60 second warning message, sets a state flag ( SH_GRACE ), unblocks SIGALRM , and sets a flag indicating a trap is pending.
We could try using either the diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index 31b5d46..b22ecbc 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -460,7 +460,7 @@ int job_reap(register int sig)
nochild = 1;
}
shp->gd->waitevent = waitevent;
- if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
+ if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT) && !(shp->trapnote & SH_SIGTRAP))
{
outfile = sfstderr;
job_list(pw,JOB_NFLAG|JOB_NLFLAG); |
Nope. Faulted again. And that time it faulted immediately, I didn't have to try more than once. Killed Terminal.app, relaunched, started a subshell, and it faulted.
|
OK, and this one then? diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index 31b5d46..b22ecbc 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -460,7 +460,7 @@ int job_reap(register int sig)
nochild = 1;
}
shp->gd->waitevent = waitevent;
- if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
+ if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT) && !sh_isstate(SH_GRACE))
{
outfile = sfstderr;
job_list(pw,JOB_NFLAG|JOB_NLFLAG); |
Well, for what it's worth, I haven't been able to get this one to fault yet, on either 10.15.6 or 10.14.6. Have you been able to get current HEAD to fault again, irrespective of patches? |
Good news so far. Let me know if that changes. I can consistently reproduce the segfault with your minimum reproducer plus But it shouldn't really matter. The new |
Don't know if I want to put this out there, but I got a hang with the latest patch. Thought I'd try something deliberately job control related...
And now the shell is unresponsive. This is related to #89 , right? I've managed to get it to do that four times now just doing the above, but it's not reliably repeatable. If it does not display the 'Stopped' line without hitting Return again, it seems to never do it. Only if it does that on its own, and not always. I have to kill it from somewhere else or close Terminal.app.
^ from the latest one. Haven't been able to replicate the hang with HEAD yet, still trying. |
Thank you for reporting that hang. I was initially able to reproduce it on my ksh with the latest patch for this bug. And now I can't reproduce it any more, even though nothing changed. So it's another intermittent problem. And because of that I've not (yet) been able to determine if it's related to the fix for this bug at all. |
I've now been able to reproduce the intermittent hang once with a ksh compiled at 49ae483, which was before any fix for this bug. So I'm going to go with "not related". This thread is getting long, so I'll commit the new fix and close this bug. Could you open a new issue for the hang problem? |
Annnd reopening again. The fix breaks |
Unfortunately, though, I'm completely out of ideas. |
u+m introduces it, u+ doesn't have it. But which change? I thought I was getting somewhere backing up in commit history but it was like killing a flea with a nuke. And my end result got me nowhere. And I'm no C programmer. |
|
Another idea occurred to me, but it's (again) a shot in the dark. All these crashes occur while evaluating the expansions contained in So let's try to:
Here's a patch against current HEAD. @posguy99, since you're currently the only one who can reproduce these crashes, please test this and let me know if it makes a difference or not. diff --git a/src/cmd/ksh93/sh/io.c b/src/cmd/ksh93/sh/io.c
index 26ba79c..b914d22 100644
--- a/src/cmd/ksh93/sh/io.c
+++ b/src/cmd/ksh93/sh/io.c
@@ -2066,12 +2066,15 @@ static int io_prompt(Shell_t *shp,Sfio_t *iop,register int flag)
char *endprompt;
static short cmdno;
int sfflags;
+ char save_jc;
if(flag<3 && !sh_isstate(SH_INTERACTIVE))
flag = 0;
if(flag==2 && sfpkrd(sffileno(iop),buff,1,'\n',0,1) >= 0)
flag = 0;
if(flag==0)
return(sfsync(sfstderr));
+ save_jc = job.jobcontrol;
+ job.jobcontrol = 0;
sfflags = sfset(sfstderr,SF_SHARE|SF_PUBLIC|SF_READ,0);
if(!(shp->prompt=(char*)sfreserve(sfstderr,0,0)))
shp->prompt = "";
@@ -2124,6 +2127,7 @@ static int io_prompt(Shell_t *shp,Sfio_t *iop,register int flag)
done:
if(*shp->prompt && (endprompt=(char*)sfreserve(sfstderr,0,0)))
*endprompt = 0;
+ job.jobcontrol = save_jc;
sfset(sfstderr,sfflags&SF_READ|SF_SHARE|SF_PUBLIC,1);
return(sfsync(sfstderr));
}
diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index c243634..31b5d46 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -460,7 +460,7 @@ int job_reap(register int sig)
nochild = 1;
}
shp->gd->waitevent = waitevent;
- if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT) && !sh_isstate(SH_GRACE))
+ if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
{
outfile = sfstderr;
job_list(pw,JOB_NFLAG|JOB_NLFLAG); |
I have not been able to crash it with this patch. Has anyone else? This is like one of those things where you visit the doctor and say "it hurts when I do this..." and the only thing the doctor can say is "well, don't do that?" |
Well, crashes are bad. It should not be possible to make the shell crash. It's good news that the patch appears to have made the crash go away. Unfortunately, the check for
No idea why that occurs on FreeBSD and not Linux or the Mac, but it still means we can't use that check. And we can't use the But it now looks like the key is to avoid calling Please test the following patch against current HEAD. Hopefully it still fixes this and #112. diff --git a/src/cmd/ksh93/sh/io.c b/src/cmd/ksh93/sh/io.c
index f1e72f4..179bd3f 100644
--- a/src/cmd/ksh93/sh/io.c
+++ b/src/cmd/ksh93/sh/io.c
@@ -2067,12 +2067,15 @@ static int io_prompt(Shell_t *shp,Sfio_t *iop,register int flag)
char *endprompt;
static short cmdno;
int sfflags;
+ int was_ttywait_on;
if(flag<3 && !sh_isstate(SH_INTERACTIVE))
flag = 0;
if(flag==2 && sfpkrd(sffileno(iop),buff,1,'\n',0,1) >= 0)
flag = 0;
if(flag==0)
return(sfsync(sfstderr));
+ was_ttywait_on = sh_isstate(SH_TTYWAIT);
+ sh_offstate(SH_TTYWAIT);
sfflags = sfset(sfstderr,SF_SHARE|SF_PUBLIC|SF_READ,0);
if(!(shp->prompt=(char*)sfreserve(sfstderr,0,0)))
shp->prompt = "";
@@ -2125,6 +2128,8 @@ static int io_prompt(Shell_t *shp,Sfio_t *iop,register int flag)
done:
if(*shp->prompt && (endprompt=(char*)sfreserve(sfstderr,0,0)))
*endprompt = 0;
+ if(was_ttywait_on)
+ sh_onstate(SH_TTYWAIT);
sfset(sfstderr,sfflags&SF_READ|SF_SHARE|SF_PUBLIC,1);
return(sfsync(sfstderr));
}
diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index c243634..3170276 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -460,7 +460,7 @@ int job_reap(register int sig)
nochild = 1;
}
shp->gd->waitevent = waitevent;
- if(job.jobcontrol && sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT) && !sh_isstate(SH_GRACE))
+ if(sh_isoption(SH_NOTIFY) && sh_isstate(SH_TTYWAIT))
{
outfile = sfstderr;
job_list(pw,JOB_NFLAG|JOB_NLFLAG); |
Applied to 10.15.6. Trying to break it now. I won't have access to the 10.14.6 machine again until tonight. |
I ran it all day on 10.15.6 and it didn't crash, either line editing or trying to force the read timeout segfault. Amusingly, I put it on 10.14.6 and the first thing I got was the background job hang. :). Nothing else to report so far.
|
OK, I deleted all of that thread, it's nonsense, I made up a new testing set up and am testing again on 10.14.6. |
Results... it hasn't crashed on 10.15.6, it hasn't crashed on 10.14.6. My self-imposed carelessness last night with copying around versions of the source tree did not lead to verifiable testing (hence the now-deleted report of a worse crash). That was entirely on me and I apologize for the noise. |
No problem, thanks for your persistence! Your testing has been really important and I am now hopeful that we've got a correct fix. As for the hang, that's #111 and will hopefully be dealt with separately at some point. |
For the historic record: The fix for this bug was incomplete and papered over the problem. I believe I applied the real fix for this bug in 51b2e36 on 20th February 2021. That crash was really another manifestation of this one. In job_reap(), the local |
The fix for #103 was incomplete and papered over the problem. The real fix for this bug was applied in 51b2e36 on 20th February 2021. That crash was the same one triggered differently. #103 (comment)
The fix for #103 was incomplete and papered over the problem. The real fix for this bug was applied in 51b2e36 on 20th February 2021. That crash was the same one triggered differently. #103 (comment)
I read https://bugzilla.redhat.com/show_bug.cgi?id=573936 and tested it, didn't see the behavior the bug talked about, figured it must no longer be a problem.
Then the window I was running ksh in closed.
The
read
timed out after 5 seconds, and the prompt came back. Then ksh popped the timeout message and the memory fault, and the shell crashed.I can't make it happen if I put it in a script and call the script, but it happens repeatably if I do it interactively. If I don't set a timeout and just let the
read
sit there, it doesn't crash.(macOS 10.15.6, Version AJM 93u+m 2020-07-29)
The text was updated successfully, but these errors were encountered: