-
Notifications
You must be signed in to change notification settings - Fork 1k
Use gentler subprocess termination semantics #857
Conversation
This allows us to rely on gentler termination semantics (os.Interrupt) for UNIX-y systems, as such interrupts do not work on Windows. Terminations are triggered by e.g. ctrl-C interrupts. Relying generally on Process.Kill() for such purposes was causing git to often exit in unclean ways, resulting in a dirty cache dir and putting users in an awkward position. We maintain the invariant that we ensure the process IS either exited, or forcibly Process.Kill()ed, before exiting monitoredCmd.run(). This allows us to ensure we do not have orphan subprocesses that exist beyond the lifetime of the parent sourceMgr.
Not sure if this was discussed elsewhere, but would it be easier to just retain kill semantics and simply delete the folder git was working on? Then you don't have to worry about what state git leaves the repo in, you just assume if it's not done, it's not in a good state. |
If you hope to terminate all of child processes, you've better to use exec.Command("taskkill", "/F", "/T", "/PID", fmt.Sprint(cmd.Process.Pid)).Run() Or call TerminateProcess recursibly. https://github.com/mattn/psutil/blob/master/psutil_windows.go#L53 |
I think git is quite careful to ensure it cleans up after itself - possibly even incrementally, so that progress is saved? idk. anyway. It might end up being safer to blow away dirs with in-process activity in them when a cancellation is received. But this isn't simple:
@mattn cool, i'll have a look tomorrow. any options for terminating just a specific child process, or does it have to be all of them at once? |
I've seen it be the case that trying to call As the name implies, I once had a need to ensure process trees were fully killed on Windows and the solution that worked for me was to use Windows Job Objects to get the OS to do all the hard work. That API is not currently wrapped by the standard library (golang/go#17608), but @alexbrainman has a package that has (almost?) everything we need at github.com/alexbrainman/ps (and already with a Go license). Depending on exactly how we want this feature to work, github.com/alexbrainman/ps may need some additions. It is missing a wrapper for |
AFAIK, SetInformationJobObject is not required. Simply create job handle with Note: As you know, using Job Object may remain files or directories that should be deleted. |
great discussion guys, thank you. i'm gonna ask we move it to #862, so that we can wrap up the UNIX-related parts here. |
oh, no wonder i was uncomfortable with the |
@sdboyer I pulled down this PR and added 4 fmt.Println statements so I could tell which case was being selected in the process monitoring loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping that pure channel magic could have handled this, but I see why you used the atomic package.
I tested this out and beat the hell out of it. I was able to reproduce other bugs which aren't covered by this PR: the canary panic, existing locks, etc. But didn't have any trouble with linger git processes or corrupt caches.
@@ -90,6 +130,7 @@ func (c *monitoredCmd) combinedOutput(ctx context.Context) ([]byte, error) { | |||
return c.stderr.buf.Bytes(), err | |||
} | |||
|
|||
// FIXME(sdboyer) this is not actually combined output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂
// until a message comes through the channel indicating that the command has | ||
// exited. | ||
// | ||
// TODO(sdboyer) if the signaling process errored (resulting in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's possible. Not sure what to do about it if a hard kill doesn't do its job. Seems like at that point it's out of our hands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i'd tend to agree. i still want to leave the TODO just as a reminder, and in the event that someone who knows more about this sorta thing can come along and say something definitive. but i agree, it seems mostly out of our hands for now.
awesome, thank you for testing! yeah, i maybe could have accomplished this just with channels, but it would've been tremendously awkward. i have no problem with using sync/atomic when it's called for, and this rather seemed like one of those situations. even though it makes the merging... |
What does this do / why do we need it?
This should help address the overwhelming majority of reports of cache corruption, which occur as a result of Ctrl-c causing git to exit abruptly and leaving a corrupted repository state on disk. Given how common it is for people to Ctrl-c during their first dep init while dep is cloning down bunches of repos, this could help a lot. As @natefinch put it in #823:
What should your reviewer look out for in this PR?
Note that this PR sidesteps the problem for Windows. I'm hoping one of our Windows folks might be able to do a follow-up on that side - @ChrisHines? @mattn? (I did see this)
Do you need help or clarification on anything?
I really don't like checking
cmd.ProcessState != nil
as a way of determining if the process has exited, but that was the most obvious solution that jumped out at me.If folks could give this a shot to see if their issues go away, that'd be great. Of course, proving there ISN'T an problem through empirical tests is not fully possible, so just best-effort would be fine.
Which issue(s) does this PR fix?
fixes #469
fixes #790
fixes #409
fixes #823 (well enough for now, at least)