Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall: add support for Windows job objects #17608

Open
osher opened this issue Oct 26, 2016 · 10 comments
Open

syscall: add support for Windows job objects #17608

osher opened this issue Oct 26, 2016 · 10 comments
Labels
FeatureRequest help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@osher
Copy link

osher commented Oct 26, 2016

I would like to reopen #6720.
I excitedly read all the thread, ending with the greatest facepalm I had in the past years.

Look, this is not a solution!
If nodejs knows how to pass these signals between processes on windows - then no reason that golang should not. No magic involved.

Full story bellow.
Mind that if the go shim is out of the loop - it works as expected, so I deduct it has to be something that golang does wrong.

I would appreciate your input on this...

What version of Go are you using (go version)?

donno. the one that [email protected] uses

What operating system and processor architecture are you using (go env)?

Windows 10, x64

What did you do?

I'm using nodist which allows me to run different versions of nodejs side by side.
nodist uses a shim layer of go to have a look around on the env vars and local folder to detect the desired node version and call the relevant executable accordingly

See here:
nodists/nodist#179

What did you expect to see?

nodists/nodist#179

What did you see instead?

nodists/nodist#179

@rakyll rakyll changed the title SIGTERM does not propagate well os: SIGTERM does not propagate well Oct 26, 2016
@bradfitz
Copy link
Contributor

I excitedly read all the thread, ending with the greatest facepalm I had in the past years.

Look, this is not a solution!

This is not a good way to start a bug report. You've now established yourself as an adversary rather than a collaborator.

Something to keep in mind for future bug reports. Tone matters.

@rsc
Copy link
Contributor

rsc commented Oct 26, 2016

I'm sorry, but we need more information to understand what you are asking for on the Go side. I did read nodist/issues#179, but there is not a line of Go code in sight.

You also mentioned reopening #6720, which as I read it is about p.Signal(os.Interrupt), where p is an os.Process, not being implemented and therefore always returning an error on Windows. That issue was closed by 05cc78d8, which did:

$ git show 05cc78d
commit 05cc78d8d32f6af6fc4373e10da0b4a12f0a1ad4
Author:     Alex Brainman <[email protected]>
AuthorDate: Fri May 23 12:29:29 2014 +1000
Commit:     Alex Brainman <[email protected]>
CommitDate: Fri May 23 12:29:29 2014 +1000

    os: document that Interrupt might not work on every os

    Fixes #6720.

    LGTM=bradfitz
    R=golang-codereviews, iant, bradfitz
    CC=golang-codereviews
    https://golang.org/cl/92340043

diff --git a/src/pkg/os/doc.go b/src/pkg/os/doc.go
index bc700b6..389a8eb 100644
--- a/src/pkg/os/doc.go
+++ b/src/pkg/os/doc.go
@@ -46,6 +46,7 @@ func (p *Process) Wait() (*ProcessState, error) {
 }

 // Signal sends a signal to the Process.
+// Sending Interrupt on Windows is not implemented.
 func (p *Process) Signal(sig Signal) error {
    return p.signal(sig)
 }
$ 

As discussed in #6720, we don't know of an obvious way to implement p.Signal(os.Interrupt) on Windows, or we would have. But none of us are Windows experts. A few long time users, maybe, but not experts. We do the best we can by reading the Microsoft documentation and a liberal amount of Stack Overflow and trial and error.

One possibility is that nodejs ctx.child.kill('SIGTERM') is sending a Ctrl-Break to the entire process group, but if that were the case I don't understand why that wouldn't hit both the go wrapper and the nodejs server it started.

Another possibility is that nodejs ctx.child.kill('SIGTERM') knows some kind of magic to send a Ctrl-Break to just that one process. If so and you can tell us what that magic is, we can probably implement it in Go.

Another possibility is that SIGTERM doesn't mean Ctrl-Break at all here. But then what does it mean?

I looked in github.com/nodejs/node and there is no mention of what SIGTERM means on Windows. They must be using the Microsoft C runtime library, but what does that do? I looked in the mingw sources and the closest I found was mingw/include/signal.h, which says:

/*
 * The actual signal values. Using other values with signal
 * produces a SIG_ERR return value.
 *
 * NOTE: SIGINT is produced when the user presses Ctrl-C.
 *       SIGILL has not been tested.
 *       SIGFPE doesn't seem to work?
 *       SIGSEGV does not catch writing to a NULL pointer (that shuts down
 *               your app; can you say "segmentation violation core dump"?).
 *       SIGTERM comes from what kind of termination request exactly?
 *       SIGBREAK is indeed produced by pressing Ctrl-Break.
 *       SIGABRT is produced by calling abort.
 * TODO: The above results may be related to not installing an appropriate
 *       structured exception handling frame. Results may be better if I ever
 *       manage to get the SEH stuff down.
 */

"SIGTERM comes from what kind of termination request exactly?". Exactly.

It sounds like you maybe you know what Go should be doing instead, or maybe what SIGTERM means on Windows. If so, can you tell us? Thanks.

@rsc rsc added this to the Go1.8Maybe milestone Oct 26, 2016
@rsc rsc added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Oct 26, 2016
@pbnjay
Copy link
Contributor

pbnjay commented Oct 27, 2016

It looks like Node uses libuv, which treats SIGKILL, SIGTERM, and SIGINT the same on windows: https://github.com/nodejs/node/blob/db1087c9757c31a82c50a1eba368d8cba95b57d0/deps/uv/src/win/process.c#L1166

@rsc
Copy link
Contributor

rsc commented Oct 27, 2016

@pbnjay, thanks for finding that.

To recap, there is a Node parent that ran a Go process that ran a Node child.

On Unix systems, if the Node parent sends SIGTERM ("please stop") to the Go process, then the Go process's signal handler runs and can do something in response to the signal, like send SIGTERM to the Node child, wait for the Node child to exit gracefully, and then exit itself.

On Windows, my reading of what @pbnjay found is that the Node parent calls TerminateProcess on the Go process. That doesn't send a nice "please stop" to the Go process. It just terminates it, like Unix SIGKILL. There is no signal sent, no time to react; the operating system just destroys the process. In this case the Node child is left behind. It would have to be: the Go process had no chance to do anything.

On Linux, there is still a way to cope with SIGKILL. When the Go program starts the subprocess, it can pass a SysProcAttr with Pdeathsig!=0, which makes the forked child call prctl(PR_SET_PDEATHSIG, Pdeathsig) before exec'ing the actual new program. That setting means "if my parent dies, send me this signal", so that even if the Go program dies with kill -9 or some other path that forgets to do cleanup, the child can be notified that the parent is gone and clean up after itself.

It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects". It is unclear to me whether this still works in current versions of Windows, but some way to support that would be the obvious next thing to try. I'm going to retitle this bug to be about that.
https://msdn.microsoft.com/en-us/library/ms682409(VS.85).aspx
http://stackoverflow.com/questions/53208/how-do-i-automatically-destroy-child-processes-in-windows

@rsc rsc changed the title os: SIGTERM does not propagate well syscall: add support for Windows job objects Oct 27, 2016
@osher
Copy link
Author

osher commented Oct 27, 2016

I'm away from keyboard until Sunday, however - I wanted to say I appreciate the seriousity and professionalism the thread gets, despite the aforementioned ...tone.

I wish I had more lower level info - I would have shared it at start.

I can be just a bit more elaborate, I will next week.

@osher
Copy link
Author

osher commented Oct 30, 2016

Here's the picture, I hope it helps a little.
go nodist

@osher
Copy link
Author

osher commented Oct 30, 2016

In case it implies on your priorities - I owe you an update.

Specifically for my case a workaround has been found:
Instead of passing through the Nodist go shim and resolving the paths the usual way - I use a nodejs api to invoke the spawned server on the same executable as the current process that runs the test-suite (the API is called process.execPath).

This results in removing the 3rd and 2nd last steps, jumping streight to the last.

In this case, the SIGTERM terminates the server and the system works as expected without leaving hung processes.

However, we're still get hanging processes whenever I Ctrl+C to the main process.
Mind that my actual use-case is complicated by one more level
(generator-test-suite --> generated-project-test-suite --> generated-project-server
instead of
project-tet-siote-->project server)

FYI.

@mattn
Copy link
Member

mattn commented Oct 30, 2016

FYI, AssignProcessToJobObject fail on Windows7. AFAIK, it have to terminate process with walking children processes using CreateToolhelp32Snapshot on Windows7 or older. One another issue, as alex said in #6720, GenerateConsoleCtrlEvent have another problem. The API require "console". So if the process doesn't have a console, it doesn't work. For example, if the process call AllocConsole, it works fine.
Below is a implementation using GenerateConsoleCtrlEvent & CREATE_NEW_PROCESS_GROUP .
https://github.com/mattn/goemon/blob/master/proc_windows.go

@alexbrainman
Copy link
Member

It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects".

I do not know about PR_SET_PDEATHSIG, but you can use "job objects" to control process groups on Windows. I even have github.com/alexbrainman/ps package with some APIs. We have used "job objects" to collect benchmark run statistics in golang.org/x/benchmarks/driver. From what I remember "job objects" provide facilities for child processes to start their own group too, so you would need some cooperation from your clients.

It is unclear to me whether this still works in current versions of Windows

I think it works on all Go supported Windows versions.

Alex

@rsc rsc modified the milestones: Go1.9Early, Go1.8Maybe Nov 2, 2016
@bradfitz bradfitz removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 6, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Early May 3, 2017
@bradfitz
Copy link
Contributor

bradfitz commented May 3, 2017

/cc @johnsonj

@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
fho added a commit to simplesurance/baur that referenced this issue Dec 12, 2022
…eebsd

When baur executes a task and the baur process gets killed, the task subprocess
continues to run. This was reproduced on Linux, on other OSes it was not tested
but they are probably also affected.

Prevent that this can happen by setting Pdeathsig for the executed process.
If the parent thread is killed, the specified signal (SIGKILL) will be sent to
the child.
Pdeathsig is sent when then parent thread dies, to prevent that thread on which
the go-routine ran that started the process dies, runtime.LockOSThread is
called[^1].

This fixes the issue only on Linux and FreeBSD.
Windows & Darwin do not have Pdeathsig in their SysProcAttrs.
To achieve the same on Windows support for job objects in Golang might be
needed[^2].

[^1]: golang/go#27505 (comment)
[^2]: golang/go#17608
fho added a commit to simplesurance/baur that referenced this issue Dec 12, 2022
When baur executes a task and the baur process gets killed, the task subprocess
continues to run. This was reproduced on Linux, on other OSes it was not tested
but they are probably also affected.

Prevent that this can happen by setting Pdeathsig for the executed process.
If the parent thread is killed, the specified signal (SIGKILL) will be sent to
the child.
Pdeathsig is sent when then parent thread dies, to prevent that thread on which
the go-routine ran that started the process dies, runtime.LockOSThread is
called[^1].

This fixes the issue only on Linux and FreeBSD.
Windows & Darwin do not have Pdeathsig in their SysProcAttrs.
To achieve the same on Windows support for job objects in Golang might be
needed[^2].

[^1]: golang/go#27505 (comment)
[^2]: golang/go#17608
fho added a commit to simplesurance/baur that referenced this issue Dec 12, 2022
When baur executes a task and the baur process gets killed, the task subprocess
continues to run. This was reproduced on Linux, on other OSes it was not tested
but they are probably also affected.

Prevent that this can happen by setting Pdeathsig for the executed process.
If the parent thread is killed, the specified signal (SIGKILL) will be sent to
the child.
Pdeathsig is sent when then parent thread dies, to prevent that thread on which
the go-routine ran that started the process dies, runtime.LockOSThread is
called[^1].

This fixes the issue only on Linux and FreeBSD.
Windows & Darwin do not have Pdeathsig in their SysProcAttrs.
To achieve the same on Windows support for job objects in Golang might be
needed[^2].

[^1]: golang/go#27505 (comment)
[^2]: golang/go#17608
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

8 participants