Ensure child creation time is after parent creation time #1

nebel · 2023-10-01T14:38:43Z

Ensure child creation time is after parent creation time to avoid linking unrelated processes into the kill chain (caused by reused PPids).

See jesseduffield/lazygit#3008

I've marked this as a draft because I've never used go before and wasn't able to test with lazygit, but running this code in isolation seemed to fix the issue for me.

…king via reused PPids

Jim-with-a-J · 2023-10-01T20:55:29Z

kill_windows.go

+	if err != nil {
+		return
+	}
+


Should there be a defer CloseHandle(handle) here? I think this might leak otherwise.

Thanks for catching this! I added a call to closeHandle(HANDLE(handle)) to reuse the existing function. I also noticed that windows.CloseHandle is available with the new import, though it can also return an error. Not sure if that way would be better or not.

I'm still not even a beginner with Go, so I'm not qualified to comment on best practice :) The only thing that stands out to me is that you never return anything explicitly, while the rest of the code does. I assume that's purely a stylistic choice and Go does something sensible with the returns (editing previous dumb comment, does it create local variables with the names in the trailing part of the function definition and return those?)

I assume that's purely a stylistic choice and Go does something sensible with the returns (editing previous dumb comment, does it create local variables with the names in the trailing part of the function definition and return those?)

Yes, I was also surprised at how this works but the names before the types in the function return type are local variables which will be used as return values when doing a blank return. Seems to be commonly used to avoid having to supply the full (ok, err) tuple at each return, and is used by GetProcs in this code for example. But I'm certainly not qualified to comment on best practices either.

for the record I'm not a fan of naked returns (when you specify the variable name as part of the function signature) because they confuse everybody (myself included) who comes across them. I am aware this is my repo so it's strange that the GetProcs function uses naked returns but in truth I hadn't formed this opinion before the PR was originally made (it wasn't made by me)

jesseduffield · 2023-10-02T11:29:39Z

I'm not much of a windows guy (as you can tell by the fact that the existing code kills random processes) but what are your thoughts on the approach taken here? https://gist.github.com/hallazzang/76f3970bfc949831808bbebc8ca15209?permalink_comment_id=2948162#gistcomment-2948162

Jim-with-a-J · 2023-10-02T12:00:51Z

I'm not much of a windows guy (as you can tell by the fact that the existing code kills random processes) but what are your thoughts on the approach taken here? https://gist.github.com/hallazzang/76f3970bfc949831808bbebc8ca15209?permalink_comment_id=2948162#gistcomment-2948162

Using job objects does seem a lot cleaner - children are added to the job by default if the parent is, and you don't have to walk through the processes yourself. I had a go (ha!) at doing something like this, but I kept looking for a way to use cmd.process.handle, which is not accessible (I see the gist code "cheats" with unsafe operations to use that, while @nebel takes a more API-appropriate path and gets the handle again). It's a bit unfortunate that there needs to be some co-operation between the CreateProcess code and the killing code to support this though. With the spawning and killing handled in separate modules, it seems a bit ugly.

I also wonder if it's possible to have a race condition this way - if a child is spawned before the new process gets added to the job group, it might never get terminated. With the time it takes to spawn a process, this is probably not a huge concern though.

My thought was to start the process suspended until the job object was created and the new process added, and also use the PID to generate a job name (e.g. jduffield_kill_[pid]) so at least the kill interface didn't need to change. It could try to open a job object with that name, and if it failed fall back on the existing behaviour.

nebel · 2023-10-02T22:02:56Z

@Jim-with-a-J pretty much summed up my thoughts as well. I actually did test the code in the gist you linked, @jesseduffield, and confirmed that it does work--with the rather large caveat that to kill the process tree we don't actually invoke Kill (which works the same as usual for processes even in such a job) but rather need to call windows.CloseHandle on the job handle (which will kill the entire tree).

In addition to the race condition Jim mentioned (which I was also concerned about, however unlikely it seems) there is also the fact that in Windows 7 and earlier a process can't be in multiple jobs. I don't know if we're concerned about supporting Windows 7 but for example if lazygit itself was in a job (which is actually quite possible as the terminal could be in a job), trying to add an additional job to a child process of lazygit would fail. So in that case we might need to fall back to the old kill method.

But my main concern is just the overhead in additional code of doing things the correct way, as Jim mentioned. Probably we would need to wrap the mechanisms for both launching (to set the job) and killing (to close the job handle) in OS-specific code, and pass around e.g. a WrappedCmd which contains the necessary job handle on Windows. It's certainly not impossible but it's no longer a library-only change and would necessitate some larger changes in lazygit itself.

That said, I could try to implement it if you're interested. Being a novice at Go I don't know if my design would be the best, though, and it's also possible we'd need to keep this time-checking tree-diving version in here as a fallback for when job creation fails for some reason.

Let me know what you think.

Jim-with-a-J · 2023-10-02T22:38:14Z

Is there any way to know how many other projects are using the kill library? Since co-operation is needed from the calling code (and there's still the potential for the job object approach to fail), I think probably this approach using the running time should still be added as the default behaviour. Any client that doesn't want to/can't immediately switch to using job objects doesn't have to, but they still benefit from not killing unrelated processes, and some extra time can be taken to figure out the best way to introduce job objects for Windows.

nebel · 2023-10-03T06:19:16Z

Here's an example of what an API for using jobs to kill the process tree might look like:

https://gist.github.com/nebel/ff407aa88e5a7b261f5d3c673340a884

It will fall back to the current kill approach if job creation or adding processes to a job fails.

Running the test will run cmd.exe from which you can run notepad.exe manually to check how things are being killed. And you can uncomment out various kill strategies here:

	//go waitKill(wrapped.cmd)
	//go waitKillTree(wrapped.cmd)
	go waitTerminate(&wrapped)

From my testing both waitKillTree and waitTerminate will terminate both processes while waitKill (simple kill) will not, for example.

I'm not sure what the non-Windows alternative would do. Should we set Setpgid? I noticed the current function to do that in the Linux side is not used, but I don't know why.

I will say though that I still don't particularly like how unwieldy this approach is, but if it's preferable then I think it could be used. Again though we'd still need to fix the existing kill method as we probably need it as a fallback.

jesseduffield · 2023-10-11T11:52:53Z

Great progress! Sorry, I haven't had much time to dive into the details here. But I have just now done some more googling to see if we've missed anything. Some assorted links:

Go issue about support windows job groups:
golang/go#17608

How buildkite handles killing child processes in windows:
buildkite/agent#879

Pulumi dealing with the same issue:
pulumi/pulumi#8696

Situation in lazygit where we failed to kill processes due to a race condition:
jesseduffield/lazygit#1265 (comment)

Also for the record I'm happy to ignore windows 7; I doubt many people are still using it. I'm also willing to require a change in lazygit in order to have the best outcome, though it is regrettable.

nebel · 2023-10-11T15:54:01Z

Thanks for researching. I took a look at those issues. At first the buildkite method seemed good since it works based on the PID alone (no need for wrapping) but after testing it, it seems like it relies on all subprocesses reacting to the break and shutting themselves down, which may work but I don't know if it's correct to rely on it. It's really more about broadcasting a signal to so- called console processes rather than actually killing them.

The pulumi situation is also a little different in that they control their subprocesses so they just ended up polling them via gRPC I guess.

At this point I feel like the job solution with a fallback of killing the tree "manually" (but checking process creation time) is probably still the most robust solution, even though it does mean having a wrapper for cmd.

I also saw this library https://github.com/kolesnikovae/go-winjob which has examples in which they take the extra step of starting a process suspended then resuming after the job is set, in order to guarantee it can't exit or spawn new processes before the job is assigned. This is probably an even more robust approach, but also increases the complexity further (not so much in the suspend part but certainly in the resume part).

nebel · 2023-10-11T16:02:52Z

I would also like to add that this PR in its current state, while not a perfect solution, should dramatically improve the upon the current situation. At least it should make the "killing unrelated long-running processes" problem extremely unlikely--without making any other existing problems any worse.

sibouras · 2024-03-25T01:31:43Z

any news on this

nebel · 2024-10-04T05:38:40Z

Note that I have been using a custom build including these changes on Windows for several months without issue.

Ensure child creation time is after parent creation time to avoid lin…

a1db4b6

…king via reused PPids

nebel mentioned this pull request Oct 1, 2023

Lazygit terminating unrelated processes jesseduffield/lazygit#3008

Open

Jim-with-a-J reviewed Oct 1, 2023

View reviewed changes

Ensure handle is closed in GetCreationTime

9dc1e4d

jesseduffield mentioned this pull request Oct 11, 2023

Hold down arrow key to switch file, diff window stop render. jesseduffield/lazygit#1265

Closed

nebel marked this pull request as ready for review October 4, 2024 05:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure child creation time is after parent creation time #1

Ensure child creation time is after parent creation time #1

nebel commented Oct 1, 2023 •

edited

Loading

Jim-with-a-J Oct 1, 2023

nebel Oct 2, 2023

Jim-with-a-J Oct 2, 2023 •

edited

Loading

nebel Oct 2, 2023 •

edited

Loading

jesseduffield Oct 2, 2023

jesseduffield commented Oct 2, 2023

Jim-with-a-J commented Oct 2, 2023

nebel commented Oct 2, 2023 •

edited

Loading

Jim-with-a-J commented Oct 2, 2023 •

edited

Loading

nebel commented Oct 3, 2023

jesseduffield commented Oct 11, 2023

nebel commented Oct 11, 2023 •

edited

Loading

nebel commented Oct 11, 2023 •

edited

Loading

sibouras commented Mar 25, 2024

nebel commented Oct 4, 2024

Ensure child creation time is after parent creation time #1

Are you sure you want to change the base?

Ensure child creation time is after parent creation time #1

Conversation

nebel commented Oct 1, 2023 • edited Loading

Jim-with-a-J Oct 1, 2023

Choose a reason for hiding this comment

nebel Oct 2, 2023

Choose a reason for hiding this comment

Jim-with-a-J Oct 2, 2023 • edited Loading

Choose a reason for hiding this comment

nebel Oct 2, 2023 • edited Loading

Choose a reason for hiding this comment

jesseduffield Oct 2, 2023

Choose a reason for hiding this comment

jesseduffield commented Oct 2, 2023

Jim-with-a-J commented Oct 2, 2023

nebel commented Oct 2, 2023 • edited Loading

Jim-with-a-J commented Oct 2, 2023 • edited Loading

nebel commented Oct 3, 2023

jesseduffield commented Oct 11, 2023

nebel commented Oct 11, 2023 • edited Loading

nebel commented Oct 11, 2023 • edited Loading

sibouras commented Mar 25, 2024

nebel commented Oct 4, 2024

nebel commented Oct 1, 2023 •

edited

Loading

Jim-with-a-J Oct 2, 2023 •

edited

Loading

nebel Oct 2, 2023 •

edited

Loading

nebel commented Oct 2, 2023 •

edited

Loading

Jim-with-a-J commented Oct 2, 2023 •

edited

Loading

nebel commented Oct 11, 2023 •

edited

Loading

nebel commented Oct 11, 2023 •

edited

Loading