Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugins should be killed on exit #234

Closed
Mygod opened this issue May 4, 2020 · 39 comments
Closed

Plugins should be killed on exit #234

Mygod opened this issue May 4, 2020 · 39 comments

Comments

@Mygod
Copy link
Contributor

Mygod commented May 4, 2020

https://stackoverflow.com/a/30540177/2245107

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

Yes, they are killed on exit. Plugins are hold in a Plugins structure, so when it is dropped, all child processes are killed.

https://github.com/shadowsocks/shadowsocks-rust/blob/master/src/plugin/mod.rs#L49

@Mygod
Copy link
Contributor Author

Mygod commented May 5, 2020

Hmm on Android they're not being terminated correctly. Maybe need some other mechanisms like signaling? @madeye

@madeye
Copy link
Contributor

madeye commented May 5, 2020

Yeah, we send SIGTERM to shadowsocks-rust, which should be handled correctly.

Let me double check locally.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

kill() is called when Child is dropped: https://github.com/tokio-rs/tokio/blob/master/tokio/src/process/mod.rs#L691

@madeye
Copy link
Contributor

madeye commented May 5, 2020

I send SIGTERM to sslocal on Linux, the subprocess won't be killed.

It looks we need to handle SIGTERM in shadowsocks-rust properly.

@madeye
Copy link
Contributor

madeye commented May 5, 2020

Steps to reproduce:

sslocal --server-addr 127.0.0.1:6001 --local-addr 127.0.0.1:1080 -k password -m rc4-md5 --plugin v2ray-plugin --plugin-opts "host=example.com"
killall sslocal
ps aux | grep v2ray

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

It works correctly in macOS (launchd) and Debian 9 (with systemd).

@madeye
Copy link
Contributor

madeye commented May 5, 2020

I tested on Ubuntu 18.04... It's interesting that we see different behavior.

@madeye
Copy link
Contributor

madeye commented May 5, 2020

And the output of shadowsocks-rust

2020-05-05T10:52:11.522+08:00 INFO  shadowsocks 1.8.11
2020-05-05T10:52:11.549+08:00 INFO  started plugin "v2ray-plugin" on 127.0.0.1:43065 <-> shcompute-pva3.nvidia.com:6001
2020/05/05 10:52:11 V2Ray 4.22.1 (V2Fly, a community-driven edition of V2Ray.) Custom (go1.13.4 linux/amd64)
2020/05/05 10:52:11 A unified platform for anti-censorship.
2020/05/05 10:52:11 [Warning] v2ray.com/core: V2Ray 4.22.1 started
2020-05-05T10:52:11.751+08:00 INFO  shadowsocks TCP listening on 127.0.0.1:1081
2020-05-05T10:52:29.802+08:00 INFO  received SIGTERM, exiting

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

Ah, unfortunately, yes. v2ray is not killed! I can reproduce that.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

It should be fixed by this commit.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

Yeah, I can also reproduce it in Ubuntu 18.04. I should release a new version to fix this bug.

@Mygod
Copy link
Contributor Author

Mygod commented May 5, 2020

According to doc, kill sends SIGKILL but we should ideally want SIGTERM to allow clean ups.

https://stackoverflow.com/a/58156963/2245107

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

That's because std only exposed kill(): https://doc.rust-lang.org/std/process/struct.Child.html#method.kill
Is there an equivilent SIGTERM on windows?

@Mygod
Copy link
Contributor Author

Mygod commented May 5, 2020

Yeah that does not seem to be cross platform... For now, we can just use a Unix-only block to ask the process to terminate first and then kill it after a timeout maybe. Something like this: https://github.com/shadowsocks/shadowsocks-android/blob/7fdcee61216ff35427bf0719d3c542b557ea1f79/core/src/main/java/com/github/shadowsocks/bg/GuardedProcessPool.kt#L93-L108

if unix {
    child.terminate()
    waitForExitOrTimeout(1000)
}
child.kill()

Related: rust-lang/rust#41822

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

This is not an easy task to implement "kill and wait then timeout". Because we are running this in a drop() function, which doesn't have an available Runtime to execute timer and poll.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

There is a simple solution, pseudocode:

for plugin in &mut self.plugins {
    plugin.terminate();
}
// Blocks process for 500ms
sleep(500ms);
// Kills all of them
for plugin in &mut self.plugins {
    plugin.kill();
}

@Mygod
Copy link
Contributor Author

Mygod commented May 5, 2020

I guess we only terminate plugins on exit/being terminated so blocking the entire process/thread is probably acceptable.

@Mygod
Copy link
Contributor Author

Mygod commented May 5, 2020

Hmm 67de124 waits 500ms unconditionally but clean up might be much faster... Maybe consider some sort of waitpid? https://stackoverflow.com/a/20173592/2245107 (I guess this is kind of complicated and really sets up another "Runtime")

@madeye
Copy link
Contributor

madeye commented May 5, 2020

Verified locally on Android. The latest fix works well.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 5, 2020

It is unwise to do such a lot of things in drop(). Huh, is there another way...?

@Mygod
Copy link
Contributor Author

Mygod commented May 5, 2020

How about Runtime::block_on or something similar? https://docs.rs/tokio/0.2.20/tokio/runtime/struct.Runtime.html#method.block_on

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 6, 2020

Nope. Runtime is also dropping.

@Mygod
Copy link
Contributor Author

Mygod commented May 6, 2020

Runtime should be dropped after everything else right...? We can also release stuff in Runtime's drop handler I guess?

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 6, 2020

Nope. I tried. Spawning into a panicking Runtime will cause another panic.

@Mygod
Copy link
Contributor Author

Mygod commented May 6, 2020

Hmm if this is really that difficult we can try to clean up the child processes from JVM. A 500ms delay seems undesirable to me.

@madeye
Copy link
Contributor

madeye commented May 6, 2020

I think we'd better handle this in shadowsocks-rust, as it's not the problem only on Android.

Given we already use unsafe code to send SIGTERM through libc::kill, it maybe acceptable to use libc::waitpid as well.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 6, 2020

Yes, it should be a problem to be handled in shadowsocks-rust.

It seems that waitpid is the most simple option.

for plugin in &self.plugins {
    let mut status: libc::c_int = 0;
    libc::waitpid(plugin.id(), &mut status, 0);
}

Another option is to use sigtimedwait to wait for SIGCHLD. But we have lots of subprocesses, is it guaranteed that master process could receive all SIGCHLD even if it is not waiting on sigtimedwait?

A very tricky solution is to use alarm() to trigger a EINTR. Hmm..

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 6, 2020

How about this.

@Mygod
Copy link
Contributor Author

Mygod commented May 6, 2020

I see the current solution is a busy spin -- sigtimedwait would probably be better but no hurries. 👍

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 7, 2020

sigtimedwait cannot specify pid. That's why I didn't use it.

@Mygod
Copy link
Contributor Author

Mygod commented May 7, 2020

Well the code I quoted simply spawns (or actually reuse) another thread that runs waitpid and wait with timeout for it to finish.

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 7, 2020

That is probably slower. I tested on my laptop and v2ray-plugin processes would be exited in about 2ms. Busy waiting is not a bad idea.

@Mygod
Copy link
Contributor Author

Mygod commented May 7, 2020

Yeah I guess I will take this fix for now. Thanks for bearing with me! 🤣

@Mygod
Copy link
Contributor Author

Mygod commented May 8, 2020

Maybe the way to do it is instead to catch SIGINT/SIGTERM signal, perform clean ups when Runtime is still functioning, and then shutdown?

@zonyitoo
Copy link
Collaborator

zonyitoo commented May 9, 2020

If any panic! occurs, Plugins instance will be destructed.

@Mygod
Copy link
Contributor Author

Mygod commented May 9, 2020

Well we can always use another macro for panic but you are right...

@zonyitoo
Copy link
Collaborator

zonyitoo commented Jul 5, 2020

Some related discussions about Asynchronous Destruction.

@Mygod
Copy link
Contributor Author

Mygod commented Jul 5, 2020

I think scoped concurrency might help as that way we can destruct objects with blocking the runtime before Runtime destructs itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants