PVF worker: Mitigate process leaks #7296
base: master
Changes from 2 commits
```diff
@@ -82,6 +82,15 @@ pub fn worker_event_loop<F, Fut>(
 	F: FnMut(UnixStream) -> Fut,
 	Fut: futures::Future<Output = io::Result<Never>>,
 {
+	// Use `PR_SET_PDEATHSIG` to ensure that the child is sent a kill signal when the parent dies.
+	//
+	// NOTE: This technically has a race as the parent may have already died. In that case we will
+	// fail to read the socket later and just shutdown.
+	#[cfg(linux)]
+	if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGKILL, 0, 0, 0) != 0 {
+		return
+	}
+
 	let worker_pid = std::process::id();
 	gum::debug!(target: LOG_TARGET, %worker_pid, "starting pvf worker ({})", debug_id);
```
```diff
@@ -93,7 +102,7 @@ pub fn worker_event_loop<F, Fut>(
 				%worker_pid,
 				"Node and worker version mismatch, node needs restarting, forcing shutdown",
 			);
-			kill_parent_node_in_emergency();
+			kill_parent_node_in_emergency(worker_pid);

 			let err: io::Result<Never> =
 				Err(io::Error::new(io::ErrorKind::Unsupported, "Version mismatch"));
 			gum::debug!(target: LOG_TARGET, %worker_pid, "quitting pvf worker({}): {:?}", debug_id, err);
```
```diff
@@ -174,20 +183,29 @@ pub fn stringify_panic_payload(payload: Box<dyn Any + Send + 'static>) -> String
 	}
 }

-/// In case of node and worker version mismatch (as a result of in-place upgrade), send `SIGTERM`
-/// to the node to tear it down and prevent it from raising disputes on valid candidates. Node
-/// restart should be handled by the node owner. As node exits, unix sockets opened to workers
-/// get closed by the OS and other workers receive error on socket read and also exit. Preparation
-/// jobs are written to the temporary files that are renamed to real artifacts on the node side, so
-/// no leftover artifacts are possible.
-fn kill_parent_node_in_emergency() {
+/// In case of node and worker version mismatch (as a result of in-place upgrade or
+/// incorrectly-built binaries), send `SIGTERM` to the node to tear it down and prevent it from
+/// raising disputes on valid candidates. Node restart should be handled by the node owner. As node
+/// exits, unix sockets opened to workers get closed by the OS and other workers receive error on
+/// socket read and also exit. Preparation jobs are written to the temporary files that are renamed
+/// to real artifacts on the node side, so no leftover artifacts are possible.
+fn kill_parent_node_in_emergency(worker_pid: u32) {
 	unsafe {
+		// NOTE: On non-Linux this has a race condition between getting the pid and sending the
+		// signal -- the parent may have died and another process been assigned the same pid. On
+		// Linux this is not a problem -- we use `PR_SET_PDEATHSIG` to ensure that the child is sent
+		// a kill signal when the parent dies.
+		//
+		// SAFETY: `getppid()` never fails but may return "no-parent" (0) or "parent-init" (1) in
+		// some corner cases, which is checked. `kill()` never fails.
 		let ppid = libc::getppid();
 		if ppid > 1 {
 			libc::kill(ppid, libc::SIGTERM);
+		} else {
+			gum::error!(target: LOG_TARGET, %worker_pid, "unexpected ppid {}", ppid);
 		}
+
+		libc::kill(worker_pid as i32, libc::SIGKILL);
 	}
 }
```

Review discussion on `libc::kill(worker_pid as i32, libc::SIGKILL);`:

> Then why do it? Not saying we should not, but there should be some reason.

> The idea was just to be explicit, so we don't rely on coincidental architectural details that may change. But maybe we should instead make it explicit that the worker dies when it can't read from the socket. The parent may also die for some reason other than us killing it.

> I would say, if we already die when the read fails, let's leave it at that, but document that error handling accordingly.
Review discussion on `PR_SET_PDEATHSIG` semantics:

> AKA, this isn't doing what you think.

> Good catch. Hmm, I think since threads have their own pid this still addresses the race identified in this PR. So the issue moves up a level, i.e. it becomes about avoiding leaked threads in the parent as opposed to leaked/orphaned child processes.

> Not sure which race you now mean, the one in the comment above or that we don't kill workers on node exit?

> I mean the race between getting the pid and sending the kill signal is addressed even with the caveat you posted from `man`. And as you pointed out, threads exit when the process does, so the caveat doesn't actually change anything. So I think `PR_SET_PDEATHSIG` works.

> Why do we spawn this on a blocking pool? But yeah, it would need to be documented that this is a hard requirement now.

> Sorry @slumber, I'm not really following, can you explain more? I'm also not sure we should document implementation details too much; `PR_SET_PDEATHSIG` should work regardless of how the host is implemented. I also removed some of the comments for now. Maybe documenting such rare corner cases was overkill; it just leads to noise.

> I honestly don't know. Basti quoted documentation mentioning that your prctl call will terminate the worker once the parent *thread* terminates, not the node itself. Suppose the pvf future spawns a worker and moves to a different tokio thread (tokio workers steal work). That thread terminating would mean the worker gets killed, and, as Basti mentioned, you're not doing what you wanted to.
>
> Then I noted that the pvf future is in fact spawned on a blocking pool. Essentially, `tokio::spawn_blocking(block_on(pvf_host))`. Given this, your PR works: that thread terminating is equivalent to terminating the pvf host, which is sufficient for worker termination. If we replace `spawn_blocking` with `spawn`, your PR doesn't work anymore; this should be documented.

> I see, you are correct. I'm not sure how I feel about relying on it being a blocking pool, since that fact can change. Does work stealing mean that the thread that spawned work can actually die while the work is still active? That would surprise me since it seems unsound (precisely for cases like `PR_SET_PDEATHSIG`), but TBH I don't know how tokio's scheduler works under the hood.

> Work stealing means each worker thread has its own queue and they steal tasks from each other; "work stealing" is a term used by the Go scheduler and by tokio, you can read about it. The main point is that a task can move to another thread. For example, the pvf future is on thread A and blocked; thread A steals another future and panics.