Deadlocks on NetBSD around fork() #76600
Comments
We do consider UB in the standard library a bug, but we can only fix it if you point out where the issue is. "rewrite the |
There is one known async signal safety issue in the standard library, the use of EDIT: There is one more issue: Command is using execvp, which isn't async-signal-safe, unlike the exec variants that don't need to know PATH. |
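To illustrate the point about execvp and PATH, here is a minimal sketch of a fork/exec helper whose child branch calls only functions that POSIX lists as async-signal-safe (execve and _exit). The helper name spawn_wait and the requirement that the caller pass an absolute path are assumptions made for this example; this is not how the Rust standard library is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run the program at absolute `path` and return its
 * exit code, or -1 on failure or abnormal termination. */
static int spawn_wait(const char *path, char *const argv[]) {
    /* Resolve the path before fork(), so the child never searches PATH. */
    assert(path != NULL && path[0] == '/');

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on. */
        execve(path, argv, environ);
        _exit(127); /* reached only if execve failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling spawn_wait("/bin/sh", argv) with argv set to {"sh", "-c", "exit 7", NULL} should return 7 on a system that has /bin/sh.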
I can reproduce a deadlock on NetBSD; not including a reproducer, since it seems any trivial one will do. Backtrace generally shows The problem persisted after I patched calls to functions that aren't signal safe (removed the call to The problem persisted after I rewrote the test case in C:
#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 4

static pthread_t threads[N];

/* Fork a child that exits immediately, then reap it in the parent. */
static void spawn(void) {
    pid_t p;
    p = fork();
    assert(p != -1);
    if (p == 0) {
        _exit(0);
    } else {
        pid_t r;
        int wstatus;
        r = waitpid(p, &wstatus, 0);
        assert(r != -1);
        assert(r == p);
    }
}

/* Thread body: fork and wait ten times. */
static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i != 10; ++i) {
        spawn();
    }
    return NULL;
}

int main(void) {
    int r;
    for (int i = 0; i != N; ++i) {
        r = pthread_create(threads + i, NULL, run, NULL);
        assert(r == 0);
    }
    for (int i = 0; i != N; ++i) {
        r = pthread_join(threads[i], NULL);
        assert(r == 0);
    }
    return 0;
}
|
So, doesn't this mean that this is not a Rust bug but a NetBSD bug?
It would still be helpful to have one. Little details often matter surprisingly much. |
What is that based on? I checked the manpage and couldn't find information about signal safety of the various |
|
Backtrace of child process after deadlock:
Potentially related NetBSD bug report https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=49816 The |
This matches pretty well with one of the backtraces I've seen with
the hangs I had observed earlier have so far not re-surfaced with this particular program. So... Both this and the earlier C-based reproducer appear to indicate that this may be a problem in NetBSD and not in Rust. However, I think the C-based reproducer above has a different root cause. On my 9.0/amd64 test system, it didn't result in a hang with N=4, but did with N=400, resulting in one zombie and a number of threads in either the "wait" or the "parked" state. |
I updated NetBSD from 9.0 to the latest daily snapshot. The previous problems haven't reproduced so far, but I encountered another problem and reduced it to a program in C. Attaching and then detaching a debugger lets execution continue, so this seems to be some form of missed notification.
C source
#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 4

static pthread_t threads[N];

/* Fork a child that exits immediately, then reap it in the parent. */
static void spawn(void) {
    pid_t p;
    p = fork();
    assert(p != -1);
    if (p == 0) {
        _exit(0);
    } else {
        pid_t r;
        int wstatus;
        r = waitpid(p, &wstatus, 0);
        assert(r != -1);
        assert(r == p);
    }
}

/* Thread body: query and destroy the thread attributes, then fork ten times. */
static void *run(void *arg) {
    int r;
    pthread_attr_t attr;
    pthread_t current = pthread_self();
    (void)arg;
    r = pthread_getattr_np(current, &attr);
    assert(r == 0);
    r = pthread_attr_destroy(&attr);
    assert(r == 0);
    for (int i = 0; i != 10; ++i) {
        spawn();
    }
    return NULL;
}

int main(void) {
    int r;
    for (int i = 0; i != N; ++i) {
        r = pthread_create(threads + i, NULL, run, NULL);
        assert(r == 0);
    }
    for (int i = 0; i != N; ++i) {
        r = pthread_join(threads[i], NULL);
        assert(r == 0);
    }
    return 0;
}
Backtrace
EDIT: This one can be reproduced without fork. EDIT: Similar problems discussed on current-users mailing list: |
Should this be closed, then? |
To the best of my knowledge, |
@tmiasko I had someone run your C program on NetBSD-current (not just NetBSD-9 stable), and there apparently the problem with stuck threads / processes was not reproducible, so there is hope that this bug won't be there when NetBSD 10 is released. |
I can reproduce the last issue on the daily snapshot from 2020-09-18, usually after just a few seconds of running the program in a loop. If you do reproduce it at some point, I would appreciate it if you could report this upstream, since I generally don't use NetBSD. Simplified reproducer, since it turns out neither fork nor pthread_getattr_np is required, except for the fact that the latter allocates memory:
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N 4

static pthread_t threads[N];

/* Each thread performs a single allocation; the result is deliberately leaked. */
static void *run(void *arg) {
    (void)arg;
    return malloc(1024);
}

int main(void) {
    int r;
    /* Keep the calls outside assert() so they still run when NDEBUG is defined. */
    for (int i = 0; i != N; ++i) {
        r = pthread_create(&threads[i], NULL, run, NULL);
        assert(r == 0);
    }
    for (int i = 0; i != N; ++i) {
        r = pthread_join(threads[i], NULL);
        assert(r == 0);
    }
    return 0;
}
|
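Since the hang only shows up after running the program repeatedly, a small watchdog can make "run it in a loop" reproducible. The following is a sketch only: the helper name run_once, the 10 ms polling interval, and the choice to kill a stuck child with SIGKILL are all assumptions for illustration, not part of the original report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run the program at `path` once.
 * Returns 0 if it terminated within timeout_ms, 1 if it appeared to hang
 * (and was killed), or -1 on error. */
static int run_once(const char *path, char *const argv[], int timeout_ms) {
    assert(path != NULL && timeout_ms > 0);

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        _exit(127); /* reached only if execve failed */
    }

    /* Poll every 10 ms until the child terminates or the timeout expires. */
    const struct timespec tick = {0, 10 * 1000 * 1000};
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;
        if (r == -1)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Timed out: treat as a hang, kill the child, and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 1;
}
```

A caller could invoke run_once on the compiled reproducer in a loop with a timeout of a few seconds and stop at the first return value of 1, leaving the decision of what counts as "hung" to the chosen timeout.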
I'm going to close this, please report it upstream as a NetBSD issue. It's neither related to rust nor |
@tmiasko I reported http://gnats.netbsd.org/55670 -- this is apparently caused by concurrency bugs in our "jemalloc" implementation in the netbsd-9 code base. Reportedly this is fixed in NetBSD-current, and will be in NetBSD 10.0 when that comes out. |
I do realize this is likely to be ignored, but I just cannot let go of bringing it up anyway...
It appears to me that Rust in certain circumstances relies on undefined behavior: it is in general a multi-threaded program, it will in many cases call fork(), and it performs many non-trivial tasks between fork() and exec(). Some of these have manifested themselves as run-time problems observed on NetBSD: (detected) deadlocks in ld.elf_so (resulting in abort), deadlocks related to malloc() manifesting as hangs, etc. In NetBSD, the fork() man page contains this passage:
A Linux/Debian man page for fork(2) also contains a similar passage:
Even though I don't point to offending code here, I have it on good authority that this is the root cause of the issues we have been observing on NetBSD. Is there any chance that the Rust code could get a make-over so that it no longer relies on undefined behavior in this respect?
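One way a multi-threaded program can avoid running its own code between fork() and exec() is to use posix_spawn(), which hands the whole fork-and-exec step to the C library. The sketch below shows the idea in C; the helper name spawn_via_posix_spawn is hypothetical, and nothing here is a claim about what the Rust standard library does or should do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start the program at `path` with posix_spawn() and
 * return its exit code, or -1 on failure or abnormal termination. */
static int spawn_via_posix_spawn(const char *path, char *const argv[]) {
    assert(path != NULL);

    pid_t pid;
    /* No file actions and no spawn attributes: plain "run this program". */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0)
        return -1; /* posix_spawn returns an error number, not -1/errno */

    /* Wait for the child, retrying if interrupted by a signal. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is flexibility: anything the caller would have done between fork() and exec() has to be expressible through posix_spawn's file actions and attributes.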