Deadlock? when using par_bridge() #690
I wonder if …
I'm open to suggestions... though I'm not sure I understand exactly how I would do that. Implementing … Behind the scenes … If you want to do expensive processing on each directory entry, then you probably want per-entry parallelism instead of per-directory parallelism. I think the best use of …
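To make "per-entry instead of per-directory" concrete, here is a minimal std-only sketch. Scoped threads stand in for a rayon pool, and `list_files` and `process_per_entry` are hypothetical helpers, not jwalk or rayon API. The entries are collected first, and then the expensive work is split across workers by entry, so one huge directory does not end up on a single thread. With rayon, the same split is usually just `entries.par_iter()`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical helper: list the regular files directly inside `dir`.
fn list_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            files.push(entry.path());
        }
    }
    Ok(files)
}

/// Hypothetical helper: apply `work` to every entry, splitting the *entries*
/// (not the directories) across `workers` scoped threads.
fn process_per_entry<F, R>(entries: &[PathBuf], workers: usize, work: F) -> Vec<R>
where
    F: Fn(&Path) -> R + Sync,
    R: Send,
{
    let workers = workers.max(1);
    let chunk = ((entries.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = entries
            .chunks(chunk)
            .map(|part| {
                let work = &work;
                s.spawn(move || part.iter().map(|p| work(p.as_path())).collect::<Vec<R>>())
            })
            .collect();
        // Join in spawn order so the results keep the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```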
I can confirm I'm hitting this problem on Linux as well. A `par_bridge` that works directly would be much better.
It's not ideal, but I think you can avoid the lockup now by adding … to the jwalk builder.
I may have mentioned this before, but I think we probably need some kind of "critical section" primitive in rayon-core to let a thread block without work-stealing. We would use this in …
Yes, I also found that two separate pools are mandatory: even with an unbounded channel, the receiver can block, and all rayon threads could end up blocked on the receiving end, leaving no threads for jwalk, so it would deadlock forever. Unfortunately, two pools are not good from the perspective of system performance, because they add context switching between the producer and consumer sides. I noticed that a big bounded channel (> 64k items) helps performance at the expense of memory use. I would still like to be able to do all of this in a single rayon pool, but that looks surprisingly complex.
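The two-pool arrangement described above can be sketched without rayon. In this hypothetical std-only version (`two_pool_pipeline` is an illustrative name, and plain threads stand in for the two rayon pools), producers feed a bounded `sync_channel` while a *separate* set of consumer threads drains it. Because a consumer blocked in `recv` never occupies a producer thread, the producers can always make progress.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sketch: `producers` threads send items into a bounded channel
/// of size `capacity`; a separate set of `consumers` threads sums them.
fn two_pool_pipeline(producers: usize, consumers: usize, items_per_producer: u64, capacity: usize) -> u64 {
    let (tx, rx) = sync_channel::<u64>(capacity);
    // std's Receiver cannot be cloned, so the consumers share it behind a mutex.
    let rx: Arc<Mutex<Receiver<u64>>> = Arc::new(Mutex::new(rx));

    let producer_handles: Vec<_> = (0..producers as u64)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..items_per_producer {
                    // Blocks while the channel is full (backpressure).
                    tx.send(p * items_per_producer + i).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the channel closes once all producers finish.
    drop(tx);

    let consumer_handles: Vec<_> = (0..consumers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut sum = 0u64;
                loop {
                    // The guard is a temporary, released at the end of this statement.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(v) => sum += v,
                        Err(_) => break, // all senders dropped and channel drained
                    }
                }
                sum
            })
        })
        .collect();

    for h in producer_handles {
        h.join().unwrap();
    }
    consumer_handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The mutex around the receiver is only needed because std's `mpsc` is single-consumer. A multi-consumer channel such as crossbeam's would let each consumer hold its own cloned receiver.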
My current solution:

```rust
use std::path::PathBuf;
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;

use jwalk::{Parallelism, WalkDir};
use rayon::iter::{ParallelBridge, ParallelIterator};
use rayon::ThreadPoolBuilder;

// `WalkOpts` is an application-defined options struct (not shown here).
pub fn walk_dirs(paths: Vec<PathBuf>, opts: WalkOpts) -> impl ParallelIterator<Item = PathBuf> {
    let (tx, rx) = sync_channel(65536);
    // We need to use a separate rayon thread-pool for walking the directories, because
    // otherwise we may get deadlocks caused by blocking on the channel.
    let thread_pool = Arc::new(
        ThreadPoolBuilder::new()
            .num_threads(opts.parallelism)
            .build()
            .unwrap(),
    );
    for path in paths {
        let tx = tx.clone();
        let thread_pool = thread_pool.clone();
        thread::spawn(move || {
            WalkDir::new(&path)
                .skip_hidden(opts.skip_hidden)
                .follow_links(opts.follow_links)
                .parallelism(Parallelism::RayonExistingPool(thread_pool))
                .into_iter()
                .for_each(move |entry| match entry {
                    Ok(e) if e.file_type.is_file() || e.file_type.is_symlink() => {
                        tx.send(e.path()).unwrap()
                    }
                    Ok(_) => (),
                    Err(e) => eprintln!("Cannot access path {}: {}", path.display(), e),
                });
        });
    }
    // The original `tx` is dropped on return, so the channel closes
    // once every walker thread has finished.
    rx.into_iter().par_bridge()
}
```
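To experiment with the same pattern without jwalk or rayon, a dependency-free analogue might look like the following. `walk` and `walk_dirs_std` are hypothetical names. Each root is walked recursively with `std::fs::read_dir` on its own dedicated thread, and the caller drains the returned receiver, either with plain iteration or, as in the solution above, with `par_bridge()` on `rx.into_iter()`.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Recursively sends every regular file or symlink under `dir` to `tx`.
/// Symlinks are reported but not followed (like `follow_links(false)`).
/// Returns `false` once the receiver is gone, so the walk can stop early.
fn walk(dir: &Path, tx: &SyncSender<PathBuf>) -> bool {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) => {
            eprintln!("Cannot access path {}: {}", dir.display(), e);
            return true; // skip this directory, keep walking elsewhere
        }
    };
    for entry in entries {
        let entry = match entry {
            Ok(entry) => entry,
            Err(e) => {
                eprintln!("Cannot read entry in {}: {}", dir.display(), e);
                continue;
            }
        };
        // `DirEntry::file_type` does not follow symlinks.
        let file_type = match entry.file_type() {
            Ok(ft) => ft,
            Err(e) => {
                eprintln!("Cannot stat {}: {}", entry.path().display(), e);
                continue;
            }
        };
        if file_type.is_dir() {
            if !walk(&entry.path(), tx) {
                return false;
            }
        } else if file_type.is_file() || file_type.is_symlink() {
            // Bounded channel: `send` blocks while the consumer is behind.
            if tx.send(entry.path()).is_err() {
                return false; // receiver dropped
            }
        }
    }
    true
}

/// Hypothetical std-only stand-in for `walk_dirs`: one walker thread per root
/// feeds a bounded channel, and the caller drains the returned receiver.
fn walk_dirs_std(roots: Vec<PathBuf>, capacity: usize) -> Receiver<PathBuf> {
    let (tx, rx) = sync_channel(capacity);
    for root in roots {
        let tx = tx.clone();
        thread::spawn(move || {
            walk(&root, &tx);
        });
    }
    // The original `tx` is dropped on return, so the channel closes
    // as soon as every walker thread has finished.
    rx
}
```

Because the walkers run on dedicated OS threads rather than on the consumers' pool, consumers blocked on the channel cannot starve them. That is the same reason the solution above uses a separate pool.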
I hope #997 will fix this, but I would appreciate folks here testing that.
I can confirm this issue is fixed after upgrading to rayon-core 0.11 and jwalk 0.7. Upgrading both is necessary. I didn't test the older rayon-core release where this was supposedly fixed (0.10.2).
Thanks for confirming!
I'm using rayon 1.2.0 on Windows 10 with rustc 1.36.0 stable. I'm not sure how best to report this, and I don't even know whether it is a bug in rayon at all, but I thought it would be better to have someone look at it. I can provide additional information on request.
I've written the following code:
I'm using jwalk 0.4.0; it uses rayon internally.
If I run this code on a big folder (`C:\Users\Home` or `C:\`), it sometimes hangs indefinitely in `par_bridge.rs`. It tries to acquire the lock on line 165, but always continues with the `Err(TryLockError::WouldBlock)` match arm.
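The `Err(TryLockError::WouldBlock)` arm simply means that some other thread currently holds the lock. This hypothetical std-only sketch is not rayon's `par_bridge.rs`. In it, a holder thread keeps a `Mutex` locked, standing in for a worker stuck inside the bridged iterator's `next()`, and every `try_lock` lands in `WouldBlock` until the holder releases it. If the holder is itself waiting on work that only the spinning threads could run, which the comments in this thread suggest happens with jwalk's internal use of rayon, it never releases the lock and the loop never ends.

```rust
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;

/// Describes what `try_lock` reported for `m`.
fn try_lock_status(m: &Mutex<u32>) -> &'static str {
    match m.try_lock() {
        Ok(_) => "acquired",
        Err(TryLockError::WouldBlock) => "would block",
        Err(TryLockError::Poisoned(_)) => "poisoned",
    }
}

/// Hypothetical demo: returns the `try_lock` outcome while another thread
/// holds the lock, and again after that thread has released it.
fn try_lock_while_held_demo() -> (&'static str, &'static str) {
    let m = Arc::new(Mutex::new(0u32));
    let (locked_tx, locked_rx) = channel();
    let (release_tx, release_rx) = channel::<()>();

    let holder = {
        let m = Arc::clone(&m);
        thread::spawn(move || {
            let _guard = m.lock().unwrap();
            locked_tx.send(()).unwrap(); // tell the caller we now hold the lock
            release_rx.recv().unwrap(); // keep holding it until told to release
        })
    };

    locked_rx.recv().unwrap();
    let while_held = try_lock_status(&m); // holder still owns the guard
    release_tx.send(()).unwrap();
    holder.join().unwrap(); // guard dropped when the holder thread ends
    let after_release = try_lock_status(&m);
    (while_held, after_release)
}
```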