Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spawn is badly named #743

Open
Timmmm opened this issue Apr 2, 2020 · 12 comments
Open

spawn is badly named #743

Timmmm opened this issue Apr 2, 2020 · 12 comments

Comments

@Timmmm
Copy link

Timmmm commented Apr 2, 2020

We just had a difficult to diagnose deadlock bug essentially because spawn() is badly named. The situation was something like this:

rayon.spawn(|| {
   let r, s = unbounded();
   rayon.spawn(|| {
      let answer = do_some_slow_work();
      s.send(answer);
   });
   r.recv();
});

Normally this worked fine! But on some systems it seemed to get stuck. We eventually figured out that spawn(), despite its name does not actually spawn a thread. It schedules a task! On systems where the thread pool had 2 or more threads, both r.recv() and do_some_slow_work() could progress, but when we happened to run on a system where the thread pool only had one thread we could sometimes get to r.recv() and block forever so do_some_slow_work() never had a chance to run.

What's the solution? Don't use spawn() because it doesn't spawn a thread. Instead use spawn() which does spawn a thread! Pretty confusing naming right?

Here is the patch to fix the bug we had - hopefully you can why this was confusing!

-  rayon::scope(|s| {
+  crossbeam::scope(|s| {
   ...
-        rayon::scope(|s| {
+        crossbeam::scope(|s| {
   ...
-        });
+        }).unwrap();
   ...
-        rayon::scope(|s| {
+        crossbeam::scope(|s| {
   ...
-        });
+        }).unwrap();
   ...
-  });
+  }).unwrap();

I suggest something like execute(), schedule(), queue(), add_task(), etc.

@cuviper
Copy link
Member

cuviper commented Apr 2, 2020

I'm sorry for the confusion!

We eventually figured out that spawn(), despite its name does not actually spawn a thread. It schedules a task!

Yes -- rayon::spawn is documented that it "Fires off a task into the Rayon threadpool," while Scope::spawn says that it "Spawns a job into the fork-join scope." Neither case ever starts new threads, which seems normal to me for a thread pool, but I acknowledge that this wasn't obvious to you.

For comparison, the threadpool and scoped_threadpool crates call their similar methods execute (1, 2), and maybe that is a nicer name to avoid confusion. On the other hand, tokio and async-std both use spawn (3, 4) to create new tasks. I feel like rayon is more similar to the latter, actually, since its jobs can execute with arbitrary nesting and work-stealing in the pool, versus the former threadpools that dedicate a thread to just one specific execution at a time.

If we do change, we don't want to break the 1.x API with a rename/removal, so we could only deprecate the spawn* names in favor of something new.


In general, blocking a rayon job on the result from another rayon job is a dangerous thing to do. It's fine if that blocking is controlled by the pool itself, like a rayon::join call, but it's problematic for something like your channel send/receive pair. Rayon has no idea about your dependency between the two, and work-stealing can end up inverting these in the call stack, like issue #592 with mutexes.

@mratsim
Copy link

mratsim commented Jun 24, 2020

Just a remark from multithreading in other languages:

The spawn construct is inherited from Cilk and it by convention means that a task may be scheduled on another thread but it is not guaranteed.

What you are looking for is a threadpool for blocking tasks while Rayon is a scheduler for compute tasks.

The difference between a blocking task and a compute task is that a compute task must be actively scheduled to make progress, unlike reading for stdin for example.

I.e. spawn is used like Rayon uses in:

  • Cilk
  • Intel TBB
  • Julia
  • Nim
  • HPX
  • Kokkos
  • ...

Regarding your bug per se, this should be solved by Rayon futures (i.e. Rayon supporting sync from the spawn/sync Cilk/Task parallelism paradigm)

@Timmmm
Copy link
Author

Timmmm commented Jun 24, 2020

@mratsim Yes I understand the difference. But I think it is much clearer to talk about scheduling a task, and spawning a thread. That is the common lingo in my experience, in spite of the languages/libraries you listed naming things weirdly. Here are some examples of people talking about spawning threads: A, B, C. Here are some examples of thread pool schedule() APIs: A, B. "Schedule" may not be the best verb to be honest, but "spawn" is definitely misleading given how commonly it is used to refer to spawning a thread.

Rust itself seems to have got it right: std::thread::spawn() spawns a thread; it doesn't schedule a task.

@mratsim
Copy link

mratsim commented Jun 25, 2020

Cilk was created in 1995 and is the foundational research behind work-stealing schedulers:

They use spawn/sync for tasks: http://supertech.csail.mit.edu/papers/PPoPP95.pdf

pthread C11 and C++11 use create/join: https://en.cppreference.com/w/cpp/thread/thread

In my opinion, Rust standard library made a mistake in using spawn for threads. It will introduce confusion with other system languages.

The new Executor draft for C++ also use spawn for executors:

@RReverser
Copy link
Contributor

Would it be possible to add a method to Rayon that actually does spawn a new thread and not use a threadpool? I realise in most cases this would be equivalent to std::thread::spawn, but not when a custom thread handler is used (e.g. on WebAssembly build). In those cases it would be nice to be able to reuse Rayon configuration and create a new thread via the installer spawn handler.

@cuviper
Copy link
Member

cuviper commented Oct 7, 2020

@RReverser I don't think there's a way for us to keep the spawn_handler. That closure is only constrained as FnMut and we only keep it in the builder, but it would need Send + Sync + 'static to be a trait object in the ThreadPool or Registry.

@RReverser
Copy link
Contributor

@cuviper Hmm, yeah, I see. I suspect most custom spawn_handler out there already fulfill this constraint, but adding it to the signature would be still a breaking change. It's a shame, because it could have a potential of nicely fixing a class of common deadlocks when using Rayon.

@cuviper
Copy link
Member

cuviper commented Oct 7, 2020

Well, the provided build_scoped is an example that wouldn't be 'static, at least, and this is even used in rustc*.

*(rustc really uses a rustc-rayon fork, not rayon directly)

@RReverser
Copy link
Contributor

I wonder if there's a generic param we could add to ThreadPool, similar to one added in ThreadPoolBuilder, that would be backward compatible but allow to spawn threads if the used closure matches those constraints.

@cuviper
Copy link
Member

cuviper commented Oct 7, 2020

Is there a reason you can't or don't want to store an independent thread spawner for your case? It seems out of scope to me to have ThreadPool spawn threads that wouldn't actually participate in the pool.

@RReverser
Copy link
Contributor

Is there a reason you can't or don't want to store an independent thread spawner for your case?

The reason is that, on the library side, I'm not in control of the thread spawner, I'm only using rayon APIs and leaving it up to the actual app to decide whether it wants to use a regular pool, or to build_global() its own. As such, I can't retrieve and store it anywhere.

I guess it would be possible to add spawn handler as an option of the library itself, but then it's the user who needs to pass it to several different places rather than preconfigure just once, and seems like a poor DX.

@BenjaminLiuPenrose
Copy link

In my opinion, Rust standard library made a mistake in using spawn for threads. It will introduce confusion with other system languages.

Just ran into an issue related to misleading spawn naming. I agreed that Rust library and many other languages may pick a wrong name for spawn function. However, it is more likely Rust user/audience with no prior knowledge on clik will find the spawn naming confusing here. It is advised to name this function as execute as opposed to spawn, and maybe change it back to spawn when Rust and main language indeed change thread::spawn naming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants