child_process.spawn/exec blocks main thread while spawning child process #9250
If it works for your use case, you could try: https://github.com/davepacheco/node-spawn-async
@davepacheco that's pretty cool. I'm going to close this and say that if you need to spawn a lot of processes, maybe you should use something like node-spawn-async. If I'm wrong, we can revisit this.
I'll have to try node-spawn-async. It looks like it just uses one additional thread, so I'm not sure it will help the big picture, because that thread will just end up blocking. I think it would help keep other requests that don't spawn a child process from being blocked on the main event loop, though. It hasn't been updated in two years, which also concerns me a bit, but it's worth a shot. Even so, are there any significant reasons why the standard child_process module shouldn't or can't be improved to avoid blocking the main loop? Spawning a child process seems like a common way to handle computationally intensive tasks, and it is significantly bottlenecked by the rate at which one thread can spawn child processes.
From an API perspective, I think it'd be reasonable for Node to provide non-blocking APIs here. I don't know how challenging that is from an implementation perspective. That said, forking and exec'ing are relatively heavyweight operations. I don't think it's a good idea to fork/exec at high rates as part of normal operation. (Besides the performance implications, it's often challenging to build robust argument passing, error handling, and error reporting for shell-like use.) spawn-async exists (and uses only a single worker process) in order to avoid latency bubbles for occasional forks (i.e., once/second or less), not to maximize throughput of forks/execs. As for its age: we've been using it in production at Joyent as part of the Manta service continuously since the module was created, and we do a few tens of thousands of spawns with it per day. It's not that it's abandoned; it's just that it's basically done for what we wanted from it.
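The single-worker-process idea described above can be sketched with plain child_process, under some loud assumptions: this is NOT spawn-async's actual API, the names createSpawnHelper, run, close and HELPER_SOURCE are hypothetical, and the helper is started with `node -e` only to keep the example in one file. The main process spawns one long-lived helper at startup; every later command is forked by the helper, so the main event loop only sends and receives small IPC messages. As noted above, this smooths out latency for occasional forks; it does not increase fork throughput.

```javascript
// Hypothetical sketch of a "spawn helper" process (not spawn-async's API).
// The main process pays the fork cost once, at startup; later commands are
// forked inside the helper, and results come back over an IPC channel.
const { spawn } = require('child_process');

// Helper program: run each requested command with execFile and reply with
// the result, tagged with the request id so replies can be matched up.
const HELPER_SOURCE = `
const { execFile } = require('child_process');
process.on('message', ({ id, file, args }) => {
  execFile(file, args, (err, stdout, stderr) => {
    process.send({ id, error: err ? err.message : null, stdout, stderr });
  });
});
`;

function createSpawnHelper() {
  // The 'ipc' stdio entry creates a message channel, which gives the Node
  // helper process.send() and 'message' events.
  const helper = spawn(process.execPath, ['-e', HELPER_SOURCE], {
    stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
  });
  const pending = new Map(); // request id -> { resolve, reject }
  let nextId = 0;

  helper.on('message', ({ id, error, stdout, stderr }) => {
    const entry = pending.get(id);
    if (!entry) return;
    pending.delete(id);
    if (error) entry.reject(new Error(error));
    else entry.resolve({ stdout, stderr });
  });

  return {
    // Ask the helper to run `file` with `args`; resolves with its output.
    // (A crash of the helper itself is not handled in this sketch.)
    run(file, args = []) {
      return new Promise((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        helper.send({ id, file, args });
      });
    },
    // Closing the IPC channel lets the helper exit once its work is done.
    close() {
      helper.disconnect();
    },
  };
}

// Usage: create the helper once, then route commands through it.
const helper = createSpawnHelper();
helper
  .run(process.execPath, ['-e', 'process.stdout.write("ok")'])
  .then(({ stdout }) => console.log(stdout))
  .catch((err) => console.error(err))
  .finally(() => helper.close());
```

The helper could itself become a bottleneck if it is asked to fork at very high rates; the point is only to keep the fork off the main event loop.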
Thanks for the insight into spawn-async! I agree that it's not ideal to fork very often, for example on every request. This is pretty conflicting, though. Node only provides access to a single thread in user space, so to avoid blocking the main thread and to use multi-core machines for CPU-intensive tasks, you have only two main options: spawn a child process, or write a server (HTTP/Unix socket) in another language and offload the work there. Spawning a child process has been shown to be too expensive to do hundreds of times concurrently, so one of the two options is basically out of the question in a production environment. Node is obviously great for applications that spend a lot of time waiting on I/O. However, I don't think "use something else" is a great long-term answer for CPU-intensive tasks, especially since JavaScript happens to be one of the faster scripting languages. I'd like to take advantage of that.
Forking isn't off the table for increased parallelism. You're just much better off forking worker processes at startup and then not forking during request handling. This isn't really very different from multi-process models (like apache prefork) or even multi-threaded models (like thread pools). In all of these cases, you're much better off amortizing the cost of creating the workers (whether they're processes or threads) across a large number of requests. The built-in cluster module, which admittedly has its flaws, provides a pattern for doing this. Outside of the cluster module, we use the pattern of forking (ncpus) worker processes for each logical Node service and then either fronting those processes with haproxy or else having clients know about all of the workers and load-balancing on the clients. You're right that these are important considerations, and they're not trivial. But I don't think Node is intrinsically any worse off here than anything else.
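The "fork (ncpus) workers at startup, reuse them for every request" pattern described above can be sketched as a small process pool with round-robin dispatch. This is an illustrative sketch, not the cluster module and not Joyent's setup: createPool, WORKER_SOURCE and the recursive fib() workload are assumptions chosen to stand in for CPU-heavy work, and a real pool would also need to detect and restart crashed workers, which this sketch omits.

```javascript
// Sketch of a pre-forked worker pool: pay the process-creation cost once at
// startup, then dispatch CPU-heavy tasks to the same workers over IPC.
const { spawn } = require('child_process');
const os = require('os');

// Worker program (illustrative): compute fib(n) naively and reply by id.
const WORKER_SOURCE = `
function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
process.on('message', ({ id, n }) => {
  process.send({ id, result: fib(n) });
});
`;

function createPool(size = os.cpus().length) {
  const workers = [];
  const pending = new Map(); // task id -> resolve callback
  let nextId = 0;
  let nextWorker = 0;

  // All forking happens here, before any task is submitted.
  for (let i = 0; i < size; i++) {
    const worker = spawn(process.execPath, ['-e', WORKER_SOURCE], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    worker.on('message', ({ id, result }) => {
      const resolve = pending.get(id);
      pending.delete(id);
      if (resolve) resolve(result);
    });
    workers.push(worker);
  }

  return {
    run(n) {
      return new Promise((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        // Round-robin: each task goes to the next worker in turn.
        workers[nextWorker].send({ id, n });
        nextWorker = (nextWorker + 1) % workers.length;
      });
    },
    // Disconnecting lets each idle worker exit.
    close() {
      for (const w of workers) w.disconnect();
    },
  };
}

// Usage: create the pool once; every task reuses the same processes.
const pool = createPool(2);
Promise.all([20, 21, 22, 23].map((n) => pool.run(n))).then((results) => {
  console.log(results); // → [ 6765, 10946, 17711, 28657 ]
  pool.close();
});
```

Round-robin keeps the sketch short; a pool that tracks which workers are idle would balance uneven task sizes better, similar to the client-side load balancing mentioned above.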
Responses to queued clients are currently blocked on the current child_process.spawn() invocation. See nodejs/node-v0.x-archive#9250
Offloading expensive computations to other processes and cores using spawn or exec blocks the main thread for significant portions of time. Obviously the spawned process is not blocking node's event loop while it is running, but the act of spawning the process within node seems to be expensive and blocks. When dealing with a high number of concurrent spawns, the major bottleneck is the node event loop blocking while trying to create child processes. I noticed this by spawning hundreds of concurrent child processes and noting that only 1 CPU core was reaching 90+% utilization. The other cores running the child processes had less than 50% utilization.
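One way to check this observation is to time the spawn() call itself. spawn() does not return until the child process has been created, so time spent inside the call is time the event loop cannot spend on other work. In this minimal sketch, the command (`node -e ""`), the count of 20 and the function name measureSpawnBlocking are arbitrary assumptions; absolute numbers depend on the OS, the machine, the size of the parent process and the Node version, so no specific figures are claimed here.

```javascript
// Sketch: measure how long each child_process.spawn() call blocks the
// calling thread (and therefore the event loop).
const { spawn } = require('child_process');

function measureSpawnBlocking(count) {
  const durationsMs = [];
  const children = [];
  for (let i = 0; i < count; i++) {
    const start = process.hrtime.bigint();
    // A trivial child (an empty Node script) so the measurement is
    // dominated by process creation, not by the child's own work.
    const child = spawn(process.execPath, ['-e', ''], { stdio: 'ignore' });
    const end = process.hrtime.bigint();
    durationsMs.push(Number(end - start) / 1e6); // nanoseconds -> ms
    children.push(child);
  }
  return { durationsMs, children };
}

const { durationsMs, children } = measureSpawnBlocking(20);
const total = durationsMs.reduce((a, b) => a + b, 0);
console.log(`spawn() calls: ${durationsMs.length}`);
console.log(`total time blocked inside spawn(): ${total.toFixed(2)} ms`);
console.log(`longest single call: ${Math.max(...durationsMs).toFixed(2)} ms`);
```

Running the same measurement from a parent process with a large heap is a useful comparison, since the cost of creating the child can depend on the state of the parent.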
Is there a more efficient way to spawn child processes than what is currently done in node? Or is there a way to spawn processes on a background node thread instead of blocking the main thread?