Deadlock when creating many processes #1397

Open
greschd opened this issue Apr 10, 2018 · 9 comments · Fixed by #2774
Comments

@greschd
Member

greschd commented Apr 10, 2018

When many processes are created, the workers clog up: they keep checking the status of a few waiting processes without ever running the processes those depend on.

This is due to the prefetch limit.

Possible ways to solve this:

  • Set the prefetch limit to 0 (no limit). Maybe this would create some problems with the heartbeat?
  • Add a scheduler which tells the workers to "drop" waiting processes and pick up new ones.
@greschd greschd added type/bug priority/critical-blocking must be resolved before next release labels Apr 10, 2018
@muhrin muhrin added this to the v1.0.0 milestone Jul 25, 2018
@sphuber sphuber modified the milestones: v1.0.0, v1.0.0b1 Dec 4, 2018
@sphuber sphuber modified the milestones: v1.0.0b1, v1.0.0 Dec 11, 2018
@sphuber
Contributor

sphuber commented Apr 8, 2019

The minimum amount of work for 1.0.0 should be to at least see whether we can add a warning when a user launches a process and the available daemon worker slots are below a certain percentage or number. I am not sure where this should be triggered. The most logical point would be in the launchers, but that means that every time they are called, some sort of query to all daemon workers has to be performed to get the number of open slots (if that is even possible), which might be quite slow.

An alternative, which is less accurate but more feasible, is to simply get the number of workers, multiply it by the prefetch-count global constant, and compare the result with the number of active processes. Actually, just run this check only when calling verdi process list and issue the warning at the bottom, just like when the daemon is not running, for example. Add instructions on what to do (verdi daemon incr), but maybe also link to documentation explaining the problem. Suggested threshold: 90%.
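A rough sketch of what such a check could look like (sketch only: the slots-per-worker constant and the worker-count callable are placeholders for whatever aiida-core exposes, not the actual API):

```python
from aiida.orm import QueryBuilder, ProcessNode

SLOTS_PER_WORKER = 200   # placeholder for the prefetch-count global constant
WARNING_THRESHOLD = 0.9  # warn at 90% occupancy

def warn_if_slots_saturated(get_number_of_workers):
    """Warn when active processes approach the total number of worker slots.

    ``get_number_of_workers`` is a placeholder callable that would ask circus
    how many daemon workers are currently running.
    """
    active = QueryBuilder().append(
        ProcessNode,
        filters={'attributes.process_state': {'in': ['created', 'waiting', 'running']}},
    ).count()

    available = get_number_of_workers() * SLOTS_PER_WORKER
    if available and active > WARNING_THRESHOLD * available:
        print(
            f'Warning: {active} active processes for only {available} worker slots; '
            'consider running `verdi daemon incr` or see the documentation.'
        )
```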

@ConradJohnston
Contributor

I pushed an implementation of this warning fix to a branch in my fork.

What we learned during this process is that the query to generate the table in verdi process list can be quite costly (~18 secs on @sphuber's 10.5-million-node DB). The problem with piggy-backing on this query to get the number of active processes is that one really needs a second query for only the active processes, because the user can apply any filters they want to the output of verdi process list, but we want to guarantee that the warning is printed in all cases (except for raw output). 30+ seconds to get a response is excessive. The poor performance is likely related to filtering on the process state, which is stored as an attribute.

A second problem is that getting a CircusClient in order to obtain the number of workers is also quite slow (about 1 second).

A final problem is what to tell the user to do when the warning fires. Currently it suggests that they should increase the number of workers. However, we are not sure what the upper limit for processes should be and what is feasible under a typical OS and system setup. In this sense, perhaps giving this advice blindly is a bad idea.

We need to think about whether printing a warning is the right course of action for now, and make plans to fix the cause of the problem.

@greschd
Member Author

greschd commented May 11, 2019

Thanks @ConradJohnston for the improvement!

@sphuber Should we really close this issue though, or was that just done automatically/accidentally because the PR mentions this issue?

To add to my initial suggestions, I think one could also implement some logic where a process gets dropped (and added to the end of the queue) when it has been waiting for a specific number of "worker cycles" or specific time.
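Purely as a conceptual sketch of that idea (none of these names are real AiiDA objects; a worker's bookkeeping is assumed to be a simple dict plus a shared queue):

```python
import time

MAX_WAIT_SECONDS = 600  # illustrative threshold for "has waited too long"

def rotate_waiting_processes(slots, queue):
    """Requeue processes that have been waiting too long, freeing their slots.

    ``slots`` maps a process id to ``(process, wait_start)`` for the processes
    currently held by this worker; ``queue`` is the shared task queue that all
    workers pull new work from.
    """
    now = time.monotonic()
    for pid, (process, wait_start) in list(slots.items()):
        if process.is_waiting and now - wait_start > MAX_WAIT_SECONDS:
            del slots[pid]      # free the slot on this worker
            queue.append(pid)   # push the task to the back of the shared queue
```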

Maybe my tasks were a bit excessive in the number of processes created, but I've found launching a large number of workers to be problematic in terms of CPU load -- for workers which end up not doing much.

@sphuber sphuber reopened this May 11, 2019
@sphuber
Contributor

sphuber commented May 11, 2019

Yeah, it was closed accidentally. The heavy CPU load, was that on the alphas of aiida-core? There were some inefficiencies, since solved, that could cause the workers to spin too often, i.e. the polling fallback was firing all the time because the default was wrongly set to 0. The actual solution to this problem will probably be quite complicated, as any typical solution might add other inefficiencies, and some of the things we might want to do would require functionality that RabbitMQ was not designed for and so cannot provide.

@greschd
Member Author

greschd commented May 12, 2019

Yeah, that was on the alpha; glad to hear that should now be fixed.

True, the scheduling definitely isn't an easy fix. Another option (maybe as a temporary fix) would be to have a watcher thread that dynamically scales the number of workers (as an optional setting).
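Just to illustrate the idea, a minimal sketch of such a watcher, assuming placeholder callables for counting active processes and workers (verdi daemon incr is the real command; everything else here is hypothetical):

```python
import subprocess
import time

SLOTS_PER_WORKER = 200  # placeholder for the prefetch-count constant

def autoscale_workers(count_active_processes, count_workers, interval=60):
    """Naive watcher loop: add a daemon worker whenever the slots saturate.

    ``count_active_processes`` and ``count_workers`` are placeholder callables;
    this only illustrates the idea, it is not actual aiida-core machinery.
    """
    while True:
        active = count_active_processes()
        slots = count_workers() * SLOTS_PER_WORKER
        if slots and active > 0.9 * slots:
            # `verdi daemon incr` adds one more daemon worker
            subprocess.run(['verdi', 'daemon', 'incr'], check=True)
        time.sleep(interval)
```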

@sphuber sphuber modified the milestones: v1.0.0, v2.0.0 Jun 17, 2019
@sphuber
Contributor

sphuber commented Jun 17, 2019

The proper solution to this will require significant changes to the way RabbitMQ is used, therefore I am putting this in v2.0.0. For now, the "fix" implemented in #2774 should give some breathing room, but the problem is not fixed.

@muhrin
Contributor

muhrin commented Dec 12, 2019

I'm thinking about how to solve this in a more general way; for now I'm collecting information here:

https://github.com/aiidateam/aiida-core/wiki/Scaling-of-AiiDA-workers-and-slots

because it is easier than re-reading the full conversation each time.

@greschd
Member Author

greschd commented Dec 12, 2019

Should we continue discussing here, and you'll collect it there? Or edit there directly?

> The problem with this is that the parent would then not be able to receive messages and be unable to be killed, paused, etc.

I guess when the total number of processes is higher than the number of "slots" = workers * processes per worker, that is kind of unavoidable, right? Is my understanding correct that the messages would still get queued, and the processes would receive them once some other worker picks them up?

I think some sort of rotation / scheduling is needed here. Some comments:

  • As long as the number of slots is larger than the number of processes, I think a process should not be "pulled off".
  • There could either be a dedicated "load balancer", or some heuristic for when a process is "dropped" while waiting. A middle ground would be a "monitor" that observes process performance / stats, and adjusts "heuristic parameters".

Since this seems like a classical scheduling problem, we should definitely consult existing solutions / algorithms.

@muhrin
Contributor

muhrin commented Dec 13, 2019

Discussing here is good; I was just thinking of putting summary stuff in the wiki.

@greschd, what you say is good, except we still need to deal with the problem of how to make sure processes that are 'pulled off' can still deal with incoming messages. Additionally, we have to be careful that any scheduler-like implementation doesn't end up in a situation where, say, a process is waiting for a transport but is pulled off before it gets it, because it looks like it's doing nothing. In the worst case, processes could never end up getting their transport because they are always pulled off first.
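One way such a scheduler might guard against that starvation, sketched with made-up bookkeeping (not AiiDA code): give every process a minimum residence time in its slot before it becomes eligible to be pulled off.

```python
def pick_process_to_drop(slot_holders, min_residence=300):
    """Pick one waiting process to pull off a saturated worker, or ``None``.

    ``slot_holders`` maps a process id to the seconds it has held its slot.
    The minimum residence time guarantees every process keeps its slot long
    enough to, e.g., receive its transport, instead of being requeued the
    moment it looks idle.
    """
    eligible = {pid: held for pid, held in slot_holders.items() if held > min_residence}
    if not eligible:
        return None  # nobody has held a slot long enough to be dropped
    return max(eligible, key=eligible.get)  # rotate out the longest-held slot
```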

summary below (this used to be in the wiki, but is best kept here):

Background information

The AiiDA daemon runs processes by launching workers, each of which has a finite number of slots representing the number of simultaneous (in the coroutine sense) processes it can run. Each worker maintains a single connection to the database.

This choice of implementation was guided by the following constraints:

  1. There is a database connection limit (typically 100 on Ubuntu) which should be respected.
  2. Threads don't always play nicely with Django or SQLAlchemy, and indeed Django uses a separate connection per thread.

Problems

There are a number of problems with the current solution, some of which have been discussed in #1397. Specifically:

  1. AiiDA workflows are allowed to block for lengthy periods of time, e.g. when carrying out a complex local calculation. This means that other processes on the same worker are blocked (and can't even receive messages), which makes it undesirable to have too many on the same worker.
  2. Workchains may spawn child processes which they wait on; however, if there are no slots left, the child will never run and the parent will wait indefinitely while blocking a slot (see the sketch below).
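To make problem 2 concrete, a minimal, illustrative workchain pair with roughly the shape that triggers it (simplified; real workchains would declare inputs and outputs): submitting more parents than there are total slots leaves every worker full of waiting parents and no free slot for any child.

```python
from aiida.engine import WorkChain, ToContext

class ChildWorkChain(WorkChain):
    """Trivial child whose only job here is to need a free slot to run."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.outline(cls.do_work)

    def do_work(self):
        self.report('child ran')


class ParentWorkChain(WorkChain):
    """Parent that submits a child and waits on it, keeping its own slot."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.outline(cls.submit_child, cls.inspect_child)

    def submit_child(self):
        # The parent stays in the waiting state, still occupying its slot,
        # until the child finishes. If every slot is taken by such parents,
        # no child can ever start and the daemon is effectively deadlocked.
        return ToContext(child=self.submit(ChildWorkChain))

    def inspect_child(self):
        self.report(f'child finished with exit status {self.ctx.child.exit_status}')
```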

Possible solutions

  1. It is tempting to try to pull a waiting parent process off a worker to allow the child to run, and then re-create it so it can continue once the child finishes. The problem with this is that the parent would then not be able to receive messages and would be unable to be killed, paused, etc.
