Putting `asyncio` into context #4704

muhrin · 2021-02-02T10:29:22Z


After some discussions with @ltalirz and @chrisjsewell regarding their excellent debugging of some recent memory leak issues (#4699, #4603, #4698, and possibly others), I thought it might be useful to share my view on part of the reason why they were difficult to track down (and perhaps why they happened in the first place), while providing some historical context.

Plumpy and its interactions with AiiDA implement a mixed functions/coroutines model, where many of plumpy's actions are handled by coroutine calls. This allows us to use awaits and yields to get cooperative multitasking between AiiDA processes. This is all well and good, and asyncio is designed precisely to handle such workloads.

The complexity of the system arises (at least in major part) from an early design decision not to push asynchronous code up to the user (and in the early days I didn't even want to push this onto AiiDA developers, as it was a fairly new concept). This means that things like WorkChain steps are regular functions (called via a coroutine further up the call stack). Because asyncio deliberately does not support re-entrancy, once you've 'gone function' there's no going back: that branch of the call stack is now forever synchronous (@unkcpz's excellent re-entrancy work notwithstanding).
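
To make the 'gone function' point concrete, here is a minimal, self-contained sketch. The names `fetch_value`, `workchain_step` and `run_process` are hypothetical and not part of AiiDA or plumpy. A regular function called from a coroutine cannot `await`, and trying to drive the already running loop from inside it with `run_until_complete` is rejected by asyncio:

```python
import asyncio


async def fetch_value() -> int:
    # Stand-in for some asynchronous work (e.g. opening a transport).
    await asyncio.sleep(0)
    return 42


def workchain_step() -> str:
    # A regular function: `await` is a syntax error here, and asyncio refuses
    # to re-enter the loop that is already running further up the stack.
    coro = fetch_value()
    loop = asyncio.get_running_loop()
    try:
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # suppress the "coroutine was never awaited" warning
        return str(exc)
    return "unexpectedly succeeded"


async def run_process() -> str:
    # The coroutine further up the call stack that calls the plain function.
    return workchain_step()


print(asyncio.run(run_process()))  # This event loop is already running
```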

My desire not to push too much async code into AiiDA has led to additional complexity in certain places that try to get around the (intentional) limitations of asyncio; #4699 is a case in point. Here `TransportQueue.request_transport` is a regular function (well, a generator) that tries to give you a transport by scheduling a callback in the loop (after the next safe open interval) and yields a Future that will resolve to the transport when ready. This led to a difficult-to-identify leak in which the process stack was being 'held', because it is needed when the `do_open` call is finally reached.

For me, the lesson here is that, at least for AiiDA developers, it's possibly better to ask them to go through the task of getting to know asyncio (a well-documented and now fairly widely adopted paradigm) rather than have them deal with custom code that tries to hide it from them. In this case `request_transport` would become a coroutine that awaits `do_open`, which would itself deal with the safe open interval.
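
The two styles can be contrasted in a minimal sketch. Everything here is hypothetical: `FakeTransport`, `SAFE_OPEN_INTERVAL` and the function names only mimic the shape of the problem and are not AiiDA's actual `TransportQueue` API. The first version schedules a callback with `call_later` and hands back a Future; the second is a coroutine that awaits `do_open`, which waits out the safe interval itself.

```python
import asyncio

# Hypothetical delay (seconds) standing in for the "safe open interval".
SAFE_OPEN_INTERVAL = 0.01


class FakeTransport:
    """Hypothetical stand-in for a real transport object."""

    def __init__(self) -> None:
        self.is_open = False

    def open(self) -> "FakeTransport":
        self.is_open = True
        return self


def request_transport_callback_style() -> asyncio.Future:
    # Callback style: schedule the open on the loop and return a Future.
    # The closure below (and everything it references) is kept alive by the
    # loop until the timer fires.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    transport = FakeTransport()

    def _open_when_safe() -> None:
        if not future.cancelled():
            future.set_result(transport.open())

    loop.call_later(SAFE_OPEN_INTERVAL, _open_when_safe)
    return future


async def do_open(transport: FakeTransport) -> FakeTransport:
    # Coroutine style: wait out the safe interval explicitly, then open.
    await asyncio.sleep(SAFE_OPEN_INTERVAL)
    return transport.open()


async def request_transport() -> FakeTransport:
    return await do_open(FakeTransport())


async def main() -> tuple:
    old = await request_transport_callback_style()
    new = await request_transport()
    return old.is_open, new.is_open


print(asyncio.run(main()))  # (True, True)
```

In the callback version, whatever the nested callback closes over stays referenced by the loop's timer handle until it fires, which is the kind of hidden lifetime that makes leaks hard to spot; in the coroutine version the waiting is visible as an `await` in the call chain.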

It's probably too much to ask users to change WorkChains and workfunctions to be coroutines, but it may be worthwhile changing parts of AiiDA's (and plumpy's) internals to be more asyncio-friendly. The general approach would be to:

1. Search for occurrences of `call_later` and `call_soon`.
2. Consider whether the functions that contain these calls, and all their parents up the stack, can be changed to coroutines.
3. If so, change the client code going up the stack, being mindful of situations where a Future was returned: it may be that these can directly become awaits, or not; some judgement may be necessary.
4. Look out for 'fire-and-forget' calls (e.g. a call to a function that returns a future, where the client currently does nothing with the future). These should probably not become awaits and can instead be dealt with using `ensure_future`.

Finally, it's worth considering cases when coroutines shouldn't be used. If I've understood it correctly, Guido's reasoning for not having re-entrancy is that any asynchronicity is obvious to see (you just look for the awaits). Therefore, if there are actions that should be atomic, a regular function should be used. (I have to point out, though, that this very absolutist approach doesn't always work: you may want an atomic set of operations that itself requires something asynchronous, such as a transport, in which case it, too, is forced to become a coroutine to highlight the fact that internally it does some async work.)
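
The atomicity argument can be illustrated with a toy read-modify-write on a shared counter (the `balance` dict and all function names are hypothetical). With no `await` in the body, each update runs to completion before any other task gets a turn; once an `await` sits between the read and the write, tasks interleave and updates are lost. Guarding the section with an `asyncio.Lock` is one common way to keep such an operation atomic once it has been forced to become a coroutine (a standard asyncio idiom, not something the discussion prescribes).

```python
import asyncio

balance = {"value": 0}


def deposit_atomic(amount: int) -> None:
    # Regular function: no await, so no other task can run in the middle.
    current = balance["value"]
    balance["value"] = current + amount


async def deposit_with_await(amount: int) -> None:
    # Coroutine: the await is a visible point where other tasks may run,
    # so this read-modify-write is no longer atomic.
    current = balance["value"]
    await asyncio.sleep(0)
    balance["value"] = current + amount


async def deposit_locked(lock: asyncio.Lock, amount: int) -> None:
    # Same coroutine, but the lock stops other tasks entering the section.
    async with lock:
        current = balance["value"]
        await asyncio.sleep(0)
        balance["value"] = current + amount


async def run_atomic() -> int:
    balance["value"] = 0

    async def worker() -> None:
        deposit_atomic(1)

    await asyncio.gather(*(worker() for _ in range(10)))
    return balance["value"]


async def run_interleaved() -> int:
    balance["value"] = 0
    await asyncio.gather(*(deposit_with_await(1) for _ in range(10)))
    return balance["value"]


async def run_locked() -> int:
    balance["value"] = 0
    lock = asyncio.Lock()  # created inside the running loop
    await asyncio.gather(*(deposit_locked(lock, 1) for _ in range(10)))
    return balance["value"]


print(asyncio.run(run_atomic()))       # 10
print(asyncio.run(run_interleaved()))  # 1
print(asyncio.run(run_locked()))       # 10
```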

If it's useful, some of this can be rolled into an AEP and systematically adopted.

Happy coding!

chrisjsewell · 2021-02-02T13:54:14Z


Thanks for the explanation @muhrin, FYI I have copied some of this to a design rationale page on plumpy (aiidateam/plumpy#202).
Questioning Guido, blasphemous! But yes, I agree: in recent years async/await has become well standardised, so a small extra learning curve is better than artificially increasing the underlying complexity.
