Feature Request Collection for WorkGraph #6561
Comments
Thanks @superstar54 . Could you provide a bit more technical context for "Allow Passing Parent PID to the Process During `instantiate_process`"?

As for allowing process functions to be submitted to the daemon: the tricky part here is just to make sure the daemon can actually import the function code. If the function is importable, there should be no problem. However, process functions currently do not always have to be importable: they can be defined inline in a shell, notebook, or script, in which case they cannot be imported by the daemon. One could resort to pickling the function and having the daemon unpickle it instead of importing it, but we have to be careful that, when we enable this, users won't be confused or frustrated when submitting to the daemon sometimes works and other times doesn't.

Also, could you please explain exactly why process functions not being submittable is currently causing a problem for scheduling in WorkGraph?
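To make the importability problem above concrete, here is a minimal sketch (not aiida-core API; `cloudpickle` is used here only as a stand-in for the pickling approach mentioned):

```python
# Sketch of the importability problem: functions defined inline live in
# '__main__', so a daemon worker cannot import them by module path and name.
import cloudpickle  # third-party; pickles functions "by value"

def my_function(x):  # defined inline in a script/shell/notebook
    return x * 2

print(my_function.__module__)  # '__main__' -> not importable by the daemon

# Pickling by value would let the daemon reconstruct the function without
# importing it (this is the idea discussed above, not current behaviour):
payload = cloudpickle.dumps(my_function)
restored = cloudpickle.loads(payload)  # hypothetically, in the daemon process
assert restored(21) == 42
```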
Hi @sphuber, thanks for your comments!
Thanks @superstar54 . I think I understand a bit better what you are trying to accomplish.

Did you ever measure the "cost" of having the idling parent processes sitting in a daemon worker? Do we actually know that they use a lot of resources? Is it mostly CPU cost that you are worried about, or memory?

Your idea may however be interesting for solving the problem where daemon workers have all their process slots taken by waiting parent processes. This has the potential to cause a deadlock. Although that has never really presented itself in practice, it is still not ideal, and we work around it by allowing the number of slots per worker to be increased. If we could solve this elegantly, that would be great.

I am not so sure about the implementation though. I might not fully understand it yet, but if I understand correctly, you implement a new `Scheduler` process. And I think I now understand why you need the parent PID to be passed during `instantiate_process`.
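As a side note on the measurement question above, here is a rough sketch of how one could sample the CPU and memory footprint of daemon worker processes with `psutil` (the command-line filter is an assumption; adjust it to how the workers appear on your system):

```python
# Sample CPU and resident memory of running daemon worker processes.
import psutil

for proc in psutil.process_iter(['pid', 'cmdline', 'memory_info']):
    cmdline = ' '.join(proc.info['cmdline'] or [])
    if 'aiida' in cmdline and 'daemon' in cmdline:  # heuristic filter
        rss_mib = proc.info['memory_info'].rss / 1024 ** 2
        cpu = proc.cpu_percent(interval=0.5)  # sample CPU usage over 0.5 s
        print(f"pid={proc.info['pid']} cpu={cpu:.1f}% rss={rss_mib:.1f} MiB")
```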
Yes, your understanding is correct. Here is an example use case.

Users need to start a separate daemon for the `Scheduler` process:

```
workgraph scheduler start
```

This will create a runner and launch a `Scheduler` process. To submit a WorkGraph to the scheduler, set the `to_scheduler` flag:

```python
wg.submit(to_scheduler=True)
```

This will send a message to the `Scheduler` process.
Of course, users can still submit the WorkGraph directly, without the Scheduler:

```python
wg.submit()
```
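For context, a minimal end-to-end sketch of the two submission paths, following the aiida-workgraph quick-start style (the `to_scheduler` flag comes from the draft PR and may change; the exact API may differ by version):

```python
from aiida import load_profile
from aiida_workgraph import WorkGraph, task

load_profile()

@task.calcfunction()
def add(x, y):
    return x + y

wg = WorkGraph('example')
wg.add_task(add, name='add1', x=1, y=2)

wg.submit()                     # run the WorkGraph process on a daemon worker
# wg.submit(to_scheduler=True)  # draft-PR path: delegate to the Scheduler
```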
I haven't measured the CPU and memory usage yet, but I believe that idling processes don't add much to either. Initially, my concern was that idle processes would occupy worker slots (e.g., with a default limit of 200), requiring users to increase the number of workers. Since the number of workers is typically limited by the number of CPUs, I saw this as an issue. However, as you pointed out, users can increase the number of slots per worker, so idle processes won't necessarily waste slots.

That said, increasing the number of slots could overwhelm a worker and prevent the WorkGraph from scheduling new tasks efficiently. Tasks like a calcfunction or a calcjob (e.g., its parser) may take a long time, potentially blocking the worker. This is why I want to implement a dedicated Scheduler that only handles WorkGraph-related tasks and doesn't accept others. A Scheduler focused on WorkGraph tasks would only be responsible for analyzing task dependencies and submitting tasks, which shouldn't overload it.

Other upsides of the scheduler: 1) we could control the total number of running tasks (e.g., limit the number of calcjobs on a computer), as sketched below; 2) we could control the priority of a WorkGraph. The downside of the scheduler: the user needs to maintain a separate daemon and a special Scheduler process.

This is still an idea in development, and it's currently implemented in a draft PR. Any comments and suggestions are welcome!
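To illustrate upside 1) above, a generic sketch (not the draft PR's actual code) of how a Scheduler could cap the number of concurrently running tasks with an asyncio semaphore; `submit_and_wait` is a hypothetical placeholder for launching one task:

```python
import asyncio

MAX_RUNNING = 10  # e.g. allow at most 10 concurrent calcjobs on a computer

async def submit_and_wait(task_name: str) -> str:
    await asyncio.sleep(0.1)  # stands in for submitting and awaiting a task
    return f'{task_name} done'

async def schedule(task_names: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(MAX_RUNNING)

    async def guarded(name: str) -> str:
        async with semaphore:  # waits while MAX_RUNNING tasks are in flight
            return await submit_and_wait(name)

    return await asyncio.gather(*(guarded(n) for n in task_names))

print(asyncio.run(schedule([f'task-{i}' for i in range(25)])))
```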
I don't understand this one: process functions already run on the daemon; the issue is that they block the event loop. Are you proposing to run them by spawning them into an executor so that they do not block the event loop thread? If aiida-core can ensure that the DB access of `calcfunction` and `workfunction` is thread-safe, I think this is worth trying with https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor

The other two requests are covered by the design of aiidateam/AEP#42.
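A minimal sketch of the `run_in_executor` idea (plain asyncio, no aiida-core specifics; whether aiida-core's DB access is thread-safe enough for this is exactly the open question above):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_function(x):
    time.sleep(2)  # stands in for a long-running calcfunction/workfunction
    return x * 2

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # the event loop keeps serving other callbacks while this runs
        result = await loop.run_in_executor(pool, blocking_function, 21)
    print(result)  # 42

asyncio.run(main())
```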
To gain an overall view of feature requests related to WorkGraph, I'm opening this issue to collect and track them in one place. While creating individual issues for each request is also beneficial, this list will help maintain an overview, especially since some of these issues are interrelated. I want to ensure developers can easily locate relevant feature requests without having to sift through hundreds of other issues.
Feature Requests
Note: this list will be continuously updated.
1. **Allow Passing Parent PID to the Process During `instantiate_process`**: In `aiida-core`, this is handled automatically. However, in WorkGraph, we need more control over this to ensure proper task scheduling.
2. **Enable Submission of `calcfunction` and `workfunction` to the Daemon**: This feature would prevent these functions from blocking the scheduling and responsiveness of processes.
3. **Support multiple queues**: We may want a process (the Scheduler) to listen to a task queue that is different from the default one. kiwipy supports this, but `aiida-core` always uses the default queue and does not expose an API for selecting queues, e.g., in `manager.create_daemon_runner` and `RemoteProcessThreadController.task_send` (see the sketch after this list).
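For item 3, a sketch of listening on a non-default task queue with kiwipy (the queue name and broker URL are placeholders; this assumes kiwipy's `task_queue` API, which `aiida-core` currently does not expose):

```python
import kiwipy.rmq

def on_task(_communicator, task):
    print('received task:', task)
    return 'done'

with kiwipy.rmq.RmqThreadCommunicator.connect(
        connection_params={'url': 'amqp://guest:guest@127.0.0.1/'}) as communicator:
    # attach to a dedicated, named queue instead of the default task queue
    queue = communicator.task_queue('workgraph-scheduler-queue')
    queue.add_task_subscriber(on_task)

    # a producer elsewhere would send to the same named queue
    future = queue.task_send({'action': 'submit', 'pk': 1234})
    print(future.result(timeout=10.0))
```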
Discussion and Implementation
We can discuss whether to implement these features directly in `aiida-core` or elsewhere. If anyone is interested in taking on any of the above features, please feel free to proceed.