Allow passing kwds to ProcessPool #252
Conversation
I hate to admit it... but there's a PR (#198) that's been open for a while on
it would be cool to merge both PRs, but I agree it would be good to test (for both PRs) that behaviour changes when some kwarg is passed. Apart from tests, is there anything else to do before merging?
Force-pushed from 4a8782a to b6bdb3c
added test
@mmckerns anything still needed from my side?
This is good... it's on my shortlist to test and review.
Most of these changes are fine. However, the code in `_serve` needs work. Essentially, a `Pool` instance gets cached in `__STATE`, and will be reused unless you make a change to the `Pool` configuration. So, if you call `Pool(4)` and then you call `Pool(4, maxtasksperchild=2)`... then as is, your code won't spawn a new pool with `maxtasksperchild=2` (because of line 117). Basically, if the `nodes` or any `kwds` change, you need to make sure it instantiates a new `Pool`.

You should also make the same changes for `ThreadPool`.
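The stale-cache pitfall described here can be sketched in isolation. This is a simplified stand-in for pathos' `__STATE` cache, not the real implementation; `FakePool` and `serve` are hypothetical names used only for illustration:

```python
# Illustration of the stale-cache problem: the cached pool is reused
# whenever the node count matches, silently dropping any new kwds.
_STATE = {}

class FakePool:
    def __init__(self, nodes, **kwds):
        self.nodes = nodes
        self.kwds = kwds

def serve(pool_id, nodes, **kwds):
    # Only compares node counts, like the original `_serve` check.
    cached = _STATE.get(pool_id)
    if cached is None or cached.nodes != nodes:
        cached = _STATE[pool_id] = FakePool(nodes, **kwds)
    return cached

p1 = serve('a', 4)
p2 = serve('a', 4, maxtasksperchild=2)
assert p1 is p2        # same cached pool comes back
assert p2.kwds == {}   # maxtasksperchild was silently ignored
```

The fix being requested is to make the cache key (or the comparison) include the kwds as well, so a configuration change forces a fresh pool.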
pathos/multiprocessing.py:

```diff
         return
     if AbstractWorkerPool.__init__.__doc__: __init__.__doc__ = AbstractWorkerPool.__init__.__doc__ + __init__.__doc__
     #def __exit__(self, *args):
     #    self._clear()
     #    return
-    def _serve(self, nodes=None): #XXX: should be STATE method; use id
+    def _serve(self, nodes=None, **kwds): #XXX: should be STATE method; use id
         """Create a new server if one isn't already initialized"""
         if nodes is None: nodes = self.__nodes
         _pool = __STATE.get(self._id, None)
         if not _pool or nodes != _pool.__nodes:
```
needs to also check `_pool._maxtasksperchild`, `_pool._initargs`, `_pool._initializer`
```diff
-        if not _pool or nodes != _pool.__nodes:
+        if (
+            _pool is None
+            or nodes != _pool.__nodes
+            or kwds.get('maxtasksperchild') != _pool._maxtasksperchild
+            or kwds.get('initargs') != _pool._initargs
+            or kwds.get('initializer') != _pool._initializer
+        ):
```
like this?
it's missing the leading underscores in the latter two.
👍 edited the comment
I believe you'll also need: (1) corresponding changes to `_clear`, and (2) when `_serve` is called by one of the `map` functions and `kwds={}`, then it does the expected thing by pulling the kwds from the existing pool.
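Points (1) and (2) together might look something like the following sketch: a kwds-aware cache check that treats an empty `kwds` as "keep the cached configuration" rather than "reset to defaults". All names here are illustrative stand-ins, though the attribute names mirror the real `_maxtasksperchild`, `_initializer`, and `_initargs` on the underlying pool:

```python
# Hypothetical kwds-aware cache (illustrative, not the pathos source).
_STATE = {}

class FakePool:
    def __init__(self, nodes, maxtasksperchild=None, initializer=None,
                 initargs=()):
        self.nodes = nodes
        self._maxtasksperchild = maxtasksperchild
        self._initializer = initializer
        self._initargs = initargs

def serve(pool_id, nodes, **kwds):
    cached = _STATE.get(pool_id)
    if cached is not None and not kwds and nodes == cached.nodes:
        # Called (e.g. from map) with empty kwds: reuse the cached pool
        # instead of treating {} as "all defaults" and respawning.
        return cached
    if (cached is None
            or nodes != cached.nodes
            or kwds.get('maxtasksperchild') != cached._maxtasksperchild
            or kwds.get('initializer') != cached._initializer
            or kwds.get('initargs', ()) != cached._initargs):
        cached = _STATE[pool_id] = FakePool(nodes, **kwds)
    return cached

p1 = serve('a', 4, maxtasksperchild=2)
p2 = serve('a', 4)            # empty kwds, e.g. from a map() call
assert p1 is p2               # cached configuration is kept
```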
Added 5bdd7af which should take care of it: only add a new pool to state if kwds changed, and only clear a pool from state if kwds match. The (2) part I didn't fully get: pool kwds are e.g. `initializer`, whereas map kwds are e.g. `chunksize`. I think it should be OK with this last commit, as there is no overlap between those two kinds of kwds?
correct, there is no overlap between map and pool kwds.
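For reference, the two kwd families are disjoint in the stdlib `multiprocessing.Pool` API that pathos builds on (constructor kwds vs. `Pool.map` kwds):

```python
# Constructor kwds of multiprocessing.Pool vs. kwds of Pool.map.
POOL_KWDS = {'processes', 'initializer', 'initargs', 'maxtasksperchild'}
MAP_KWDS = {'chunksize'}
assert not (POOL_KWDS & MAP_KWDS)  # no overlap, as noted above
```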
```diff
 @@ -27,6 +28,15 @@ def test_mp():
     result = result_queue.get()
     assert result == _result

+    # test ProcessPool keyword argument propagation
+    pool.clear()
+    pool = ProcessPool(nodes=4, initializer=lambda: time.sleep(0.6))
```
```diff
 pool = ProcessPool(nodes=4, initializer=lambda: time.sleep(0.6))
+assert pool._pool.initializer, 'Subsequent pool with different kwds should propagate'
```
right? if not propagated, the default initializer should be falsy?
wondering why this test actually takes 0.6+ seconds, doesn't that mean propagation is working? 🤔
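The timing-based check under discussion can be reproduced standalone with the stdlib `multiprocessing.Pool` (pathos wraps an equivalent pool). This is a sketch, not the pathos test; `slow_init` and `run_check` are made-up names:

```python
import time
import multiprocessing as mp

def slow_init():
    # Each worker sleeps once at startup, simulating a slow initializer.
    time.sleep(0.3)

def run_check():
    start = time.monotonic()
    with mp.Pool(processes=2, initializer=slow_init) as pool:
        result = pool.map(abs, [-1, -2, -3])
    return result, time.monotonic() - start

if __name__ == '__main__':
    result, elapsed = run_check()
    assert result == [1, 2, 3]
    # If the initializer propagated, the run takes at least ~0.3s,
    # since every worker sleeps before consuming any task.
    assert elapsed >= 0.3
```

The same reasoning applies to the pathos test above: a map that takes 0.6+ seconds implies the `initializer` kwarg reached the workers.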
Default initializer is None, I believe. You can check the defaults with:

```python
>>> from pathos.pools import _ProcessPool as Pool
>>> p = Pool()
>>> p._initializer
```

and so on.
Is there a state bleed somewhere? Because by your theory, my test should be failing, as the test does exactly what you described: first a plain pool, then a pool with an initializer. By your theory, that initializer should be ignored and the old pool reused. But by the test, the map now takes much longer. So apparently the initializer is propagated?
I don't think you've made all the necessary edits. See comments above.
sure, I'll still make the changes, but I would still like to understand why the test is showing the expected results. L13 populates the state, and L33 calls `ProcessPool()` again but with the sleep initializer. And then `time.monotonic` shows that the call to map now indeed takes longer due to the sleep. So why does it currently work, if you say it should not work?
It was probably clearing the cached pool because `_serve` was being called with empty `kwds` in the `map` call (and the proper state handling wasn't done correctly).
LGTM. Needs documentation, but I'll add that.
super! thanks for the review 👍
I also handled if someone passes
Hi 👋
I would like to propagate the `maxtasksperchild` keyword, but for this I had to switch from `pathos` to `multiprocess`, ref ddelange/mapply#29. This however decreases stability/cleanup of workers, so I'd rather allow a pathos user to propagate it :)