
Bug: sceua gets stuck with MPI after burn-in #226

Closed
MuellerSeb opened this issue Jul 29, 2019 · 15 comments
@MuellerSeb (Contributor)

Hey there,

since spotpy 1.5.0, SCE-UA optimization with MPI gets stuck after the burn-in phase. Here is a minimal example:

from spotpy.algorithms import sceua
from spotpy.examples.spot_setup_rosenbrock import spot_setup
setup = spot_setup("sceua")  # spot_setup() for spotpy 1.4.6
sampler = sceua(setup, parallel="mpi", dbname='db', dbformat="csv")
sampler.sample(repetitions=10000, ngs=4)

Running with

mpiexec -n 4 python3 test.py

gives the following output:

Initializing the  Shuffled Complex Evolution (SCE-UA) algorithm  with  10000  repetitions
The objective function will be minimized
Initializing the  Shuffled Complex Evolution (SCE-UA) algorithm  with  10000  repetitions
The objective function will be minimized
Initializing the  Shuffled Complex Evolution (SCE-UA) algorithm  with  10000  repetitions
The objective function will be minimized
Initializing the  Shuffled Complex Evolution (SCE-UA) algorithm  with  10000  repetitions
The objective function will be minimized
Starting burn-in sampling...
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
* Database file 'db.csv' created.
Burn-in sampling completed...
Starting Complex Evolution...
ComplexEvo loop #1 in progress...

And from there on, nothing more happens. With parallel="seq" it takes about 5 seconds to finish.
Do you know what the problem could be?

I've got mpi4py 3.0.2 installed and I am using Python 3.6.8. With spotpy 1.4.6 everything is working. From 1.5.0 on the above mentioned behavior occurs.

Cheers,
Sebastian

@MuellerSeb MuellerSeb mentioned this issue Jul 29, 2019
@MuellerSeb (Contributor, Author)

After some debugging, I think the problem is in this line:

if self.comm.Iprobe(source=i+1, tag=tag.answer):

where self.comm.Iprobe(source=i+1, tag=tag.answer) never evaluates to True.
Maybe this is related to this thread: https://groups.google.com/forum/#!topic/mpi4py/RiK8Fhd3LIU

But I've run out of ideas at this point.
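The failure mode described above is a classic nonblocking-probe polling loop: the master spins on Iprobe and never advances if the workers' answers never arrive (for example because the message they were supposed to receive could not be deserialized). The same pattern can be sketched without MPI using a queue; the timeout guard here is illustrative and not part of spotpy's actual code:

```python
import queue
import time

def poll_for_answer(q, timeout=1.0, interval=0.01):
    """Poll a queue the way a master polls Iprobe; give up after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Analogous to Iprobe returning True followed by recv.
            return q.get_nowait()
        except queue.Empty:
            # Analogous to Iprobe returning False: wait and retry.
            time.sleep(interval)
    raise TimeoutError("no answer from worker; is it stuck or dead?")

answers = queue.Queue()
answers.put("result from rank 1")
print(poll_for_answer(answers))
```

Without such a guard, a worker that silently fails to send its answer leaves the master polling forever, which matches the observed hang.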

@philippkraft (Collaborator)

Hi Sebastian, sorry for the long silence - vacation period. We "fixed" some SCE-UA bugs in the last version; I have to check the changes together with @thouska, who is still out of office. Can you check whether you have the same problem with another sampler (e.g. ROPE or LHS)? Just to make sure the issue is in the SCE-UA implementation (which is tricky) and not a general parallel='mpi' problem.

@MuellerSeb (Contributor, Author)

@philippkraft : Thanks for the reply. I checked the FAST routine, which worked as expected.

@MuellerSeb (Contributor, Author)

Something new on this topic?
Cheers, Sebastian

@thouska (Owner) commented Sep 2, 2019

Hi Sebastian,
unfortunately, there is not much news on this topic. At least I can confirm your error description. I am on it and will inform you here as soon as it is fixed. Sorry that it is taking so long...
Based on your report, we are also working on testing the MPI implementation on Travis (#231), so that such errors can, hopefully, be avoided in the future.

@thouska thouska added the bug label Sep 2, 2019
thouska added a commit that referenced this issue Sep 2, 2019
thouska added a commit that referenced this issue Sep 2, 2019
@thouska (Owner) commented Sep 2, 2019

Ok, now it should be fixed. Somehow the new design of the _RunStatistic class in _algorithm.py, introduced in spotpy version 1.5.0, was not picklable under mpi4py. This caused the stall after the burn-in phase that you described. I removed the use of the _RunStatistic class while spotpy is running on cpu-slaves. This fixes the problem (at least in my MPI environment). The change might result in slightly longer runtimes at the end of the sampling (will be fixed), but for now it is at least running again.
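A quick way to reproduce this class of failure independent of MPI is to check whether the objects the master ships to the workers survive a pickle round-trip, since mpi4py serializes generic Python objects with pickle. A minimal sketch, using a hypothetical stand-in class (not spotpy's actual _RunStatistic) that carries an unpicklable attribute:

```python
import pickle

class RunStatisticLike:
    """Hypothetical stand-in for an object holding an unpicklable attribute."""
    def __init__(self):
        # A lambda cannot be pickled; open file handles and thread
        # locks fail the same way.
        self.objective = lambda x: x * x

def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(is_picklable(RunStatisticLike()))   # the problematic case
print(is_picklable({"like": [1, 2, 3]}))  # plain data is fine
```

Running such a check on everything passed between ranks would have flagged the unpicklable class before the MPI run silently hung.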

@thouska (Owner) commented Sep 2, 2019

PS: If you want to test this, the corresponding new version (1.5.3) of spotpy is available on PyPI.

@MuellerSeb (Contributor, Author)

I installed spotpy 1.5.4 and now I am getting the following error:

  File "/usr/local/lib/python3.6/dist-packages/spotpy/__init__.py", line 41, in <module>
    from . import unittests
ImportError: cannot import name 'unittests'

The submodule unittests is missing from the package. This is due to this line in setup.py:

packages = ["spotpy", "spotpy.examples", "spotpy.examples.hymod_python", "spotpy.examples.hymod_exe",

You should use this instead:

packages=find_packages(exclude=["tests*", "docs*"])

with this on the first line:

from setuptools import setup, find_packages

But after commenting out the from . import unittests line, it works now.

@MuellerSeb (Contributor, Author)

Maybe you could move the unittests folder to a top-level folder named tests, as mentioned in the exclude pattern, which is the common convention. Then you would have to adapt the .travis.yml file accordingly. I don't think the unit tests need to be in the package when there is a separate examples folder.

thouska added a commit that referenced this issue Sep 3, 2019
Moves tests on toplevel, partly removes jit from hymod_python.py #226
@hpsone commented Sep 3, 2019

I had similar problems, but I just saw that @thouska updated the package. I have not tested the newest version yet; I will do it now. :D

@thouska (Owner) commented Sep 3, 2019

Many thanks @MuellerSeb for directly testing everything and reporting in such detail how to fix the new problems. As you recommended, I removed the unittest import, renamed the unittests folder to tests, and moved it to the top level. I like the new structure and think it makes total sense.
As @hpsone found out faster than I could answer this issue: there is a new version on PyPI containing the fix.

@hpsone commented Sep 3, 2019

Sorry for my rushed comment. I meant to say that I had not tested it yet. Now I have tested it, and it is not working for me. Maybe it is a mistake in my model, but my MPI installation works properly, as I have tested it with Telemac2d. What could the possible error be? Anyway, @thouska, thank you very much for the help.
Best Regards,
Htun

@MuellerSeb (Contributor, Author)

@hpsone: you may have to give some details about your problem to get an answer.

@hpsone commented Jan 16, 2020

@MuellerSeb Thank you so much. I am not quite sure what the error is. But I ran it using "mpc" instead of "mpi" and it worked. Anyway, I will try again; it might just be my insufficient knowledge.

@thouska (Owner) commented Apr 1, 2020

I guess this issue is solved; if not, feel free to reopen.

@thouska thouska closed this as completed Apr 1, 2020