Dream sampler. Parallel computation #266

Open
baramousa opened this issue Apr 9, 2021 · 13 comments

@baramousa

Hi, this is not really an issue. I just want to know which version of DREAM this package implements: is it the basic DREAM, DREAM(ZS), or MT-DREAM(ZS)? I am asking because I want to know whether parallel computation is possible. As far as I know, basic DREAM can only be run sequentially, while the others can be run in parallel.

Thanks @thouska

@thouska
Owner

thouska commented Apr 9, 2021

Hi @baramousa
The dream algorithm version implemented in spotpy corresponds to Algorithm 6 as presented in this publication:
https://www.sciencedirect.com/science/article/pii/S1364815215300396?casa_token=gCl00Qy8ymsAAAAA:BcW90XS8GyI2Rwi7sJnunxAUOAhfQMz9eEHTWSbjgvPflnUxF5DI7cm3qq1OzXro01_bdf3Pyz4
So it can be run in parallel. If you want, you can check out this example, which is set up to run n=4 chains.
https://github.com/thouska/spotpy/blob/master/spotpy/examples/tutorial_dream_hymod.py
Changing the parallel keyword to 'mpi' and handing this setting to the sampler makes spotpy start each chain's runs on an individual CPU core. Some details about this are given here:
https://github.com/thouska/spotpy/blob/master/spotpy/examples/tutorial_parallel_computing_hymod.py
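For reference, a minimal sketch of how the parallel keyword is handed to the sampler (condensed from the linked tutorials, not a verbatim copy; the setup import and the parameter values are illustrative):

    import spotpy
    from spotpy.likelihoods import gaussianLikelihoodMeasErrorOut as GausianLike
    from spotpy.examples.spot_setup_hymod_python import spot_setup

    if __name__ == "__main__":
        # 'seq' runs the chains sequentially, 'mpi' distributes them via MPI,
        # 'mpc' uses multiprocessing on a local machine
        parallel = 'mpi'
        setup = spot_setup(GausianLike)
        sampler = spotpy.algorithms.dream(setup, dbname='DREAM_hymod',
                                          dbformat='csv', parallel=parallel)
        # four chains as in the tutorial; the number of repetitions is arbitrary here
        r_hat = sampler.sample(repetitions=5000, nChains=4)

Under 'mpi' the script is then launched through mpirun/mpiexec instead of plain python; under 'mpc' and 'seq' it can be started directly.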

@baramousa
Author

Thanks for the quick reply. Just another question: let's say I want to implement the hymod example below.
https://github.com/thouska/spotpy/blob/master/spotpy/examples/tutorial_dream_hymod.py

If I want to run it in parallel, I need to set parallel to "mpi" on a Linux machine and to "mpc" on a Windows machine. Am I getting it right?

@thouska
Owner

thouska commented Apr 9, 2021

Yes, that's correct.

@baramousa
Author

OK, thanks a lot. I will try it and give my feedback.

@baramousa
Author

baramousa commented Apr 12, 2021

Hi @thouska, I tried to run your hymod_dream example in parallel. It seems to run, but then it fails with an error. First, this is the code:
    import numpy as np
    import spotpy
    import matplotlib.pyplot as plt
    from spotpy.likelihoods import gaussianLikelihoodMeasErrorOut as GausianLike
    from spotpy.analyser import plot_parameter_trace
    from spotpy.analyser import plot_posterior_parameter_histogram
    import sys

    if __name__ == "__main__":
        parallel = 'mpc'
        from spotpy.examples.spot_setup_hymod_unix import spot_setup
        spot_setup = spot_setup(GausianLike)
        sampler = spotpy.algorithms.dream(spot_setup, dbname='DREAM_hymod', parallel=parallel, dbformat='csv')
        rep = 5000
        nChains = 4
        convergence_limit = 1.2
        nCr = 3
        eps = 10e-6
        runs_after_convergence = 100
        acceptance_test_option = 6
        r_hat = sampler.sample(rep, nChains, nCr, eps, convergence_limit)
        results = spotpy.analyser.load_csv_results('DREAM_hymod')

Then I get this error:

Convergence rates =1.5744 4.8378 1.4476 1.3106 1.5791
1003 of 5000, maximal objective function=-8270.54, time remaining: 00:04:34
Acceptance rates [%] =15.08 13.89 11.51 25.
Convergence rates =1.5756 5.1658 1.4241 1.3066 1.5492
1021 of 5000, maximal objective function=-8270.54, time remaining: 00:04:36
Acceptance rates [%] =15.12 13.95 11.63 25.19
Convergence rates =1.5518 5.793 1.4008 1.298 1.5164

IndexError Traceback (most recent call last)
in
11 runs_after_convergence = 100
12 acceptance_test_option = 6
---> 13 r_hat = sampler.sample(rep, nChains, nCr, eps, convergence_limit)
14 results = spotpy.analyser.load_csv_results('DREAM_hymod')
15

c:\users\albaraalmawazreh\appdata\local\programs\python\python37\lib\site-packages\spotpy\algorithms\dream.py in sample(self, repetitions, nChains, nCr, eps, convergence_limit, runs_after_convergence, acceptance_test_option)
274 while self.iter < self.repetitions:
275 param_generator = ((curChain,self.get_new_proposal_vector(curChain,newN,nrN)) for curChain in range(int(self.nChains)))
--> 276 for cChain,par,sim in self.repeat(param_generator):
277 pCr = np.random.randint(0,nCr)
278 ids=[]

c:\users\albaraalmawazreh\appdata\local\programs\python\python37\lib\site-packages\spotpy\parallel\mproc.py in __call__(self, jobs)
52 def __call__(self, jobs):
53 results = self.pool.imap(self.f, jobs)
---> 54 for i in results:
55 yield i

c:\users\albaraalmawazreh\appdata\local\programs\python\python37\lib\site-packages\multiprocess\pool.py in next(self, timeout)
746 if success:
747 return value
--> 748 raise value
749
750 __next__ = next # XXX

IndexError: list index out of range

@thouska
Owner

thouska commented Apr 13, 2021

Hi @baramousa
thank you for your message and the detailed error description. I can confirm an error there and will look into this together with @philippkraft. Sorry for any inconvenience this may cause you. I will keep you posted about the progress.

@thouska
Owner

thouska commented Apr 16, 2021

Hi @baramousa,
it turns out to be quite a task to solve this issue. We will work on it at #268, also on local machines. It might take a while and I cannot guarantee final success at the moment. Meanwhile, would 'mpi' parallelization be a solution for you? This should work fine :)

@baramousa
Author

Hi @thouska, sorry for the late reply. I downloaded Anaconda, which ships with Python 3.8 or newer, and tried the parallelization of dream on Windows; it seems to work. However, my issue now is that my model writes its input and output data as text files, and for the parallelization to work effectively, each chain should have its own directory where it writes and reads those files. My question is whether there is a way to extract the id/number of the currently running chain, so I can use it in my model to create a directory for each chain.
Since I am also trying to use SCE-UA, would you suggest a way to do the same with it?
Thanks in advance :)

@thouska
Owner

thouska commented May 26, 2021

Hi @baramousa,
OK, I have not tested with the newest Anaconda yet; it would be great if that solves the problem!
Regarding the parallel writing/reading, you are perfectly right: one needs to make sure this is done individually for each core. I wrote a short example for that, which you can find here.

Basically under 'mpi' you can access the cpu_id this way:

cpu_id = str(int(os.environ['OMPI_COMM_WORLD_RANK']))

Under 'mpc' it is done like this:

cpu_id = str(os.getpid())

I would recommend working with these instead of using the chain_id (in case of dream) or complex_id (in case of sce-ua), as the approach above works independent of the choice of algorithm in spotpy.
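For illustration, a minimal sketch of how such a per-core working directory could be resolved (the helper name, the 'model_runs' base folder and the 'core_' prefix are made up for this example; the cpu_id lines follow the snippets above):

    import os

    def get_run_directory(base='model_runs'):
        # Hypothetical helper: return a working directory that is unique per
        # parallel worker and create it on first use.
        if 'OMPI_COMM_WORLD_RANK' in os.environ:   # parallel='mpi'
            cpu_id = str(int(os.environ['OMPI_COMM_WORLD_RANK']))
        else:                                      # parallel='mpc' (or 'seq')
            cpu_id = str(os.getpid())
        workdir = os.path.join(base, 'core_' + cpu_id)
        os.makedirs(workdir, exist_ok=True)
        return workdir

Calling something like this at the start of the setup's simulation() method and writing/reading the model's text files inside the returned folder keeps the parallel workers from overwriting each other's files.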

@baramousa
Author

baramousa commented May 28, 2021

Hi @thouska,
thanks for your answer, now it works: input and output files are being written and read in individual directories corresponding to the core name. However, the csv summary file, which should contain the results of all simulations, now only holds the very last simulations of each chain for dream, and no data at all for sceua. The simulations run and the summary is shown in the console, but the csv files are not written properly.
Can you tell where the problem is?

I am guessing it has to do with this part of _algorithm.py:

    def save(self, like, randompar, simulations, chains=1):
        # Initialize the database if no run was performed so far
        self._init_database(like, randompar, simulations)
        # Test if like and the save threshold are float/list and compare accordingly
        if self.__is_list_type(like) and self.__is_list_type(self.save_threshold):
            if all(i > j for i, j in zip(like, self.save_threshold)):  # Compares list/list
                self.datawriter.save(like, randompar, simulations, chains=chains)
        if (not self.__is_list_type(like)) and (not self.__is_list_type(self.save_threshold)):
            if like > self.save_threshold:  # Compares float/float
                self.datawriter.save(like, randompar, simulations, chains=chains)
        if self.__is_list_type(like) and (not self.__is_list_type(self.save_threshold)):
            if like[0] > self.save_threshold:  # Compares list/float
                self.datawriter.save(like, randompar, simulations, chains=chains)
        if (not self.__is_list_type(like)) and self.__is_list_type(self.save_threshold):  # Compares float/list
            if (like > self.save_threshold).all:
thouska added a commit that referenced this issue May 31, 2021
pathos multiprocessing imap was resulting in a broken spotpy database. Switching to map solves the issue
@thouska
Owner

thouska commented May 31, 2021

Hi @baramousa,
thank you for the update! And indeed, the broken file was the point where I got stuck at #268. To be honest, I did not fully understand why this did not work, as the results are internally perfectly fine, but were not in the final output file.

However, I looked into this again, played around a lot and can finally come up with a fix (see commit above).
Basically, I changed line 53 in mproc.py

from:

results = self.pool.imap(self.f, jobs)

into:

results = self.pool.map(self.f, jobs)
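In context, the patched method then reads roughly like this (reconstructed from the traceback earlier in this thread; condensed, so details may differ):

    # spotpy/parallel/mproc.py (condensed): the only functional change is
    # imap -> map, so the pool finishes all jobs before the results are
    # yielded to the database writer.
    def __call__(self, jobs):
        results = self.pool.map(self.f, jobs)
        for i in results:
            yield i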

Now it works fine for me, at least in 90% of the cases. From time to time the header is broken, but the rest should be fine. @baramousa: could you test it for your case and give your feedback here?

@baramousa
Author

baramousa commented Jun 1, 2021

Hi @thouska,
indeed, when I change the dbformat to 'ram', the results seem fine. I have now tried your solution: it worked with the sceua algorithm, but dream still has the same problem, only the last runs are saved in the csv file.
On the other hand, mpi on a Linux machine seems to work.

@thouska
Owner

thouska commented Jun 15, 2021

Hi @baramousa
sorry for the late response, but at least I can come up with good news, I hope :)
I worked on the issue in #268. You were right: somehow only the dream algorithm did not work properly under the pathos multiprocessing setting. This was due to too many pools being generated for the Markov chains. I tried to fix it, but in the end I had the feeling that this is a problem in the pathos package. So I switched the package to joblib. With that, the parallelization works with dream on my computer. Could you test it too? I changed tutorial_dream_hymod.py so that it directly uses multiprocessing.
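Purely as an illustration of the idea (not the exact spotpy implementation), mapping a batch of jobs onto worker processes with joblib looks like this:

    from joblib import Parallel, delayed

    def run_jobs(f, jobs, n_workers=4):
        # Evaluate every job in its own worker process and collect all results
        # before returning, similar to pool.map.
        return Parallel(n_jobs=n_workers)(delayed(f)(job) for job in jobs)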
