mpi and sce #202

Closed
MuellerSeb opened this issue Feb 21, 2019 · 5 comments

@MuellerSeb
Contributor

Hey there,
when I run the SCE-UA algorithm in parallel, the burn-in phase works fine, but afterwards something seems to go wrong with the parallel processes. The estimated time goes up, and the run numbers start to repeat and are sometimes not in increasing order:

Starting the SCE-UA algorithm with 10000 repetitions...
Starting the SCE-UA algorithm with 10000 repetitions...
Starting the SCE-UA algorithm with 10000 repetitions...
Starting the SCE-UA algorithm with 10000 repetitions...
burn-in sampling started...
Initialize database...
* Database file 'all_hh/2019-02-21_18-14-50_stat2D_db.csv' created.
13 of 10000 (best like=-23.6045) est. time remaining: 00:24:00
30 of 10000 (best like=-23.6045) est. time remaining: 00:22:26
48 of 10000 (best like=-22.1554) est. time remaining: 00:21:14
66 of 10000 (best like=-21.6666) est. time remaining: 00:20:35
84 of 10000 (best like=-21.6666) est. time remaining: 00:20:06
103 of 10000 (best like=-18.4106) est. time remaining: 00:19:48
115 of 10000 (best like=-18.4106) est. time remaining: 00:20:46
127 of 10000 (best like=-18.4106) est. time remaining: 00:21:31
141 of 10000 (best like=-18.4106) est. time remaining: 00:21:50
154 of 10000 (best like=-18.4106) est. time remaining: 00:22:12
173 of 10000 (best like=-18.4106) est. time remaining: 00:21:49
burn-in sampling completed...
ComplexEvo started...
ComplexEvo loop #1 in progress...
183 of 10000 (best like=-18.4106) est. time remaining: 00:22:29
183 of 10000 (best like=-18.4106) est. time remaining: 00:22:35
183 of 10000 (best like=-18.4106) est. time remaining: 00:22:36
187 of 10000 (best like=-18.4106) est. time remaining: 00:24:16
187 of 10000 (best like=-18.4106) est. time remaining: 00:24:26
187 of 10000 (best like=-18.4106) est. time remaining: 00:24:27
189 of 10000 (best like=-18.4106) est. time remaining: 00:26:31
189 of 10000 (best like=-18.4106) est. time remaining: 00:26:34
189 of 10000 (best like=-18.4106) est. time remaining: 00:26:36
181 of 10000 (best like=-18.4106) est. time remaining: 00:29:21
192 of 10000 (best like=-18.4106) est. time remaining: 00:27:54
192 of 10000 (best like=-18.4106) est. time remaining: 00:29:29
208 of 10000 (best like=-18.4106) est. time remaining: 00:27:18
202 of 10000 (best like=-18.4106) est. time remaining: 00:28:22
195 of 10000 (best like=-18.4106) est. time remaining: 00:30:49
211 of 10000 (best like=-18.4106) est. time remaining: 00:28:30
206 of 10000 (best like=-18.4106) est. time remaining: 00:29:33
215 of 10000 (best like=-18.4106) est. time remaining: 00:29:29
200 of 10000 (best like=-18.4106) est. time remaining: 00:31:58
...

Do you have a guess as to what's going wrong?
Thanks in advance!

@thouska
Owner

thouska commented Feb 22, 2019

Hi Sebastian,
thanks for your message. This behavior is indeed a bit strange. However, I am pretty sure it is “just” a parallel printing issue. Still, it needs to be solved. As I recently changed some minor things in the parallelization of sce-ua, I need to make sure that we are using the same version. I just uploaded a new version to PyPI (1.4.5). Would you be so kind as to test this one again? If the bug persists in this version, I will look into it more closely.

@MuellerSeb
Contributor Author

I just updated spotpy, but the behavior is the same. Also, the estimated time rises to the same value as with sequential optimization.
Again: the burn-in phase works fine. From ComplexEvo loop #1 on, the repetition counter becomes unsorted and the estimated time rises to the sequential time estimate. I don't know what I could have done wrong on my side.

thouska added a commit that referenced this issue Feb 22, 2019
@thouska
Owner

thouska commented Feb 22, 2019

Perfect, thanks for testing this again. I found some lines in the sceua code where the slaves had access to the status of algorithm.py during parallel computing and complex evolution. As the slaves can run at different speeds, this might have mixed up the tracked repetitions and the time shown in the printed message. I think this should be fixed now.
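
To illustrate the kind of mix-up described here, the following is a minimal sketch in plain Python (no MPI, no spotpy). It assumes, purely for illustration, that each worker keeps its own copy of a repetition counter and prints from it. The `Status` class and both functions are hypothetical and are not spotpy's actual code.

```python
class Status:
    """Hypothetical stand-in for a run-status tracker (not spotpy's class)."""

    def __init__(self, start=0):
        self.rep = start

    def update(self):
        # Count one finished model evaluation and return the new total.
        self.rep += 1
        return self.rep


def worker_side_counting(start, n_workers, evals_per_worker):
    """Assumed faulty setup: every worker updates its own copy of the status."""
    copies = [Status(start) for _ in range(n_workers)]
    printed = []
    for _ in range(evals_per_worker):
        for status in copies:
            # Each worker only knows about its own evaluations.
            printed.append(status.update())
    return printed


def master_side_counting(start, n_workers, evals_per_worker):
    """Assumed fix: workers only return results, the master alone counts."""
    status = Status(start)
    printed = []
    for _ in range(evals_per_worker):
        for _worker in range(n_workers):
            printed.append(status.update())
    return printed


print(worker_side_counting(180, 3, 2))  # repeated counter values
print(master_side_counting(180, 3, 2))  # strictly increasing counter values
```

In this toy model the worker-side variant prints each counter value once per worker, while the master-side variant counts every evaluation exactly once. Whether this matches the exact mechanism in sceua is an assumption.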

@p-lauer
Contributor

p-lauer commented Mar 6, 2019

I would assume that the unsorted repetition counter comes from how MPI is implemented. The like values are printed in the MPI loop, so if a process with a higher repetition count finishes before one with a lower repetition count, it is printed to the screen first.
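
The ordering effect can be reproduced without MPI. The sketch below is a small deterministic simulation, not spotpy code: the function `print_order`, the durations, and the scheduling rule (the next repetition goes to whichever worker is free first) are all assumptions made for illustration.

```python
import heapq


def print_order(durations, n_workers):
    """Return repetition numbers in the order their results would be printed.

    durations[i] is the assumed run time of repetition i + 1. Each repetition
    is handed to the worker that becomes free first, and results are printed
    in the order in which the evaluations finish.
    """
    # Heap of (time at which the worker is free, worker id).
    workers = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(workers)

    finished = []
    for rep, duration in enumerate(durations, start=1):
        free_at, worker = heapq.heappop(workers)
        done_at = free_at + duration
        finished.append((done_at, rep))
        heapq.heappush(workers, (done_at, worker))

    # Printing happens on completion, so sort by finish time.
    finished.sort()
    return [rep for _, rep in finished]


# Repetition 1 is slow, so repetitions 2, 3 and 4 overtake it.
print(print_order([3.5, 1.0, 1.0, 1.0], n_workers=2))
```

With two workers and a slow first repetition, the later repetitions finish first and would appear on screen ahead of repetition 1, even though every repetition is counted exactly once.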

@MuellerSeb
Contributor Author

I think this is clarified. But a new issue came up after version 1.5.0:
#226
