Potential bug in writing DREAM posterior to disk #313
Comments
Hi @bdsegal,
The design of saving all results to disk is indeed intended. This behaviour does not affect any of the internal sampling, and it does not affect the posterior distribution, which, by the original design of the dream algorithm, is only generated after convergence of the algorithm. At least that's my understanding after double-checking the code based on your report. The number of runs performed after convergence can be defined by the user, and it is indeed important to use only this set of runs afterwards if you want to analyse the posterior distribution. I hope this is understandable when following the dream tutorial? If not, I am happy to hear any suggestions for improvement.
Thank you @thouska for your answer and for pointing us to the tutorial. The tutorial is very helpful. If I'm understanding correctly, it looks like the tutorial uses the post-convergence proposals to analyze the posterior, as opposed to the post-convergence results of the accept/reject decisions (i.e. the values that are added to the chain after each accept/reject decision). My impression is that using the results of the accept/reject decisions, rather than the raw proposals, is in line with Vrugt (2016), which you reference in the dream tutorial, though please let me know if I'm misunderstanding.
I'm also not sure it is necessary to discard all pre-convergence draws, since R-hat measures how well the chains are mixing up to the current iteration (potentially after removing the burn-in period, as is the default behavior in rstan's monitor function), though I do acknowledge that Vrugt recommends discarding the pre-convergence period as the warmup. It's great that the current output of spotpy provides flexibility in how the results can be used.
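For reference, here is a minimal sketch of the basic Gelman-Rubin R-hat computation I have in mind; the function name and the synthetic chains below are only illustrative, and rstan's monitor implements a more refined version of this diagnostic:

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Basic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws).

    Values close to 1 indicate that the chains are mixing well.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    b = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * w + b / n          # pooled variance estimate
    return np.sqrt(var_hat / w)

# Example: three well-mixed chains drawn from the same distribution
rng = np.random.default_rng(0)
chains = rng.normal(size=(3, 1000))
print(gelman_rubin_rhat(chains))  # should be close to 1
```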
Hi @thouska. To follow up on this issue, I'm attaching a short example to demonstrate why it's important to use the results of the accept/reject decision, as opposed to all proposals, even after convergence. This example uses a simple Metropolis algorithm, but the principle also applies to more complicated algorithms like DREAM. The takeaway is that even after convergence, the proposals aren't necessarily samples from the target distribution; only the results of the accept/reject decisions are samples from the target distribution. I'm attaching a script and plots showing the distribution of post-burn-in samples (results of the accept/reject decision) vs all post-burn-in proposals for three target distributions, all using a standard normal distribution for proposals. If the proposal and target distribution are nearly the same, then the post-burn-in proposals would be a good representation of the target distribution, but otherwise they are not. For this reason, I would recommend that by default spotpy use the results of the accept/reject decision for downstream processing as opposed to all proposals. Thanks, and please let me know if you have any questions.
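For readers who cannot access the attachment, below is a minimal stand-alone sketch in the same spirit: an independence Metropolis sampler with an Exp(1) target and a standard normal proposal. The particular target and settings are illustrative assumptions, not the attached script, but they show the same effect: the post-burn-in chain states follow the target, while the raw proposals do not.

```python
import numpy as np

rng = np.random.default_rng(42)

# Target: Exp(1) (unnormalized log-density); proposal: independent N(0, 1) draws.
def log_target(x):
    return -x if x >= 0 else -np.inf

def log_proposal(x):
    return -0.5 * x ** 2  # N(0, 1) log-density up to a constant

n_iter, burn_in = 20_000, 2_000
x = 1.0
chain_states, raw_proposals = [], []

for i in range(n_iter):
    prop = rng.normal()
    # Metropolis-Hastings ratio for an independence sampler:
    # alpha = [p(prop) * q(x)] / [p(x) * q(prop)]
    log_alpha = (log_target(prop) + log_proposal(x)
                 - log_target(x) - log_proposal(prop))
    if rng.uniform() < np.exp(min(0.0, log_alpha)):
        x = prop                       # accepted: the chain moves to the proposal
    # otherwise the chain stays at the current value x
    if i >= burn_in:
        chain_states.append(x)         # value after the accept/reject decision
        raw_proposals.append(prop)     # raw proposal, whether accepted or not

print("mean of chain states :", np.mean(chain_states))   # ~1.0, the Exp(1) mean
print("mean of raw proposals:", np.mean(raw_proposals))  # ~0.0, the N(0, 1) mean
```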
Hi all,
Thank you for writing this useful package.
While experimenting with spotpy, my colleagues and I may have encountered a bug in the implementation of the DREAM algorithm, described below. It would be great to get your eyes on this potential bug, as it might lead some users to mistakenly use the incorrect posterior draws. Thank you in advance, and please let me know if I'm missing anything.
Potential bug: On line 353 of dream.py, within each MC iteration the parameter proposals are passed to self.postprocessing(). The postprocessing() function returns the likelihood, but it also saves the parameters to disk as a side effect. In particular, on line 481 of _algorithm.py, postprocessing() saves the parameters that are passed to it to disk. Per the documentation on line 45 of dream.py, I believe this is the same file that a user can then import afterwards. The problem is that the proposed parameters are passed to postprocessing() regardless of whether the proposal is accepted or rejected. So while the parameter values for each MC iteration are correctly stored in the dream object (see lines 390-399 of dream.py), the results saved to disk contain all parameter proposals, but not necessarily draws representing the posterior (i.e. the results stored in self.bestpar).
Question: Is the above assessment correct, and if so, is that the intended behavior? I'm worried that some users may expect the values written to disk to be the posterior, as opposed to all parameter proposals.
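To make the described flow concrete, here is a schematic toy loop; this is not spotpy's actual code, and propose(), log_likelihood(), and the list names are hypothetical stand-ins. It illustrates how saving inside postprocessing()-style code records every proposal, while only the post-decision states form the chain.

```python
import math
import random

# Schematic only: a toy Metropolis-style loop mirroring the flow described above.
random.seed(1)

def propose(x):
    return x + random.gauss(0.0, 1.0)   # stand-in for the proposal step

def log_likelihood(x):
    return -0.5 * x * x                 # toy standard normal target

disk = []    # stands in for what a postprocessing()-style call writes to the database
chain = []   # stands in for the post-decision values (self.bestpar)

current, current_ll = 0.0, log_likelihood(0.0)
for _ in range(5_000):
    prop = propose(current)
    prop_ll = log_likelihood(prop)
    disk.append(prop)                   # saved regardless of the decision below
    if random.random() < math.exp(min(0.0, prop_ll - current_ll)):
        current, current_ll = prop, prop_ll   # accepted: the chain moves
    chain.append(current)               # only this reflects the accept/reject outcome

# Both lists have the same length, but only `chain` contains posterior draws.
print(len(disk), len(chain))
```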
Thanks,
Brian
cc @para2x, @dlebauer, @infotroph