Langevin thermostat is not pickled correctly #4657
Comments
This behavior should affect all thermostats and integrators. One reason for the divergence when reloading from a checkpoint file is that particle forces and the state of electrostatic/magnetostatic/hydrodynamic actors are invalidated. More details can be found in the December 2021 mailing list thread "Question about checkpoint". While several factors contribute to the divergence, there is probably low-hanging fruit that can be identified in a coding day. As a first step, one can look at what happens when the checkpoint reloading code sets the global variable at espresso/src/core/integrate.cpp (line 89 in 6d08b88).
It might not work out of the box if long-range actors are active, so one should start with a simple system like the one in the original post. It might be helpful to monitor what happens during the ghost update, e.g. by iterating through the real and ghost particles of a local cell and printing their properties before the checkpoint file is created and after the simulation is reloaded, to get a better understanding of which data members are out of date. This can be achieved with this code:

    for (auto const p_id : {1, 2}) {
      if (auto p = ::cell_structure.get_local_particle(p_id)) {
        std::cout << "id=" << p->id() << " pos=[" << p->pos() << "] force=[" << p->force() << "] ";
        std::cout << "(" << (p->is_ghost() ? "ghost" : "real") << " particle)\n";
      }
    }

To make progress with the other sources of divergence, one will need to progressively enable more features to see which ones get invalidated during the reload. Specific feature combinations can be turned on using the checkpoint tests:

    mpiexec -n 2 ./pypresso testsuite/python/save_checkpoint.py Test__lj
    mpiexec -n 2 ./pypresso testsuite/python/test_checkpoint.py Test__lj

where
My notion is as follows:
I'll have a look
The problem went a little deeper than just loading the state of the Lines 114 to 118 in a824a7d
Exposing this also to the would just overcomplicate things. However, I changed the documentation to state that, after a checkpoint is loaded, the integration should be run with the
There is also the issue that P3M actors automatically re-tune themselves using the current state of the system (at time t) instead of the original system (at time t = 0), leading to slightly different parameters. This is easy to fix; however, there are other actors where tuning cannot be disabled from the script interface, like MMM1D. In addition, floating-point precision is an issue for LB and P3MGPU: rounding errors introduce small deviations that make trajectories non-deterministic.
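As a general illustration of how rounding errors can make results depend on execution details, the sketch below shows that floating-point addition is not associative, so accumulating the same contributions in a different order can change the last bits of the sum. Whether summation order is the specific mechanism at work in LB or P3MGPU is an assumption here; the `accumulate` helper and the sample values are purely illustrative.

```python
def accumulate(values):
    """Sum values left to right with plain float additions."""
    total = 0.0
    for value in values:
        total += value
    return total


# Three hypothetical force contributions acting on one particle.
contributions = [0.1, 0.2, 0.3]

forward = accumulate(contributions)             # (0.1 + 0.2) + 0.3
backward = accumulate(reversed(contributions))  # (0.3 + 0.2) + 0.1

print(forward == backward)      # the two orders disagree in the last bit
print(abs(forward - backward))  # size of the deviation
```

A deviation of this size is harmless for a single force evaluation, but in a chaotic many-particle system it is amplified over subsequent time steps, which is why bitwise reproducibility across a reload is hard to guarantee when the accumulation order is not fixed.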
Fixes #4657
Description of changes:
- explain which factors affect reproducibility in checkpointed simulations in the user guide
- re-purpose the save/load samples to help measuring force jumps during checkpointing
- make the P3M family of algorithms more deterministic by avoiding re-tuning during checkpointing
- improve docstrings of the MMM1D family of algorithms
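A force jump across a checkpoint can be quantified in several ways. The sketch below uses one plausible metric, the largest per-particle norm of the force difference between the state just before saving and the state just after loading. The helper name `max_force_jump` and the sample force values are hypothetical and are not taken from the ESPResSo samples.

```python
import math


def max_force_jump(forces_before, forces_after):
    """Return the largest Euclidean norm of the per-particle force difference."""
    if len(forces_before) != len(forces_after):
        raise ValueError("both snapshots must contain the same number of particles")
    return max(
        (math.dist(fb, fa) for fb, fa in zip(forces_before, forces_after)),
        default=0.0,
    )


# Hypothetical force snapshots for two particles (x, y, z components).
before = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
after = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]

print(max_force_jump(before, after))  # only the second particle changed
```

A value of exactly zero means the forces were restored bitwise; any non-zero value indicates that something was recomputed or re-tuned during the reload.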
…4677) Fixes espressomd#4657 Description of changes: - explain which factors affect reproducibility in checkpointed simulations in the user guide - re-purpose the save/load samples to help measuring force jumps during checkpointing - make the P3M family of algorithms more deterministic by avoiding re-tuning during checkpointing - improve docstrings of the MMM1D family of algorithms
When saving the system via pickle and loading it again, the force due to the Langevin thermostat changes.
This should not happen, because the result of the simulation then depends on whether it was checkpointed.
I tested it with both the 4.2.0 commit tag and the latest commit (d9cbffc) and an empty myconfig.hpp. The following MWE demonstrates the problem:
When run twice, it produces the following output:
One expects lines with matching letters to be exactly the same; however, the B lines have different forces and thus different particle positions, velocities, etc.
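The symptom described above can be reproduced in a toy model that does not use ESPResSo at all. The sketch below integrates one particle in 1D with a velocity-Verlet scheme and a counter-based random force, then reloads the pickled state in two ways: once keeping the stored force, and once discarding it and recomputing it, which draws a new random number. All names (`thermostat_force`, `reload`, `discard_force`) and parameters are hypothetical, and the sketch only shows one way such a divergence can arise, not the actual cause in the ESPResSo code.

```python
import math
import pickle
import random

SEED, KT, GAMMA, DT = 42, 1.0, 1.0, 0.01  # toy parameters, chosen arbitrarily


def thermostat_force(state):
    """Harmonic + friction + random force; advances the noise counter."""
    # Counter-based noise: the value depends only on (SEED, counter).
    rng = random.Random(SEED * 1_000_003 + state["counter"])
    state["counter"] += 1
    noise = math.sqrt(24.0 * KT * GAMMA / DT) * rng.uniform(-0.5, 0.5)
    return -state["pos"] - GAMMA * state["vel"] + noise


def new_system():
    state = {"pos": 1.0, "vel": 0.0, "counter": 0}
    state["force"] = thermostat_force(state)
    return state


def step(state):
    """One velocity-Verlet step; the first half-kick reuses the stored force."""
    state["vel"] += 0.5 * DT * state["force"]
    state["pos"] += DT * state["vel"]
    state["force"] = thermostat_force(state)
    state["vel"] += 0.5 * DT * state["force"]


def reload(state, discard_force):
    """Round-trip through pickle, optionally recomputing the stored force."""
    clone = pickle.loads(pickle.dumps(state))
    if discard_force:
        clone["force"] = thermostat_force(clone)  # draws a fresh noise value
    return clone


reference = new_system()
for _ in range(10):
    step(reference)

faithful = reload(reference, discard_force=False)
recomputed = reload(reference, discard_force=True)
force_jump = abs(recomputed["force"] - reference["force"])

# Continue all three trajectories by one step and compare them.
for state in (reference, faithful, recomputed):
    step(state)

print("faithful reload matches:  ", faithful == reference)
print("recomputed reload matches:", recomputed == reference)
print("force jump at reload:     ", force_jump)
```

In this toy model the reload is only transparent when the force and the noise counter are both restored exactly as they were; recomputing the force consumes an extra random number, which changes the first half-kick and shifts every later noise value.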