-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to save intermediate simulation states not working #1
Comments
I have created a new branch with my first attempt to restore this functionality. I did a few tests and things seem to be working properly, and it is possible to restart the simulation from the saved states reproducibly. I think the documentation of the new options I added can be a little bit confusing, so I might need to rephrase things. I will also wait for some opinions. |
Hi @castroinsua
Does this save all or not?:
Have you benchmarked the save file creation in comparison to the overall runtime? First lesson of R and performance is that those two don't go together in the first place ;) large save states usually also imply large numbers of species, so running a simulation step will take a lot of time anyway. Yes sticking to one parameter leaves a blind spot for saving every few steps and only keeping the last step. Experience tells me that this is okay as there's usually two options when running gen3sis with very few simulations in between: either the simulation runs very fast, and saving will be fast as well. Or it will run painfully slow and you'd want to save every timestep anyway.
cheers 👋 |
I see. However, the saved state already contains the config object, so I am not sure why it should matter if the simulation was started using a config file or an object. I will have a look at the parts of the code you pointed out to make sure that I understand what you mean. Should not the The reason I decided to add an option to only keep the last state is because I am running some simulations with thousands of time steps. Even when saving each 200 steps or so, I do not want to keep all of them, as each step takes hundreds of MB and I would need dozens of GB for each simulation when I just want one checkpoint from time to time. And creating a save each time step and deleting the older files is not really an alternative here, as it takes a long time to save. I agree that the argument names make everything very unclear, and your examples are exactly what I was thinking about when I wrote that things are still confusing. I guess it is better to do as you say and keep only the original parameter, but then I will have to do some manual housekeeping and delete the files that I do not need in my particular case.
Yes, that is why I decided to save only each few hundred time steps. I think this does not have anything to do with R but with the speed at which you can write to disk. How slow it is ultimately depends on the simulation (that is, how large the state file is) and in any case the user is the one that has to decide if creating the saves is worth the time. In my case, it is large not because there is a large number of species in the simulation but because of the landscape. As I said, I will take a look at the code to try to understand how the package handles config files and objects, because there was probably a very good reason why the code to save the state was commented out, but I still do not really know why. Thank you very much for taking your time to review and discuss this. |
The config parameter defines the "final" output directory; to restart a simulation you need to be able to find and access the existing simulation directory. The output directory in the run_simulation call is the parent directory, not the one for each individual simulation. The goal with the save and restart was to restart simulations on clusters with runtime limits. "Just" rerun the batch file with no(!) changes, and the simulations continue.
Please don't, you're inviting a world of hurt onto yourself. The restart functionality is intended for restarting simulations in case of a crash, nothing else. Trying to have checkpoints etc is pointless; You shouldn't change the config for any given running simulation at all. If you want intermediate results for downstream analyses use the observer function and save what you need there. What use case are you trying to cover by keeping the internal save states around?
a "very good" reason is relative, tbh it wasn't: We were working on refactoring a bunch of things, including directory handling. Handling directories changed, the config object instead of files was introduced, and we didn't fully match those changes in the restart function (see your error with output_val), plus the code snipped below. And then we just didn't follow up on it due to time constraints. If you look into R/config_handling.R lines 62-66 you see this:
That is a very questionable piece of code; running a default config is probably the worst idea on what to do, followed closely by just naming the output directory something random in case of a config object. That means if you try to restart a existing simulation from a config object it will not pick the existing/correct directory and start the simulation from scratch somewhere else. A possible fix might be to add a variable into the internal state with the simulation name; either the config name, or a unique name for a config object. Or again, drop the config object from run_simulation. Both options are straight forward enough, but take a little of time and care to implement correctly. cheers |
I have reverted the change that added a new parameter and simply implemented the saving functionality as it was originally documented in Thank you for having the patience to explain all of this. However, I have not still addressed the most important part, which is taking care of the possibility of using config objects instead of a config file. I would gladly help with this, but since I have never passed config objects to I think I now understand the problem you described, and it seems that it stems from how the config_name <- file.path("default_config", paste0(format(Sys.time(), "%Y%m%d%H%m"), "-", formatC(sample(1:9999,1), digits=4, flag="0"))) The format string seems to be wrong, as the second It seems that there is some work to do in So I guess that one specific question that I have right now is: how would you normally create a config object from scratch, instead of generating it from a config file? Sorry for not being more succinct, at the moment I am still trying to figure out things. Thank you again for your help. |
I do not have access to a cluster at the moment, and these are tests anyway, so I am running them locally. I have to turn the computer off when I leave, so in practice it is like a crash. I am not changing the config at all. So what I need is to restart the simulation from the last saved state the next day. However, the But we do not have to worry about this any more, since I have already removed the problematic additional parameter. I will take care of this particular case by myself, without complicating the code of the package. I thought that it was worth considering, that is why I wanted to discuss this with the people maintaining it. |
Answers in reverse order. |
In R/plotting_functions.R lines 314+ conditional_plot we explicitly set the plots directory again according to the rest of the saving functions. I think we/you could also remove the output_plots variable completely 🤷
You're welcome, I'm quite happy to see you using gen3sis and putting this effort into it to :)
R/config_handle.R
Yes.
Yes, we just needed a unique string. Technically correct, the best correct, on both counts for %M and width. Though again, yes, we don't really care what ends up in that string as long as it's unique 🙈
I think the elegant way to proceed would be to set a config_name variable in the config object. When parsing a config file you can take the file name, which corresponds to the current behaviour. When using a config object the user can set the name to whatever, and the create_directories function can just retrieve the name from the object. That would not fully solve the need to parse the config file name twice, once in create directories, and once in parse_config, but it would result in a consistent state between the config file parsing and the config object. (Or again, remove the config object option) The implementation would probably follow something like this:
See above, R/config_handle.R
You're welcome and no worries, the code base is not the most straight forward one 🙈 👀 |
If you want a more consistent method of assigning a unique name, you could maybe use a hash of the config object. Then, if you pass the same config object you should always get the same hash. However, you would probably need functions from some other package, so in the end it is introducing another dependency. That could be a possible solution if you want to do something similar, but I think there are better solutions here (such as what we were discussing). Which approach would you prefer: setting a |
🤷 Tbh I don't know either how much the config object is used, if at all... I'm also not sure if it's in the vignettes. In general I'm not too much of a fan of deprecating functions out of development convenience though... The more I think about it, the more I prefer the config_name setting for that reason alone. Tbh I don't like the hash naming too much either as it doesn't give any hint on where the corresponding output folder comes from. Admittedly it's rather weak argument when currently only the timestamp gives you any hint 🙈 so back to setting an explicit name. How motivated (and interested) are you in working on this issue? 😇 |
I believe that I can at least give it a try, we will see how it goes. I will get back to you when I have something written, or if I have any questions. I think it would be nice to have this functionality available again. |
I have attempted to implement what you suggested. The few tests that I made with config objects seem to work as intended. I was not completely sure where to add the new variable. By looking at Another thing I would like to address is defining the default input and output directories in Somewhat related to this, I am tempted to move the code that creates and verifies a config object before calling |
Related to my previous comment, I just realised that there is an open issue in project-gen3sis#10. |
Sorry for the late answer, it sat in the reply box as a draft 🤦♂️ sending it off usually helps
Thank you for the feedback. Those are all valuable reports. Feel free to also report them as bugs on the main repo, specifically the indexed element access. In general adding new variables at the end of the existing stuff is a good idea for exactly this reason
I'm open to remove all those substitutions and mandate providing paths 👍 So the check would be for the config file if the paths are provided, and for the config object if the paths are set in the object itself.
Possibly yes, we do copy the config file to the output directory. Though it's trivial to keep track of the file name and do a conditional copy there with no need to keep the path in the config object itself
|
About the problems with
I will ask Oskar for an opinion about this when I have the opportunity (possibly this week). |
Since the modifications I made change the previous behaviour of the package, and we might also change the interface so that the input and output directories have no defaults, I think that for now I will try to incorporate these only to the new version of gen3sis (the one that is being developed in this repository). Apparently, the old version and this one will be developed separately, and changes to the old one should be mostly bug fixes. I still have to get the new version running to be able to test things, but that is a different matter. However, I do not think this is set in stone, and maybe the functionality to restart the simulation will be eventually added to the old version as well. |
The
save_state
parameter inrun_simulation()
is currently not working as intended. Calling this function with, for example,save_state = "last"
will throw the following error:The reason for this is that
val$config$directories$output_val
is never initialised. I see that there is some code commented out inprepare_directories()
, and my guess is that this code was removed either to avoid creating an empty directory that is never used or because some of the documented options for thesave_state
parameter (namely, passing a numeric vector of specific time steps to save) are not implemented yet.Saving the state of the simulation can take a lot of time if the resulting
.rds
files are large (and they usually are, I think). This is more of a feature request, but when running simulations with thousands of time steps I would like to be able to save the state at specified times (for example, each 200 steps) to restart the simulation from there, but only keeping the latest state to save disk space.Currently, the available options to
save_state
are:NA
to never save,"all"
to save all time steps (excessive in most cases, since writing the state files can take a lot of time),"last"
which does the same but removing any previous saves, or a numeric vector to save at specified time steps (not implemented yet), keeping them all. I propose using two different options inrun_simulation()
instead:save_state
: Which time steps to save,NULL
by default. Could be a numeric vector or"all"
. If not specified, do not create any saves.last_state_only
: A logical,TRUE
by default. Whether to keep all saved states or only the last one.Adding another option might not be the best approach, but I think it is the right thing to do if this feature is to be implemented.
The text was updated successfully, but these errors were encountered: