PwBaseWorkChain: improve restart from parent_folder #722

Merged: 12 commits, Sep 21, 2021


mbercx
Member

@mbercx mbercx commented Aug 30, 2021

Fixes #721

Remove some of the logic in the PwBaseWorkChain regarding restarting
from a previous calculation using a RemoteData provided to the
pw.parent_folder input.

The current logic expected the RemoteData to have a PwCalculation
creator, which is not always the case. Moreover, the restart_mode
chosen by the user is overridden, which means that e.g. restarting from
only the charge density with startingpot is not possible.

@mbercx
Member Author

mbercx commented Aug 30, 2021

@sphuber this was quickly field tested and seems to work fine. Not sure if there is an important reason why the parent_folder input is converted into the restart_calc stored in the ctx that I'm missing?

It's def possible that the user doesn't provide the correct parameters to actually make use of the parent_folder though. Maybe we should add some validation here?

(Also possible tests will fail, will fix when we agree)

@mbercx
Member Author

mbercx commented Aug 31, 2021

@sphuber thanks for the feedback! I've willingly introduced some scope creep in this PR to also deal with some of the issues in #691 (not sure if all of them though, and there are also some comments there that should be raised on the QE GitLab, so I wouldn't say that issue is fixed by this PR at this time).

I've mainly done two things:

  1. Introduced a top-level validator to the PwCalculation to check if any correct restart parameters are set when the parent_folder input is provided.
  2. Removed the restart_calc logic in favor of a set_restart_type method (thanks to @ramirezfranciscof for the suggestion of making this a method) that takes a RestartType enum and the previous calculation to properly set the input parameters and parent_folder based on the chosen restart method for each process handler. We've used the RestartType enum to indicate the different "restart modes" in order to be consistent with the other Types we have defined and to avoid any confusion with the restart_mode input tag of QE.
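As a rough illustration of item 2, the mapping from a restart type to input tags could look like the sketch below. This is not the code from the PR: it works on a plain dictionary instead of the work chain context, the function name `apply_restart_type` is hypothetical, and the exact tags set for each type are an assumption based on the discussion in this thread (`restart_mode`, `startingpot`, `startingwfc`).

```python
from copy import deepcopy
from enum import Enum


class RestartType(Enum):
    """Restart types as named in this PR."""

    FULL = 'full'
    FROM_SCRATCH = 'from_scratch'
    FROM_CHARGE_DENSITY = 'from_charge_density'
    FROM_WAVE_FUNCTIONS = 'from_wave_functions'


def apply_restart_type(parameters, restart_type, parent_folder=None):
    """Return a copy of ``parameters`` with restart tags set, plus the parent folder to use.

    Hypothetical helper: the real method operates on the work chain inputs.
    """
    if restart_type is not RestartType.FROM_SCRATCH and parent_folder is None:
        raise ValueError(f'{restart_type} requires a parent folder.')

    params = deepcopy(parameters)
    control = params.setdefault('CONTROL', {})
    electrons = params.setdefault('ELECTRONS', {})

    # Clear tags left over from an earlier restart before setting new ones.
    electrons.pop('startingpot', None)
    electrons.pop('startingwfc', None)
    control['restart_mode'] = 'from_scratch'

    if restart_type is RestartType.FULL:
        control['restart_mode'] = 'restart'
    elif restart_type is RestartType.FROM_CHARGE_DENSITY:
        electrons['startingpot'] = 'file'
    elif restart_type is RestartType.FROM_WAVE_FUNCTIONS:
        electrons['startingwfc'] = 'file'
    else:
        # FROM_SCRATCH: nothing is read, so no parent folder is passed on.
        parent_folder = None

    return params, parent_folder
```

Returning a new dictionary rather than mutating the input keeps the sketch easy to test; a process handler would simply call it with the type it wants and the remote folder of the failed calculation.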

Finally, some more notes/questions regarding the choice of the restart type:

  • sanity_check_insufficient_bands: Updated the process handler to restart from the charge density after increasing the number of bands. I see no reason not to restart from the charge density here? It should already be quite close to the correct result.
  • handle_out_of_walltime: I'm not sure I understand why we cannot do a full restart or at least restart from the charge density in case the walltime has run out and the structure has changed. How would doing a full restart differ from just having more walltime here?
  • handle_relax_recoverable_ionic_convergence_error: Here I can understand that perhaps a restart from scratch might help with kicking the calculation out of the ionic convergence issue?
  • handle_relax_recoverable_electronic_convergence_error and handle_electronic_convergence_not_achieved: Not sure how to best restart here. Is there a reason not to e.g. restart from the charge density for the ionic case and do a full restart instead in case it's just an SCF?

Also pinging @ramirezfranciscof and @qiaojunfeng for comments and field testing! :)

(will fix tests once the review has processed past the design stage)

@bastonero
Collaborator

I think having a restart_type would indeed be nicer, which by default could be restart_mode='restart'.
The latter was designed explicitly for restarting calculations that ran out of time, so I think its proper use is only the 'out of wall time' case. In other cases this restart mode can do real harm: if you only want to start from an initial density, this "full" restart will also take the structure from the output folder, which is NOT good.
Depending on the problem, one should use either the "full" restart or startingpot='file' + from_scratch. For instance, sometimes you already have a well-converged ground state and want to reuse it, e.g. to speed up calculations with displaced atoms for phonons. If you do a "full" restart there, you end up also reading the structure from the output folder, thus discarding the displaced input structure.
As a consequence, some handlers will benefit from (or might even be corrected by!) choosing the type of restart. For the out-of-walltime case, one would indeed want to remove the starting*='file' tags and restart fully.
Regarding handle_electronic_convergence_not_achieved: I also found myself slightly changing the handler. I'm not sure how restarting from the previous density affects the SCF loop. Restarting from scratch is certainly the safest option.
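To make the distinction concrete, here is a minimal sketch of the two parameter sets as plain Python dictionaries, laid out by pw.x namelist. The helper `reads_structure_from_parent` is hypothetical and only encodes the behaviour described in the comment above (a "full" restart also picks up the structure from the parent folder).

```python
# "Full" restart: per the comment above, the structure is also taken from the parent folder.
full_restart = {
    'CONTROL': {'calculation': 'scf', 'restart_mode': 'restart'},
}

# Density-only restart: keeps the (e.g. displaced) input structure and
# only reads the charge density from the parent folder.
density_only = {
    'CONTROL': {'calculation': 'scf', 'restart_mode': 'from_scratch'},
    'ELECTRONS': {'startingpot': 'file'},
}


def reads_structure_from_parent(parameters):
    """Return True if these parameters request a "full" restart (hypothetical helper)."""
    return parameters.get('CONTROL', {}).get('restart_mode') == 'restart'
```

For the phonon displacement example, `density_only` is the variant that preserves the displaced input structure.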

Contributor

@sphuber sphuber left a comment

Thanks @mbercx . I like the approach a lot, there are just a few minor issues still I think

aiida_quantumespresso/workflows/pw/base.py
aiida_quantumespresso/workflows/pw/base.py
aiida_quantumespresso/calculations/pw.py (outdated)
@mbercx mbercx force-pushed the fix/721/base-restart branch from b516b16 to adf3365 September 10, 2021 10:58
@mbercx mbercx requested a review from sphuber September 10, 2021 11:03
@mbercx
Member Author

mbercx commented Sep 10, 2021

@sphuber I think this is ready for another round of review. To summarize the logic of these changes a bit again:

  1. Instead of setting the input tags for the restart when a parent_folder is provided, we trust the user to set the restart settings she/he desires.
  2. A validator is added instead, that checks if the inputs are sensible based on the calculation. If the inputs are simply incorrect (e.g. no parent_folder for an "nscf" calculation), this validator will raise. If a parent_folder is provided but no restart inputs are set, a warning is raised.
  3. We added a set_restart_type method to the work chain, which sets the correct input tags based on the provided RestartType and parent_folder (if not restarting from scratch). This is exclusively used in the process_handlers of the PwBaseWorkChain.
  4. The following restart types are used for each process handler:
  • sanity_check_insufficient_bands: FROM_CHARGE_DENSITY
  • handle_unrecoverable_failure: no restart
  • handle_known_unrecoverable_failure: no restart
  • handle_out_of_walltime: FULL if the structure hasn't changed, FROM_SCRATCH otherwise
  • handle_vcrelax_converged_except_final_scf: no restart
  • handle_relax_recoverable_ionic_convergence_error: FROM_SCRATCH
  • handle_relax_recoverable_electronic_convergence_error: FROM_SCRATCH
  • handle_electronic_convergence_not_achieved: FULL
  • handle_electronic_convergence_warning: no restart
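The table in item 4 can be written down as a small lookup, sketched below. The dictionary and function names are hypothetical, the restart types are kept as plain strings to keep the example self-contained, and `None` stands for "no restart".

```python
# Restart type per process handler, as listed in item 4 (None means no restart).
HANDLER_RESTART_TYPES = {
    'sanity_check_insufficient_bands': 'FROM_CHARGE_DENSITY',
    'handle_unrecoverable_failure': None,
    'handle_known_unrecoverable_failure': None,
    'handle_vcrelax_converged_except_final_scf': None,
    'handle_relax_recoverable_ionic_convergence_error': 'FROM_SCRATCH',
    'handle_relax_recoverable_electronic_convergence_error': 'FROM_SCRATCH',
    'handle_electronic_convergence_not_achieved': 'FULL',
    'handle_electronic_convergence_warning': None,
}


def restart_type_for(handler, structure_changed=False):
    """Look up the restart type for a handler name (hypothetical helper)."""
    if handler == 'handle_out_of_walltime':
        # The only handler whose choice depends on whether the structure changed.
        return 'FROM_SCRATCH' if structure_changed else 'FULL'
    return HANDLER_RESTART_TYPES[handler]
```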

I would still consider making the following changes to [4]:

  • handle_out_of_walltime: restart FROM_CHARGE_DENSITY in case structure has changed.
  • handle_relax_recoverable_ionic_convergence_error: restart FROM_CHARGE_DENSITY

@mbercx
Member Author

mbercx commented Sep 20, 2021

@qiaojunfeng or @ramirezfranciscof maybe you'll have a chance to review this some time this week? If we all think the logic is solid, I can fix the tests and we can merge this for LUMI.

Contributor

@sphuber sphuber left a comment

Thanks @mbercx . Concept looks good to me, just some minor changes and then the validator contains quite a few problems, so would really like to see some unit tests there. Also, we are essentially coding the default values from QE in there. This may become a problem when they change. Not sure how likely that is and also don't really have an alternative solution for now, but just something we should be aware of.

aiida_quantumespresso/calculations/pw.py (outdated)
aiida_quantumespresso/common/types.py (outdated)
aiida_quantumespresso/calculations/pw.py (outdated)
aiida_quantumespresso/workflows/pw/base.py
aiida_quantumespresso/workflows/pw/base.py (outdated)
aiida_quantumespresso/workflows/pw/base.py
aiida_quantumespresso/calculations/pw.py (outdated)
aiida_quantumespresso/calculations/pw.py (outdated)
aiida_quantumespresso/calculations/pw.py (outdated)
@mbercx mbercx force-pushed the fix/721/base-restart branch from adf3365 to 600681c September 21, 2021 01:23
@mbercx mbercx requested a review from sphuber September 21, 2021 02:39
@mbercx mbercx force-pushed the fix/721/base-restart branch from d57ff48 to 335da8f September 21, 2021 02:45
@mbercx
Member Author

mbercx commented Sep 21, 2021

Much obliged for the kind review, @sphuber! Nits have been picked, copy 🍝 has been untangled, and tests have been fixed and added for the validation.

Extra notes on changes:

  • also dealing with the following deprecation:
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  • Removed the validate_parameters step in the outline and moved the code into the setup step, since it seems to have a better home there, i.e. nothing seems to be validated?
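The deprecation mentioned in the first bullet is typically resolved by importing the abstract base classes from `collections.abc` instead of `collections`. A minimal sketch, where the helper name `is_mapping` is made up for illustration and is not taken from the PR:

```python
# Import ABCs from `collections.abc`; importing them from `collections` is deprecated.
from collections.abc import Mapping


def is_mapping(value):
    """Return True for dict-like objects (hypothetical helper)."""
    return isinstance(value, Mapping)
```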

Final thing to do is to change the logic of the error handlers when the structure changes, on which I think we agree. Just a confirmation that you agree we should restart from the charge density for both:

  • handle_out_of_walltime: restart FROM_CHARGE_DENSITY in case structure has changed.
  • handle_relax_recoverable_ionic_convergence_error: restart FROM_CHARGE_DENSITY

And I'll update the logic/tests.

@mbercx
Member Author

mbercx commented Sep 21, 2021

@ramirezfranciscof @sphuber thanks once more for your reviewing efforts! The only question that now remains is the logic regarding starting from a changed structure. However, this was once implemented by someone, perhaps for a good reason, and since we are not sure that starting from the charge density is a good approach, perhaps we should declare this logic change "beyond the scope of this PR" and open an issue for revisiting all the restart types?

Member

@ramirezfranciscof ramirezfranciscof left a comment

All looks good to me. I mainly checked the logic related to the restarting part. I also took a look at the tests for the process handlers, but I understand that part a bit less, so I assume @sphuber's review of it should suffice.

tests/calculations/test_pw.py (outdated)
Contributor

@sphuber sphuber left a comment

Sorry to bother again, but since code got added, I have a few additional small comments

tests/calculations/test_pw.py (outdated)
tests/calculations/test_pw.py (outdated)
tests/conftest.py (outdated)
tests/workflows/pw/test_base.py (outdated)
aiida_quantumespresso/calculations/pw.py (outdated)
@mbercx mbercx requested a review from sphuber September 21, 2021 11:10
qiaojunfeng added a commit to qiaojunfeng/aiida-wannier90-workflows that referenced this pull request Sep 21, 2021
To restart nscf, the current input works, but in theory
`startingpot = file` should be used instead of
`restart_mode = restart`. See
aiidateam/aiida-quantumespresso#722
sphuber previously approved these changes Sep 21, 2021
Contributor

@sphuber sphuber left a comment

Bombs away!

@mbercx
Member Author

mbercx commented Sep 21, 2021

Darn, rebased 5 seconds too late! 😭

@sphuber
Contributor

sphuber commented Sep 21, 2021

> Darn, rebased 5 seconds too late! 😭

you mean too early ^^

@sphuber
Contributor

sphuber commented Sep 21, 2021

Uhm, just merged another PR and forgot this was waiting. We can just squash merge this though with running tests again, I just updated the README.md.

@mbercx
Member Author

mbercx commented Sep 21, 2021

> Uhm, just merged another PR and forgot this was waiting. We can just squash merge this though with running tests again, I just updated the README.md.

Merging with admin powers, how rude! ;D I think the badges are still messed up though. If you want to check the result without merging, you can always go to the branch of your PR, right?

@sphuber
Contributor

sphuber commented Sep 21, 2021

> If you want to check the result without merging, you can always go to the branch of your PR, right?

I did and it looked fine. But I realize now that I forgot to add empty rows for the Python version of older releases of aiida-quantumespresso which I don't think specified explicit python requirements. I was anyway going to add another change to add shields for QE as well.

@sphuber
Contributor

sphuber commented Sep 21, 2021

@mbercx look at this deliciousness

@mbercx mbercx requested a review from sphuber September 21, 2021 14:03
@mbercx
Member Author

mbercx commented Sep 21, 2021

As I am too shy to merge with admin powers, I once again seek your approval @sphuber. 🙃

FULL = 'full'
FROM_SCRATCH = 'from_scratch'
FROM_CHARGE_DENSITY = 'from_charge_density'
FROM_WAVE_FUNCTIONS = 'from_wave_functions'
Collaborator

Hi guys, I wanted just to point out that it could be useful to have also FROM_FILES (or something like that), so that one can restart from the density and wave functions. I know that 'full' is meant to do so, but it will also read the atomic positions.

Member Author

My first inclination would be to say that restarting from the wave functions, it should be pretty fast for QE to recalculate the charge density. But if I remember correctly, QE doesn't actually do this, instead just plugging in the wave functions but for the potential calculated from the atomic charge density, correct?

Collaborator

Yes, it should be like this. Two scenarios where it could matter would be:

  • electric enthalpy routine: it uses the wfcs to build the polarization operator (essentially you start with a slightly different hamiltonian)
  • maybe in some restart when vc-relaxing?

But I do agree at the end it is not such a difference. I was just wondering, since the new implementation is so cool (great job!!! :D ), it would be quite easy to just add that one more, just in case one needs it.
Or do you think that the "experienced" user can still always tweak the inputs if that is the case?

Member Author

Thanks for the clarification, @bastonero! Note that the restart types are only used for setting the restarts after an error has been handled using the set_restart_type method. So if a user wants to restart from a previous calculation, the correct inputs have to be provided, not the restart type.

Happy to implement another restart type (not sure about the name, FROM_FILES seems a bit too general, maybe FROM_CHARGE_AND_WFC?). Do you think there is already a current error handler where this restart type would be used though?

Collaborator

Ah ok, I see, thanks! For the moment, actually, I still don't have an example of handler where it would be useful. FROM_CHARGE_AND_WFC sounds good anyway.

bastonero pushed a commit to bastonero/aiida-quantumespresso that referenced this pull request Dec 20, 2021
Improve the code regarding restarting in the `PwBaseWorkChain` in several ways:

* Remove some of the logic in the `PwBaseWorkChain` regarding restarting
from a previous calculation using a `RemoteData` provided to the
`pw.parent_folder` input. The current logic expected the `RemoteData` to
have a `PwCalculation` creator, which is not always the case. Moreover, the
`restart_mode` chosen by the user was overridden, which means that e.g.
restarting from _only_ the charge density with `startingpot` was not possible.
* For users who want to restart in the first `PwCalculation`, the inputs are
now validated to make sure that they are sensible. In case the calculation will
still run correctly but the inputs are not consistent, a warning is raised
during the validation. In case the inputs lead to failed calculation, an error
is raised.
* For restarts made by the `PwBaseWorkChain`, the restart logic is gathered
inside the `set_restart_type` method. A new `Enum`, `RestartType` is added for
the different modes of restarting. Each of the error handlers is updated to
use this new method.
* Only for the `sanity_check_insufficient_bands` error handler, the restart
method is changed to restart from the charge density.

Finally, the `validate_parameters` step in the outline of the `PwBaseWorkChain`
is merged into the `setup` step, since no more validation is performed and the
other code in this step is more at home in the `setup` step.
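
The validation behaviour described in the commit message (an error for inputs that lead to a failed calculation, a warning for inputs that are merely inconsistent) might be sketched as follows. This is a standalone illustration under stated assumptions, not the validator from the PR: the function name, its signature, and the specific checks are hypothetical, and only the nscf example is taken from the discussion above.

```python
import warnings


def validate_restart_inputs(parameters, has_parent_folder):
    """Raise for unusable inputs, warn for inconsistent ones (hypothetical sketch)."""
    control = parameters.get('CONTROL', {})
    electrons = parameters.get('ELECTRONS', {})
    calculation = control.get('calculation', 'scf')

    if calculation == 'nscf':
        # Example from the thread: an nscf run cannot work without a parent folder.
        if not has_parent_folder:
            raise ValueError('`parent_folder` is required for an nscf calculation.')
        return

    if has_parent_folder:
        uses_parent = (
            control.get('restart_mode') == 'restart'
            or electrons.get('startingpot') == 'file'
            or electrons.get('startingwfc') == 'file'
        )
        if not uses_parent:
            # The run would still work, but the parent folder would be ignored.
            warnings.warn('`parent_folder` provided but no restart parameters are set.')
```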
Development

Successfully merging this pull request may close these issues.

PwBaseWorkChain: Allow restarts from RemoteData that do not have a creator
5 participants