Replies: 1 comment
https://github.com/deepmodeling/dpgen/blob/master/doc/run/overview-of-the-run-process.md
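As far as I recall from that page, DP-GEN logs every finished step in a `record.dpgen` file in the working directory, one "iteration step" pair per line, and re-running `dpgen run param.json machine.json` in the same directory continues from the step after the last recorded one. Below is a minimal sketch for checking where a stopped run will pick up. The helper `last_finished` and the `STEPS` list are hypothetical names for illustration, and the step order is an assumption taken from the documentation, so please check it against your DP-GEN version.

```python
from pathlib import Path

# Assumed order of the steps inside one DP-GEN iteration (indices 0..8).
STEPS = [
    "make_train", "run_train", "post_train",
    "make_model_devi", "run_model_devi", "post_model_devi",
    "make_fp", "run_fp", "post_fp",
]


def last_finished(record_text):
    """Return (iteration, step_name) for the last recorded step, or None."""
    last = None
    for line in record_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        last = (int(parts[0]), int(parts[1]))
    if last is None:
        return None
    return last[0], STEPS[last[1]]


if __name__ == "__main__":
    record = Path("record.dpgen")  # written by DP-GEN in the run directory
    if record.exists():
        print(last_finished(record.read_text()))
```

If the last entry is `make_train`, the training step was interrupted and would be the next one to run after a restart. The same check applies when the run stopped during the LAMMPS (`run_model_devi`) or VASP (`run_fp`) stage.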
Hi folks,
I am a new user. I am training a DPA-2 model with DP-GEN, and the run stopped before it finished. I set `"stop_batch": 500000`. The `001` and `002` models have finished training, `003` was still running when the job stopped, and `000` has not started yet.
The command I am using in machine.json is `"command": "CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=1 --nproc_per_node=auto --no-python dp --pt"`. If I would like to restart the DP-GEN job, what should I do in this case?
Is `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=1 --nproc_per_node=auto --no-python dp --restart --pt` enough? Do I need to specify the `model.ckpt.pt` file, and if so, how exactly?
I know that for DeePMD-kit I should use "dp train --restart model.ckpt". I am just not sure what I should do in DP-GEN. I am REALLY confused.
Could anyone please give me one example to show how to set a restart task in DP-GEN?
One more question: if the DP-GEN task was stopped at the LAMMPS or VASP stage, what should I do to restart it?
Thanks for your time.
The tree structure of DeePMD-kit is here.