First training attempt #119
This version follows the paper (https://arxiv.org/pdf/2312.15796, page 27) more closely.
Setting the lr_schedule init_value to zero and training on source-era5_date-2019-03-29_res-1.0_levels-13_steps-01.nc, I get zero change in the params. That seems odd to me: there ought to be some numerical difference from training on new data. Can someone tell me what I'm doing wrong?
An initial learning rate of 0 means that the first step of optimisation will multiply all gradients by 0, which effectively applies the identity to the parameters in the update. I suspect running a second iteration, or using a non-zero initial rate, will show a different outcome.
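For illustration, here is a minimal optax sketch of that effect; the params and grads are toy stand-ins, not the demo's objects. With a schedule whose init_value is 0, step 0 scales the update by 0 and leaves the parameters unchanged, while step 1 already applies a non-zero rate:

```python
import jax.numpy as jnp
import optax

# Schedule starting at 0, as in the demo config; the 1e-3 end value and
# 1000-step warmup are arbitrary toy values.
schedule = optax.linear_schedule(init_value=0.0, end_value=1e-3,
                                 transition_steps=1000)
optimizer = optax.sgd(learning_rate=schedule)

params = {"w": jnp.ones(3)}
grads = {"w": jnp.full(3, 0.5)}  # pretend gradients

opt_state = optimizer.init(params)

# Step 0: learning rate is 0.0, so the update is all zeros.
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)  # unchanged

# Step 1: the schedule has warmed up past 0, so the params now move.
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)  # changed
```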
Both of your suspicions are true. I was following the training schedule from Figure 7 of https://arxiv.org/pdf/2212.12794, where it appears they used an initial learning rate of 0.0. Table D1 of https://arxiv.org/pdf/2312.15796 doesn't mention the value used in training GenCast. Can you share the value that was used to train GenCast?
GenCast also uses an initial learning rate of 0. The other hyperparameters are as listed in Table D1.
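A warmup schedule of that shape can be written in optax roughly as below. Only init_value=0 is taken from this thread; the peak, warmup, and decay settings are placeholders standing in for the Table D1 hyperparameters, not the published values, and the optimizer choice is illustrative:

```python
import optax

lr_schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,          # initial learning rate of 0, per the reply above
    peak_value=1e-3,         # placeholder; see Table D1
    warmup_steps=1_000,      # placeholder; see Table D1
    decay_steps=2_000_000,   # placeholder; see Table D1
    end_value=0.0,           # placeholder
)
optimizer = optax.adamw(learning_rate=lr_schedule)  # illustrative choice
```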
If your initial learning rate was zero, how many grad-optimize-apply_updates iterations were used to train GenCast?
What values did you use for the learning rate schedule when fine-tuning a model (from a checkpoint) with newer data (HRES)?
Is a response to this latest question forthcoming, or should I just close this issue?
As the GenCast paper states in Table D1, the model is trained in two stages: in the first, 2 million updates are applied; in the second, 64 thousand. Fine-tuning with newer HRES data is done by repeating Stage 2. I.e. operational GenCast has undergone Stage 1, Stage 2, and then Stage 2 again but with 0.25deg HRES data.
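In practice each stage is a separate run resumed from the previous checkpoint, but purely to illustrate the step budgets, the three stages could be laid out as a single optax schedule. The learning-rate values below are placeholders; only the 2M/64k update counts come from the comment above:

```python
import optax

stage1 = optax.constant_schedule(1e-3)  # placeholder rate
stage2 = optax.constant_schedule(1e-4)  # placeholder rate

lr_schedule = optax.join_schedules(
    schedules=[stage1, stage2, stage2],  # Stage 2 repeated for the HRES fine-tune
    boundaries=[2_000_000, 2_064_000],   # 2M updates, then 64k more
)
```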
I am working with the gencast_mini_demo.ipynb demo.
Source data is source-era5_date-2019-03-29_res-1.0_levels-13_steps-01.nc.
I added the optimizer steps and the loop. Inference does not work on CPUs (#113), so I'll ask here: do those Loss and Mean values look reasonable?
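For reference, a grad-optimize-apply_updates loop of the kind described above looks like this in optax. The model, loss, and data are toy stand-ins for the demo's own objects, and the learning rate is arbitrary:

```python
import jax
import jax.numpy as jnp
import optax

# Toy linear model and batch standing in for GenCast and the ERA5 data.
def loss_fn(params, batch):
    pred = batch["x"] @ params["w"]
    return jnp.mean((pred - batch["y"]) ** 2)

params = {"w": jnp.ones((4, 1))}
batch = {"x": jnp.ones((8, 4)), "y": jnp.zeros((8, 1))}

optimizer = optax.adamw(learning_rate=1e-4)  # arbitrary rate
opt_state = optimizer.init(params)

for step in range(5):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    print(f"step {step}: loss {loss:.6f}")
```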