Gencast - num_steps_per_chunk > 1 breaks rollout.chunked_prediction_generator_multiple_runs #112
Comments
Hi! Could you clarify what you are trying to achieve by setting that parameter to 2, so we can advise appropriately? You should be able to get it to run by adding the autoregressive.py wrapper in the construct_wrapped_gencast() function, but it may run out of device memory if you do that.
I am trying to run 50-member ensembles for the 0.25 resolution, and encountered the following:
To resolve these and speed up inference, my understanding was that this parameter could help. I would really appreciate it if you have any other suggestions!
Thanks for explaining. I don't think that argument will help you with those. With respect to 1., could you confirm that you are working past this commit? With respect to 2., I suspect that you are running out of host memory. When you generate a large number of ensemble members, you probably want to write the chunks to disk as they are generated rather than appending them to the list (of course there will be an associated time cost with writing to disk, so you may want to set it up to write asynchronously, or to write only a subset of the variables). Could you confirm what number you get when you print Thanks!
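As a rough illustration of the "write chunks to disk as they are generated" suggestion, here is a minimal sketch. It is not code from the repository: `write_chunks_async` and `fake_chunk_generator` are hypothetical names, the chunks are plain NumPy arrays standing in for whatever the rollout generator yields, and `np.save` is just one possible on-disk format.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def write_chunks_async(chunks, out_dir, max_workers=2):
    """Save each chunk to its own .npy file as soon as it is produced.

    Hypothetical helper: `chunks` is any iterable of array-like objects.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths, futures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i, chunk in enumerate(chunks):
            path = os.path.join(out_dir, f"chunk_{i:04d}.npy")
            # Convert to a host array, then hand the write to a background
            # thread so the loop can move on to the next chunk.
            futures.append(pool.submit(np.save, path, np.asarray(chunk)))
            paths.append(path)
        for future in futures:
            future.result()  # re-raise any error from a background write
    return paths


def fake_chunk_generator(num_chunks, shape=(4, 8, 16)):
    """Hypothetical stand-in for the rollout generator."""
    for i in range(num_chunks):
        yield np.full(shape, i, dtype=np.float32)


out_dir = tempfile.mkdtemp()
paths = write_chunks_async(fake_chunk_generator(3), out_dir)
```

Only the file paths are kept in a list, so completed chunks do not accumulate in host memory. If chunks are produced faster than they can be written, pending writes will still queue up, so a bound on the number of in-flight writes may be needed in practice.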
(1) I think that should be the case; I was running this as in the notebooks: (2) I see, thanks for the heads up, I'll try to manage the memory more efficiently. (3) Thanks again!
Hey! Sure, but which version of the notebook are you using? Could you confirm it was the one past this commit? Note the change in that commit to separate the line that Regarding the number of devices, that's indeed bizarre. Did you mention you were following these instructions? If so, can you confirm how you requested the TPU VM? But indeed, running 8 samples when you have 4 devices is going to double the inference time, because they will be produced sequentially in two batches of 4. In the meantime, you should be able to reproduce the inference speed by generating just 4 samples (maximising parallelism across the devices). Andrew
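The batching arithmetic in this comment can be sketched as follows. This assumes one sample per device per batch and that wall-clock time scales with the number of sequential batches; `num_sequential_batches` is a hypothetical helper, not part of the library.

```python
import math


def num_sequential_batches(num_samples, num_devices):
    """Number of sequential batches when each device handles one sample at a time."""
    return math.ceil(num_samples / num_devices)


batches_8 = num_sequential_batches(8, 4)  # 8 samples on 4 devices
batches_4 = num_sequential_batches(4, 4)  # 4 samples on 4 devices
```

Under these assumptions, 8 samples on 4 devices need twice as many sequential batches as 4 samples, which matches the doubling described above.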
Hey Andrew, you were right, I had the Also, for running the 1deg version with different ERA5 conditions, do you use a known regridder or is it a custom one that goes from 0.25 degree to 1 degree? If so, would it be possible for you to share the script that generated the 1deg ERA5 datasets in
The 1 deg data is simply the 0.25 deg data subsampled by taking 1 in every 4 points along each of the spatial axes. We do it like this so the distribution of the data does not change and we can more easily compare models across resolutions. We follow this approach because we usually use the 1 deg models just as a baseline for the 0.25 deg models, but for other use cases of 1 deg models it may be better to train on data subsampled in a different way.
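For illustration, the "1 in every 4 points" subsampling could look like the sketch below. This is not the script used to produce the released datasets: it assumes the last two array axes are (lat, lon), that the subsampling starts at index 0, and that the 0.25 deg grid has 721 x 1440 points; `subsample_grid` is a hypothetical helper.

```python
import numpy as np


def subsample_grid(field, stride=4):
    """Keep every `stride`-th point along the last two (lat, lon) axes."""
    return field[..., ::stride, ::stride]


# Assumed 0.25 deg grid: 721 latitudes (-90..90) and 1440 longitudes (0..359.75).
lat = np.linspace(-90.0, 90.0, 721)
lon = np.arange(0.0, 360.0, 0.25)

rng = np.random.default_rng(0)
field = rng.normal(size=(2, lat.size, lon.size)).astype(np.float32)

coarse = subsample_grid(field)
coarse_lat = lat[::4]
coarse_lon = lon[::4]
```

Because the values are picked rather than averaged or interpolated, each coarse point is an unmodified 0.25 deg value, which is why the distribution of the data is preserved.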
Running examples with num_steps_per_chunk = 2 results in the following error in example notebooks with the 1p0 model:
Is the step chunking working?
Thanks!