-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spherical padding and faster tests #45
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these improvements!
It would probably be good to add a single workflow runner with micromamba, so that the comparison with ESMF will run on the CI. I could do this in a separate PR.
The only issue is that it can induce a very large memory penalty if the input data isn't chunked, because the weights arrays can become huge if they are broadcast with all the other input dimensions. I added a note to the docstring about this but perhaps we want a more formal warning, or to explicitly add some chunking in certain cases?
This probably is quite unintuitive to users, was the reason for removing this tracking? mostly just simplifying the code? If it does allow for a much lower memory footprint (without users needing to improve their chunking) it could be interesting to leave the non-grid-dims tracking in.
Made some modifications and general cleanup of the existing tests. Mainly making the datasets dask-backed so the regridding functions are lazy for all the attribute checks. Also got rid of some unneeded copies, compute calls, etc. Tests run in about 17 seconds now.
Awesome!
It does simplify the code. But primarily, getting rid of this aggregation of the valid frac over non grid dims allows us to truly track isolated NaN fractions, so that you can have NaN in just a single time slice for example and get the correct output. I think what I have now is the most sensible baseline option, but I have some other ideas for improving conservative performance so I'll follow up in a different branch to avoid adding too much to this PR. |
I made a few small changes/refactors which I guess cancelled out your previous review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements! 😄
Spherical padding
This makes an initial attempt at building in some automatic logic for better handling the boundaries of a spherical domain with appropriate padding, as well as shifting the longitude source grid to match the target as needed.
Faster tests
Made some modifications and general cleanup of the existing tests. Mainly making the datasets dask-backed so the regridding functions are lazy for all the attribute checks. Also got rid of some unneeded copies, compute calls, etc. Tests run in about 17 seconds now.
With the addition of the spherical padding logic I was also able to remove some aspects of the tests where we avoided checking for good matches at the boundaries.
Conservative NaN tracking
I did some benchmarking and the version of NaN tracking where we track independently across non-grid dimensions as well doesn't actually have much of a run-time penalty. Therefore I went ahead and made this small change. The only issue is that it can induce a very large memory penalty if the input data isn't chunked, because the weights arrays can become huge if they are broadcast with all the other input dimensions. I added a note to the docstring about this but perhaps we want a more formal warning, or to explicitly add some chunking in certain cases?
TODO / open questions