Caravan data hosted on OpenDAP server #30
Comments
Hi Bart, thanks for the post. I agree that the current file structure and the way we share data are not ideal. It will become even worse in the next few days, since I am about to add more than 10k additional basins to Caravan. The first thing I will do for the update is to offer two separate downloads, so that not everybody needs to download the csv and nc versions together but can choose just one of the two. One netCDF file per subdataset is also neat. I could also imagine having a single zarr/netCDF file at some point with all data combined, but that would require recreating this file every time there is an extension. A zarr store hosted online could also be an interesting idea: it could allow users to query only the basins and bands (and time periods) they are interested in.
Hi Frederik, Having separate downloads for netCDF and csv would already be much better. However, separate files for each basin still isn't ideal as it adds a lot of overhead when copying them or opening them as a multi-file dataset.
Not necessarily; netCDF datasets can be split up over multiple files. I believe zarr also has some support for appending along a dimension. Of course this also depends on how/where the data is hosted. I will also be attending EGU next week so we can discuss this there.
I saw the ARCO-ERA5 dataset from Google some time back (https://github.com/google-research/arco-era5); it would be nice if Caravan were accessible in the same way (as a Zarr store on Google Cloud Public Datasets).
Thanks for working on this and putting the data online! For our eWaterCycle project we wanted easier access to the separate basins contained in the Caravan dataset. A data hosting service we have access to has an OpenDAP server, so we wanted to put it there.
I reorganized the data: added the attributes (units, basin properties) to the netCDF files, merged them per collection (i.e. one file per CAMELS subdataset), and compressed the netCDF files.
The data is available on:
https://doi.org/10.4121/ca13056c-c347-4a27-b320-930c2a4dd207
And can be accessed like this in xarray:
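A minimal sketch of this kind of access, under stated assumptions: the URL is a hypothetical placeholder (use the OpenDAP endpoint listed on the dataset page above), and the dimension name `basin_id`, the variable name `streamflow`, and the helper names are illustrative guesses rather than the actual layout of the hosted files.

```python
import xarray as xr

# Hypothetical placeholder, NOT the real endpoint: replace it with the
# OpenDAP URL given on the dataset page.
OPENDAP_URL = "https://example.org/thredds/dodsC/caravan/camels.nc"


def open_remote(url: str = OPENDAP_URL) -> xr.Dataset:
    """Open the remote dataset lazily.

    xarray reads only metadata at this point; array values are requested
    from the server when they are actually accessed or loaded.
    """
    return xr.open_dataset(url)


def select_basin(ds, basin_id, start, end, basin_dim="basin_id"):
    """Subset one basin and one time window.

    `basin_dim` is an assumed dimension name; check `ds.dims` for the
    name used in the hosted files.
    """
    return ds.sel({basin_dim: basin_id, "time": slice(start, end)})
```

With a real endpoint, something like `select_basin(open_remote(url), "camels_01013500", "1990-01-01", "1999-12-31")["streamflow"].load()` would transfer only that basin and period instead of the whole file (the basin identifier here is illustrative as well).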