correct_units fails on CMIP6 historical tos data #322
Hi @tessjacobson. Thanks for using xMIP and reporting this issue. Apologies for the long wait on this. I just ran this on the LEAP-Pangeo hub and got this:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/source.py:244, in ESMDataSource._open_dataset(self)
    223 datasets = [
    224     _open_dataset(
    225         record[self.path_column_name],
    (...)
    241     for _, record in self.df.iterrows()
    242 ]
--> 244 datasets = dask.compute(*datasets)
    245 if len(datasets) == 1:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:666, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/core.py:121, in _execute_task(arg, cache, dsk)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/utils.py:73, in apply(func, args, kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/source.py:79, in _open_dataset(urlpath, varname, xarray_open_kwargs, preprocess, requested_variables, additional_attrs, expand_dims, data_format, storage_options)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/xmip/preprocessing.py:458, in combined_preprocessing(ds)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/xmip/preprocessing.py:219, in correct_units(ds)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint_xarray/accessors.py:1085, in PintDatasetAccessor.quantify(self, units, unit_registry, **unit_kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint_xarray/accessors.py:138, in _decide_units(units, registry, unit_attribute)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint/facets/plain/registry.py:1127, in GenericPlainRegistry.parse_units(self, input_string, as_delta, case_sensitive)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint/facets/nonmultiplicative/registry.py:70, in GenericNonMultiplicativeRegistry._parse_units(self, input_string, as_delta, case_sensitive)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint/facets/plain/registry.py:1153, in GenericPlainRegistry._parse_units(self, input_string, as_delta, case_sensitive)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint/util.py:764, in ParserHelper.from_string(cls, input_string, non_int_type)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint/pint_eval.py:147, in EvalTreeNode.evaluate(self, define_op, bin_op, un_op)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pint/pint_eval.py:146, in EvalTreeNode.evaluate(self, define_op, bin_op, un_op)

TypeError: unsupported operand type(s) for -: 'ParserHelper' and 'int'

The above exception was the direct cause of the following exception:

ESMDataSourceError                        Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pydantic/deprecated/decorator.py:55, in validate_arguments..validate..wrapper_function(*args, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pydantic/deprecated/decorator.py:150, in ValidatedFunction.call(self, *args, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/pydantic/deprecated/decorator.py:222, in ValidatedFunction.execute(self, m)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/core.py:686, in esm_datastore.to_dataset_dict(self, xarray_open_kwargs, xarray_combine_by_coords_kwargs, preprocess, storage_options, progressbar, aggregate, skip_on_error, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/core.py:682, in esm_datastore.to_dataset_dict(self, xarray_open_kwargs, xarray_combine_by_coords_kwargs, preprocess, storage_options, progressbar, aggregate, skip_on_error, **kwargs)
File /srv/conda/envs/notebook/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
File /srv/conda/envs/notebook/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
File /srv/conda/envs/notebook/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/core.py:824, in _load_source(key, source)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/source.py:272, in ESMDataSource.to_dask(self)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake/source/base.py:283, in DataSourceBase._load_metadata(self)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/source.py:208, in ESMDataSource._get_schema(self)
File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_esm/source.py:264, in ESMDataSource._open_dataset(self)

ESMDataSourceError: Failed to load dataset with key='CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Omon.gn'
```

I confirmed that this is introduced by:

```python
import xarray as xr

ds = xr.open_dataset(
    'gs://cmip6/CMIP6/CMIP/HAMMOZ-Consortium/MPI-ESM-1-2-HAM/historical/r3i1p1f1/Omon/tos/gn/v20191218/',
    engine='zarr',
    chunks={},
    **z_kwargs,  # z_kwargs as defined in the reproducer below
)
combined_preprocessing(ds)
```
The error message is quite hard to read, but I think I have a solution:

```python
ds_fixed = ds.copy()
for var in ds_fixed.variables:
    unit = ds_fixed[var].attrs.get('units', None)
    if isinstance(unit, int):
        del ds_fixed[var].attrs['units']
        print(f"{var} {unit}")
combined_preprocessing(ds_fixed)
```
This works! It turns out that the original dataset had an integer `units` attribute. This should be easily fixable. I'll consult with the pint crowd: @TomNicholas @keewis, what is the best way of attack here? Is this something that I should/could change in the unit registry, or do you think it is better to just delete all integer unit attributes like above?
I think dimensionless numbers in pint are just supposed to be represented with a unit of `"dimensionless"`, not an integer.
Yeah, you can put `"dimensionless"` in the `units` attribute instead. Note that pint cannot parse a bare integer, which is what trips up `correct_units` here.
That seems like a nice alternative to ripping it out! Thanks
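For reference, a minimal sketch of that alternative — illustrative only, assuming `ds` is the dataset opened above:

```python
# Replace integer `units` attributes with a pint-parsable string
# instead of deleting them outright.
for var in ds.variables:
    if isinstance(ds[var].attrs.get("units"), int):
        ds[var].attrs["units"] = "dimensionless"
```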
I just added #331, but that did not fix the issue above. I think I misdiagnosed this. It seems to be the undecoded time units that are the actual problem:

```python
from xmip.preprocessing import combined_preprocessing
import intake
import dask

url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
query = dict(
    activity_id="CMIP",
    experiment_id="historical",
    variable_id=["tos"],
    table_id="Omon",
    source_id=['MPI-ESM-1-2-HAM'],
    member_id='r3i1p1f1',
    grid_label='gn',
)
cat_mon = col.search(**query)
z_kwargs = {'consolidated': True, 'decode_times': False}
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dset_dict = cat_mon.to_dataset_dict(zarr_kwargs=z_kwargs)
```
```python
from xmip.preprocessing import correct_units

# `ds` is the single dataset loaded above; the key matches the one
# from the traceback at the top of this thread
ds = dset_dict['CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Omon.gn']
ds_test = ds.drop_vars(['time', 'time_bnds'])
correct_units(ds_test)
ds_test
```

works as intended! So it is the undecoded time units that cause the failure. As a quick fix for @tessjacobson: could you just decode the times, or was there a specific reason not to do that?

```python
from xmip.preprocessing import combined_preprocessing
import intake
import dask

url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
query = dict(
    activity_id="CMIP",
    experiment_id="historical",
    variable_id=["tos"],
    table_id="Omon",
    source_id=['MPI-ESM-1-2-HAM'],
    member_id='r3i1p1f1',
    grid_label='gn',
)
cat_mon = col.search(**query)
z_kwargs = {'consolidated': True, 'decode_times': True}
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dset_dict = cat_mon.to_dataset_dict(zarr_kwargs=z_kwargs, preprocess=combined_preprocessing)
```

works as intended. A more high-level question for @keewis and @TomNicholas: the units upsetting pint seem to be the raw, undecoded CF time units.
Is there a way pint-xarray could/should detect encoded time dimensions and leave them alone? Another question for the whole group:
I don't understand. How is `decode_times=False` the default behavior here?
I would happily add this to …
It is not. The issue here is that @tessjacobson explicitly set `decode_times=False`. But since xarray still 'knows' about the time dimension, I feel we should be able to just leave those variables as they are.
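To illustrate that point — with `decode_times=False` the CF time metadata stays in plain attrs, which is exactly what ends up in front of pint (a toy sketch, not the actual CMIP6 data):

```python
import numpy as np
import xarray as xr

# mimic an undecoded CF time axis
ds = xr.Dataset(
    coords={
        "time": (
            "time",
            np.arange(3),
            {"units": "days since 1850-01-01", "calendar": "standard"},
        )
    }
)
print(ds.time.attrs["units"])  # "days since 1850-01-01" — unparsable by pint
```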
@keewis, is there a way to have pint-xarray leave the time variables alone?
I'm not sure. We'd need to be able to tell which variables hold encoded times before quantifying. See also #279
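Until something like that exists, a possible user-side workaround — a hedged sketch, where the `"since"` substring heuristic for spotting CF time units is my assumption and `ds` is an undecoded dataset as above:

```python
import pint_xarray  # noqa: F401 — registers the .pint accessor

# Quantify everything except variables whose `units` attribute looks
# like a CF time encoding ("<unit> since <date>"), which pint cannot parse.
time_like = [
    name
    for name, var in ds.variables.items()
    if "since" in str(var.attrs.get("units", ""))
]
quantified = ds.drop_vars(time_like).pint.quantify()
```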
Actually, it might be better to move this to the registry's preprocessors. Edit: that was in xarray-contrib/cf-xarray#238. That would cast everything to a `str` before parsing:

```python
import pint

# insert `str` as the first preprocessor so raw unit attributes are
# cast to strings before pint tries to parse them (can also be passed
# at construction time via `pint.UnitRegistry(preprocessors=[str])`)
ureg = pint.UnitRegistry()
ureg.preprocessors.insert(0, str)
ureg.parse_units(1)  # now parses ("1" -> dimensionless) instead of raising
```
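Such a patched registry can then be handed to pint-xarray, whose `quantify` accepts a `unit_registry` argument (visible in the traceback above). A toy sketch with a hypothetical dataset carrying an integer `units` attribute:

```python
import pint
import pint_xarray  # noqa: F401 — registers the .pint accessor
import xarray as xr

ureg = pint.UnitRegistry()
ureg.preprocessors.insert(0, str)

# hypothetical dataset with an integer `units` attribute,
# mimicking the CMIP6 data above
ds = xr.Dataset({"tos": ("t", [1.0, 2.0], {"units": 1})})
quantified = ds.pint.quantify(unit_registry=ureg)
```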
Just to clarify, I think the …
The datetime unit issue should be fixed by the PR on cf-xarray. For the integer units, …
Amazing. Thank you so much @keewis.
I just tested this on the LEAP-Pangeo hub with cf-xarray and pint-xarray installed from `main`, and this ran without error:

```python
from xmip.preprocessing import combined_preprocessing
import intake
import dask

url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
query = dict(
    activity_id="CMIP",
    experiment_id="historical",
    variable_id=["tos"],
    table_id="Omon",
    source_id=['MPI-ESM-1-2-HAM'],
    member_id='r3i1p1f1',
    grid_label='gn',
)
cat_mon = col.search(**query)
z_kwargs = {'consolidated': True, 'decode_times': False}
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dset_dict = cat_mon.to_dataset_dict(zarr_kwargs=z_kwargs, preprocess=combined_preprocessing)
```

I will close this issue now. Please feel free to reopen if this does not work for you, @tessjacobson.
You might also want to close #279
It took me a while, but the fix in …
Awesome. Thanks @keewis
I'm trying to preprocess SST data in all the historical CMIP6 runs and am running into an issue with `combined_preprocessing`. This happens with any of the models/members but is shown below for a single model/member. It seems to be happening in the `correct_units` step. Using v0.21 of `pint` and v0.7.1 of `xmip`.