NetCDF unable to read some HDF5 enums #267
Comments
This is also true for reading data written with a complex dtype, which is encoded as an HDF5 compound dtype. |
I'll take a look at this; netCDF4 does not support all features of libhdf5, but if this is something that's straightforward to implement, we should be able to add support in. |
I suspect that the problem is that I wrote the enum code for integer types only. Who knew that you could use a binary type in an enum!?! For example, in nc4type.c we have this function, which assumes that only integer types are involved:
I can take a look at this - I will put it on the list. ;-) But @shoyer I am not familiar with a dtype. What's that all about? |
@edhartnett thanks for your interest! My guess was that the issue is netCDF4 always expects data types to be defined at the group level, but h5py only defines data types on dataset objects.
I'm not quite sure what you mean by a binary type in an enum.
"dtype" is Python shorthand for "data type", i.e., the type for each element in an array. |
The assumption that enum basetypes are integers is not only in the netcdf-c library |
@edhartnett @DennisHeimbigner I'm pretty sure h5py is still using integer enum codes -- where did you get the impression that they are something else? Both my examples in my first post, created with netCDF4-Python and h5py, report the underlying Looking at the HDF5 docs, it seems that they also insist on enums as a mapping between character strings and integer values.
|
Ok sorry got that wrong. I will take another look... |
OK, the problem is that the enum is somehow defined inside the dataset in HDF5. How did you do that? Here's the h5dump of your file:
Here's the output of h5dump on an enum file created by nc_test4/tst_enum.c:
Note that in the netCDF-4 file, the data type is defined at root level in the group. But in your mystery file, the data type is defined inside the dataset, instead of at the group level. How did you do that? |
I made this file using h5py using the following Python code:

    import h5py
    with h5py.File('test.nc') as f:
        f.create_dataset('foo', data=True)

h5py creates HDF5 dtypes using its low-level Booleans in particular use the following function:

    cdef TypeEnumID _c_bool(dtype dt):
        # Booleans
        global cfg

        cdef TypeEnumID out
        out = TypeEnumID(H5Tenum_create(H5T_NATIVE_INT8))

        out.enum_insert(cfg._f_name, 0)
        out.enum_insert(cfg._t_name, 1)

        return out

    cdef class TypeEnumID(TypeCompositeID):

        """
            Represents an enumerated type
        """

        cdef int enum_convert(self, long long *buf, int reverse) except -1:
            # Convert the long long value in "buf" to the native representation
            # of this (enumerated) type.  Conversion performed in-place.
            # Reverse: false => llong->type; true => type->llong

            cdef hid_t basetype
            cdef H5T_class_t class_code

            class_code = H5Tget_class(self.id)
            if class_code != H5T_ENUM:
                raise ValueError("This type (class %d) is not of class ENUM" % class_code)

            basetype = H5Tget_super(self.id)
            assert basetype > 0

            try:
                if not reverse:
                    H5Tconvert(H5T_NATIVE_LLONG, basetype, 1, buf, NULL, H5P_DEFAULT)
                else:
                    H5Tconvert(basetype, H5T_NATIVE_LLONG, 1, buf, NULL, H5P_DEFAULT)
            finally:
                H5Tclose(basetype)

        @with_phil
        def enum_insert(self, char* name, long long value):
            """(STRING name, INT/LONG value)

            Define a new member of an enumerated type.  The value will be
            automatically converted to the base type defined for this enum.  If
            the conversion results in overflow, the value will be silently
            clipped.
            """
            cdef long long buf

            buf = value
            self.enum_convert(&buf, 0)
            H5Tenum_insert(self.id, name, &buf)
|
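The `enum_insert` docstring above says that each value is converted to the enum's base type and silently clipped on overflow. The following pure-Python sketch models that behaviour for a signed 8-bit base type. It is only an illustration: `clip_to_int8` and `make_bool_enum` are hypothetical helper names, not h5py or HDF5 functions, and the member names `FALSE`/`TRUE` are assumed to match what `cfg._f_name` and `cfg._t_name` hold by default.

```python
# Range of the assumed base type (H5T_NATIVE_INT8 in the Cython code above).
INT8_MIN, INT8_MAX = -128, 127


def clip_to_int8(value: int) -> int:
    """Saturate an integer to the signed 8-bit range (models the silent clipping)."""
    return max(INT8_MIN, min(INT8_MAX, value))


def make_bool_enum(false_name: str = "FALSE", true_name: str = "TRUE") -> dict:
    """Build a name -> value mapping with two members, mirroring _c_bool."""
    members = {}
    for name, value in ((false_name, 0), (true_name, 1)):
        # Each inserted value passes through the base-type conversion first.
        members[name] = clip_to_int8(value)
    return members


bool_enum = make_bool_enum()
print(bool_enum)
print(clip_to_int8(300))
```

The point of the sketch is that the boolean enum is still an ordinary mapping from names to small integers, so the base type itself is not what makes the file unusual.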
OK, much as I love python, I cannot follow that to figure out how you created the file (with the HDF5 C API). Do you understand what the above code is doing? Do you understand where it is creating the type with an H5Tcreate() command? At that time, is it providing the dataset ID as the locid? I will try and see if C code like that will reproduce the situation... |
It looks like it's using How does netCDF associate types with HDF5 groups? There doesn't seem to be any reference to groups in either |
I think Ed has it right. Creating the enum type inside the dataset is not |
OK, the docs on HDF5 datatypes seem to be helpful here: https://support.hdfgroup.org/HDF5/doc/H5.user/Datatypes.html In particular, netCDF4 only seems to support "named datatypes", but h5py here is writing a file with a "transient datatype":
So I guess the request here is to "support transient datatypes" in netCDF. I don't see any particular reason not to do this, but the implementation might be involved since we would need to look up data types in a new place. |
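The named versus transient distinction discussed above can be checked from Python. This is a minimal sketch, assuming a reasonably recent h5py with `h5py.enum_dtype` and the low-level `TypeID.committed()` method available; the file name, dataset names and the type name `bool_t` are made up for illustration.

```python
import h5py

# Enum dtype with the same layout h5py uses for booleans (int8 base type).
enum_dt = h5py.enum_dtype({"FALSE": 0, "TRUE": 1}, basetype="i1")

# In-memory file: the "core" driver with backing_store=False writes nothing to disk.
with h5py.File("transient_vs_named.h5", "w", driver="core", backing_store=False) as f:
    # Transient: the enum type is stored only inside the dataset.
    transient = f.create_dataset("flag", data=True)

    # Named: commit the enum type to the root group, then reuse it for a dataset.
    f["bool_t"] = enum_dt
    named = f.create_dataset("named_flag", shape=(3,), dtype=f["bool_t"])

    # committed() reports whether a dataset's type is a named (committed) type.
    transient_committed = transient.id.get_type().committed()
    named_committed = named.id.get_type().committed()

print(transient_committed, named_committed)
```

Writing booleans through a committed type like `bool_t` is one possible workaround on the h5py side. Whether netCDF-C then reports the variable is not verified here.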
I am right in the middle of a major refactor of the file opening code, which is where this happens, so this is timely. If I can support transient datatypes without too many contortions, I am happy to do so. I would like netCDF-4 to have wide capabilities to read existing HDF5 files. |
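To make the "new place to look up data types" concrete, here is a toy Python model of the two lookup strategies. Every class and function name in it is hypothetical. It is not netCDF-C code and makes no claim about how the library's file-opening code is structured.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnumType:
    base: str
    members: Dict[str, int]


@dataclass
class Dataset:
    name: str
    named_type: Optional[str] = None           # name of a type committed in the group
    transient_type: Optional[EnumType] = None  # type stored inside the dataset itself


@dataclass
class Group:
    named_types: Dict[str, EnumType] = field(default_factory=dict)


def resolve_named_only(group: Group, ds: Dataset) -> Optional[EnumType]:
    """Model of a reader that only consults group-level (named) types."""
    if ds.named_type is None:
        return None
    return group.named_types.get(ds.named_type)


def resolve_type(group: Group, ds: Dataset) -> Optional[EnumType]:
    """Same lookup, but falling back to the dataset's own transient type."""
    found = resolve_named_only(group, ds)
    return found if found is not None else ds.transient_type


bool_enum = EnumType(base="int8", members={"FALSE": 0, "TRUE": 1})
root = Group(named_types={"bool_t": bool_enum})
netcdf_style = Dataset("named_flag", named_type="bool_t")
h5py_style = Dataset("flag", transient_type=bool_enum)

print(resolve_named_only(root, h5py_style))
print(resolve_type(root, h5py_style) == resolve_type(root, netcdf_style))
```

In this model the h5py-style dataset has no type at all under the named-only lookup, which matches the symptom reported in the issue: the variable is simply not listed.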
@edhartnett I'm assuming you've already finished the code refactor, considering your latest comment from almost three years ago. I just wanted to check the status on this issue regarding "transient datatypes" in netCDF-4. Thanks! |
Did not happen, sorry! |
No worries! Do you think efforts to tackle this will be resumed in the near future? I'd be happy to help with testing. |
No, I don't. Unless someone else attempts it. Unfortunately (or fortunately) I am working on other tasks, much more urgent to NOAA needs. ;-) |
Thanks for letting me know, and good to hear that my fellow researchers at NOAA receive your attention ;-) If I were to try this myself (which would mean diving into C again after 25 years), which code do you recommend reading first? Any pointers are welcome. |
I don't recommend that this be anyone's first dive into C in 25 years. ;-) What is the big picture here? Is there some application or user workflow that is broken because of this? Or are you trying to fill in missing functionality but there is no user waiting for it? |
I thought that this might be adventurous. This functionality would be |
Thanks, if I'm not back in a year or so, consider me lost in space. |
I too am curious about the use-case. And also the semantics of this. |
I think the idea is to have enums accept other base types (i.e. not just integers). Apparently HDF5 does this, and I just missed it. So no transient type would be needed. |
What other base types? |
FWIW: If I were doing a boolean type in netcdf-4, I would probably do it this way
|
Wow, that's perseverance! Thanks for sticking with this @shoyer and improving netCDF for everyone! |
h5py stores boolean data using an HDF5 enum: http://docs.h5py.org/en/latest/faq.html
The attached file is such an example: test.nc.zip
However, the netCDF-C library doesn't report these variables at all:
This is somewhat unfortunate.
Here's the output of h5dump on the file, along with a file created via netCDF4:
Scripts to produce these files:
ncdump reports "netcdf library version 4.4.0"