Error with handling variable length data (H5T_VLEN) #99

Open
janblumenkamp opened this issue Jan 9, 2019 · 6 comments

Comments

@janblumenkamp

I have a dataset that was created with h5py and contains variable-length data (using the H5T_VLEN type). This is the Python script I used to generate it:

import numpy as np
import h5py

with h5py.File('testdata.hdf5', 'a') as hdf:
  if 'real' in hdf:
    del hdf['real']
  
  hdf_group = hdf.create_group('real')
  hdf_labels = hdf_group.create_dataset('labels', (3,), h5py.special_dtype(vlen = np.uint8))
  
  for i in range(3):
    labels = np.empty(i + 1, np.uint8)
    for j in range(i + 1):
      labels[j] = j
    hdf_labels[i] = labels

The output of h5dump:

HDF5 "testdata.hdf5" {
GROUP "/" {
   GROUP "real" {
      DATASET "labels" {
         DATATYPE  H5T_VLEN { H5T_STD_U8LE}
         DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
         DATA {
         (0): (0), (0, 1), (0, 1, 2)
         }
      }
   }
}
}

Reading the generated HDF file in JS:

const hdf5 = require('hdf5').hdf5;
const h5tb = require('hdf5').h5tb;

var Access = require('hdf5/lib/globals').Access;
var file = new hdf5.File('testdata.hdf5', Access.ACC_READ);
var group = file.openGroup('real');
var readBuffer=h5tb.getTableInfo(group.id, 'labels');
console.log(readBuffer);

And the output:

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 0:
  #000: H5Tfields.c line 63 in H5Tget_nmembers(): cannot return member number
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Tfields.c line 104 in H5T_get_nmembers(): operation not supported for type class
    major: Invalid arguments to routine
    minor: Inappropriate type
{ nfields: 1213911376, nrecords: -781860838 }

The numbers in the last line are different on every run.
What is the problem? I would be surprised if H5T_VLEN were not implemented, since I assume it is also used for strings.

@rimmartin
Collaborator

rimmartin commented Jan 10, 2019

Hi,
h5tb.getTableInfo is for tables like those described at https://support.hdfgroup.org/HDF5/doc/HL/RM_H5TB.html. The Python script above created a plain dataset, which probably doesn't have the table structure those functions expect.

Do you want the dimensions of your dataset before reading it? Maybe try

const dims = group.getDatasetDimensions('labels');

http://hdf-ni.github.io/hdf5.node/ref/groups.html

@janblumenkamp
Author

Hi,
getDatasetDimensions correctly outputs [3], but readDataset outputs a similar error:

var data = h5lt.readDataset(group.id, 'labels');
                ^

SyntaxError: unsupported data type
    at Object.<anonymous> (hdfGenerator.js:23:17)
    at Module._compile (internal/modules/cjs/loader.js:722:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:733:10)
    at Module.load (internal/modules/cjs/loader.js:620:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
    at Function.Module._load (internal/modules/cjs/loader.js:552:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:775:12)
    at startup (internal/bootstrap/node.js:300:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:826:3)

@rimmartin
Collaborator

Ah, OK; I'll add support for reading :-) Thank you. I have a Python environment to build the test.

@rimmartin
Collaborator

And also writing...

@rimmartin
Collaborator

Hi, do you have other types you want vlen'ed?

Also, which overall dimensions and ranks do you need covered? I want to support them all.

@janblumenkamp
Author

janblumenkamp commented Jan 11, 2019

Perfect, thanks! It would be great if tables of any kind could also be used with VLEN. Demo generation script:

import numpy as np
import h5py

label_dtype = np.dtype(
  [('type1', np.float64),
   ('type2', np.float64),
   ('type3', np.uint8),
   ('type4', np.uint16),
   ('type5', np.uint16)])

with h5py.File('testdata.hdf5', 'a') as hdf:
  if 'real' in hdf:
    del hdf['real']
  
  hdf_group = hdf.create_group('real')
  hdf_labels = hdf_group.create_dataset('labels', (3,), h5py.special_dtype(vlen = label_dtype))
  
  for i in range(3):
    labels = np.empty(i + 1, label_dtype)
    for j in range(i + 1):
      labels[j]['type1'] = j
      labels[j]['type2'] = j + 1
      labels[j]['type3'] = j + 2
      labels[j]['type4'] = j + 3
      labels[j]['type5'] = j + 4
    hdf_labels[i] = labels

h5dump output:

HDF5 "testdata.hdf5" {
GROUP "/" {
   GROUP "real" {
      DATASET "labels" {
         DATATYPE  H5T_VLEN { H5T_COMPOUND {
            H5T_IEEE_F64LE "type1";
            H5T_IEEE_F64LE "type2";
            H5T_STD_U8LE "type3";
            H5T_STD_U16LE "type4";
            H5T_STD_U16LE "type5";
         }}
         DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
         DATA {
         (0): ({
                  0,
                  1,
                  2,
                  3,
                  4
               }),
         (1): ({
                  0,
                  1,
                  2,
                  3,
                  4
               }, {
                  1,
                  2,
                  3,
                  4,
                  5
               }),
         (2): ({
                  0,
                  1,
                  2,
                  3,
                  4
               }, {
                  1,
                  2,
                  3,
                  4,
                  5
               }, {
                  2,
                  3,
                  4,
                  5,
                  6
               })
         }
      }
   }
}
}

But I think I will use another approach for now. Maybe I will come back to this really nice module in the future. So there is no hurry for me, but this will probably also be very useful for other users who want to use the module with HDF5 files generated with h5py :)

Regarding the rank, it would be helpful if any rank could be used (if I understand correctly, the maximum rank you currently support is 4?), but that would probably be a separate issue.
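
The "other approach" mentioned above is not specified in this thread. As an illustration only, one common workaround when a reader lacks H5T_VLEN support is to store the ragged rows as two fixed-type arrays: a flat array of all values and an array of offsets marking where each row starts and ends. The sketch below uses plain Python lists; the helper names `flatten_ragged` and `unflatten_ragged` are hypothetical and do not belong to h5py or hdf5.node.

```python
from itertools import accumulate


def flatten_ragged(rows):
    """Pack variable-length rows into a flat value list plus row offsets.

    Hypothetical helper: row i occupies values[offsets[i]:offsets[i + 1]].
    """
    values = [v for row in rows for v in row]
    # Running total of row lengths, with a leading 0 for the first row start.
    offsets = [0] + list(accumulate(len(row) for row in rows))
    return values, offsets


def unflatten_ragged(values, offsets):
    """Rebuild the variable-length rows from the flat layout."""
    return [values[start:end] for start, end in zip(offsets, offsets[1:])]


# Same rows as the first generation script in this issue: (0), (0, 1), (0, 1, 2)
rows = [list(range(i + 1)) for i in range(3)]
values, offsets = flatten_ragged(rows)
print(values)   # [0, 0, 1, 0, 1, 2]
print(offsets)  # [0, 1, 3, 6]
print(unflatten_ragged(values, offsets) == rows)  # True
```

Under this assumption, `values` and `offsets` could each be written as an ordinary one-dimensional dataset of a fixed integer type, so neither needs H5T_VLEN. The cost is that the reader has to know the convention and slice the rows itself.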
