Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contiguous datasets in SWMR mode cannot be read with hdf5 1.14.4+ #5251

Open
PeterC-DLS opened this issue Jan 20, 2025 · 2 comments
Open

Contiguous datasets in SWMR mode cannot be read with hdf5 1.14.4+ #5251

PeterC-DLS opened this issue Jan 20, 2025 · 2 comments
Assignees
Labels
Component - C Library Core C library issues (usually in the src directory) Priority - 1. High 🔼 These are important issues that should be resolved in the next release Type - Bug / Bugfix Please report security issues to [email protected] instead of creating an issue on GitHub

Comments

@PeterC-DLS
Copy link
Contributor

PeterC-DLS commented Jan 20, 2025

Describe the bug
During the writing of a SWMR file with a contiguous dataset, when a read process opens that dataset, I get using hdf5 1.14.4-3

  File "/io/swmr-check.py", line 33, in read
    c = f[data_path]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/python/cp310-cp310/lib/python3.10/site-packages/h5py/_hl/group.py", line 357, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 257, in h5py.h5o.open
KeyError: 'Unable to synchronously open object (invalid dataset size, likely file corruption)'

I guess #4283 may have caused this issue.

Expected behavior
In a prior release, such as 1.14.3, the dataset is opened successfully. I also encounter another issue using this release which is noted below.

Platform (please complete the following information)

  • HDF5 version 1.14.4.-3
  • OS and version RHEL 7
  • Compiler and version
  • Build system (e.g. CMake, Autotools) and version cmake
  • Any configure options you specified
  • MPI library and version (parallel HDF5) not used

Additional context
A h5py script to reproduce the error:

import h5py
import numpy as np
from time import sleep

data_path = "test/data"

def write(out_name, chunk_size=12, reps=60, pause=4, chunk=False):
    d = np.arange(chunk_size, dtype=np.float64)
    print(d)
    total_shape = (chunk_size * reps,)
    with h5py.File(out_name, "w", libver="latest") as f:
        if chunk:
            f.create_dataset(data_path, dtype=d.dtype, shape=d.shape, maxshape=total_shape, chunks=d.shape, fillvalue=np.nan)
        else:
            f.create_dataset(data_path, dtype=d.dtype, shape=total_shape)
            # workaround using for a fixed shape chunked dataset
            #f.create_dataset(data_path, dtype=d.dtype, shape=total_shape, maxshape=total_shape, chunks=total_shape, fillvalue=np.nan)
        f.swmr_mode = True
        # sleep(pause)
        c = f[data_path]
        for i in range(reps):
            b = i * chunk_size
            e = b + chunk_size
            s = slice(b, e)
            print(f"{i+1}/{reps}: {s}")
            if chunk:
                c.resize((e,))
            c[s] = d
            c.flush()
            sleep(pause)

def read(in_name, reps=60, pause=4):
    with h5py.File(in_name, "r", libver="latest", swmr=True) as f:
        c = f[data_path]
        l = 0
        while True:
            c.refresh()
            l += 1
            print(l, c.shape)
            print(c[:])
            if l == reps:
                break
            sleep(pause)

def main(reps=5):
    from multiprocessing import Process
    use_chunk = False
    filename = "swmr-test.h5"
    w = Process(target=write, args=(filename,), kwargs={"reps":reps, "chunk":use_chunk})
    r = Process(target=read, args=(filename,), kwargs={"reps":reps})
    w.start()
    sleep(0.5) # wait long enough for file to be created
    r.start()
    w.join()
    r.join()

if __name__ == "__main__":
    main()

Note, also, the contents of a contiguous dataset does not read updated values during SWMR writing in 1.14.3:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]
1/5: slice(0, 12, None)
1 (60,)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
2/5: slice(12, 24, None)
2 (60,)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
3/5: slice(24, 36, None)
3 (60,)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
4/5: slice(36, 48, None)
4 (60,)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
5/5: slice(48, 60, None)
5 (60,)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

@jhendersonHDF jhendersonHDF self-assigned this Jan 22, 2025
@jhendersonHDF
Copy link
Collaborator

Hi @PeterC-DLS,

thanks for the report and the reproducer! It does look like the PR you mentioned may be causing this issue due to an overly strict check of dataset sizes that didn't take SWMR into account. We're currently in the middle of our release process, so a fix may not make it into our next release, but we'll investigate both issues and try to get a fix into the release after this one.

@Villtord
Copy link

Can confirm same happens on RHEL 8

@jhendersonHDF jhendersonHDF added Priority - 1. High 🔼 These are important issues that should be resolved in the next release Component - C Library Core C library issues (usually in the src directory) Type - Bug / Bugfix Please report security issues to [email protected] instead of creating an issue on GitHub labels Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - C Library Core C library issues (usually in the src directory) Priority - 1. High 🔼 These are important issues that should be resolved in the next release Type - Bug / Bugfix Please report security issues to [email protected] instead of creating an issue on GitHub
Projects
None yet
Development

No branches or pull requests

3 participants