Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

complex number support #558

Merged
merged 3 commits into from
Aug 21, 2019
Merged

complex number support #558

merged 3 commits into from
Aug 21, 2019

Conversation

kleinhenz
Copy link
Contributor

This closes #180. The approach mirrors what is done by h5py. Complex types are stored as a compound datatype with field names "r" and "i". This compound type is recognized during reading and loaded as the correct complex julia type. Real and complex field names are configurable via HDF5.set_complex_field_names(real::String, imag::String). Complex support can be enabled/disabled via HDF5.enable/disable_complex_support(). All previous tests pass and I have added new tests for complex support specifically.

Although this is not part of the HDF5 standard I think it makes sense to include this here since it seems to be fairly basic and universally useful functionality that people coming from python would expect to just work. Also h5py has had this functionality for a while and doesn’t seem to have run into any major problems with their approach as far as I can tell.

@tknopp @musm what do you think of this?

@tknopp
Copy link
Contributor

tknopp commented Jul 19, 2019

Looks good, thank you! @musm could you do the final decision / review?

@kleinhenz
Copy link
Contributor Author

One downside of this approach is that the hdf5 datatype for complex numbers is created/destroyed on every read/write. I think this is necessary in order to enable dynamically enabling/disabling complex support and updating the field names. Another possibility is to create the types once at library initialization like in #341. I have also implemented this approach here although I am seeing some intermittent segfaults which I am not sure are related.

I'm not sure which approach is better. From a compatibility perspective it is nice to be able to dynamically update complex number handling but creating the type repeatedly will have at least a little performance impact. What do people think is better?

Copy link

@benchislett benchislett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
I like the dynamic nature, and I don't think the dynamic creation will be that big of a performance hit.
The alternative of creating at init-time for every subtype of HDF5Scalar and looking them up at runtime seems like a bit much to me. I'm in favor of merging as-is

src/HDF5.jl Outdated
@@ -323,6 +323,13 @@ cset(::Type{ASCIIChar}) = H5T_CSET_ASCII

hdf5_type_id(::Type{C}) where {C<:CharType} = H5T_C_S1

# global configuration for complex support
complex_support = true
complex_field_names = ("r", "i")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would ("re", "im") be better here, since that's what Julia uses?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would stick with the default convention h5py uses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tknopp why do you think it is better to stick to h5py convention?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of HDF5 is to be language agnostic. It should not matter what programming language you use to access a file. Unfortunately, HDF5 itself defines no standard for complex numbers. But there is a very popular package (h5py) that proposes or uses a convention. Why should this convention be different for Julia, when we can stick to the convention h5py uses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. MATLAB does not have a convention. So we can use the h5py precedent.

We should document this however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the idea was to seamlessly compatible with h5py by default. If people are happy with this approach and want to merge I can add some documentation explaining the choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of sticking to the h5py precedent

@tknopp
Copy link
Contributor

tknopp commented Aug 21, 2019

ping @musm: Can this be merged after fixing the conflicts?

enable_complex_support() = global complex_support = true
disable_complex_support() = global complex_support = false
set_complex_field_names(real::String, imag::String) = global complex_field_names = (real, imag)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need these global setter functions?

What's the rationale for being able to enable/disable complex support? Same with set_complex_field_names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case a better way to deal with such global constants is to use Ref's with appropriate setters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it this way following what is done by h5py and for compatibility with existing practice/workflows. Since there is no actual hdf5 standard for complex numbers allowing the fieldnames to be set dynamically lets you interface more easily with existing conventions. For example I think that octave uses real/imag and I have also seen re/im. Also it's conceivable that somebody is storing a compound type with these fieldnames that doesn't represent a complex number. I'm not sure if this flexibility is necessary/worth it but I decided to follow the h5py precedent in allowing it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why they would want to do that, but even if they did couldn't they just change the field names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kleinhenz that makes sense

Can we please update all the global constants to use refs (it's precompile friendly, as opposed to globals)

const COMPLEX_SUPPORT = Ref(true) 
const COMPLEX_FIELD_NAMES = Ref(("r", "i"))

enable_complex_support() = COMPLEX_SUPPORT[] = true
disable_complex_support() = COMPLEX_SUPPORT[] = false
set_complex_field_names(real::AbstractString, imag::AbstractString) =  COMPLEX_FIELD_NAMES[] = ((real, imag))

And update everywhere else to use COMPLEX_SUPPORT[] instead of complex_support

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the rational of being able to disable this feature. It allows to to read the complex numbers in its "raw" representation, which otherwise would not be possible. It also does not really hurt making this optional.

Copy link
Member

@musm musm Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at h5py they don't offer a seperate way to enable/disable complex support.

I kinda agree with @benchislett that even if they would want to do that they could just change the field names.

I'm not 100% convinced which approach is best. I'm leaning towards no option to enable disable support since that seems a little overboard. OTOH we can always remove this non-user facing functions in the future if deemed overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are correct that h5py doesn't provide a mechanism for completely disabling complex support. I don't have a strong preference on this either. I lean towards including it just to be the least disruptive to existing workflows but I could see it the other way also.

One point in favor is that changing the field names will prevent you from reading data as complex when you don't want to but it can't prevent you from accidentally writing complex data in an unwanted format where previously it would have thrown an exception so I think including the enable/disable thing is somewhat safer for people with a preexisting way of handling complex data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that's convincing enough 👍

Manifest.toml Outdated Show resolved Hide resolved
@kleinhenz
Copy link
Contributor Author

@musm I've updated the global constants to use refs and fixed the merge conflicts. I will add some documentation later today.

src/HDF5.jl Outdated Show resolved Hide resolved
src/HDF5.jl Outdated Show resolved Hide resolved
@kleinhenz
Copy link
Contributor Author

is there anything more that I should do here?

@musm
Copy link
Member

musm commented Aug 21, 2019

I guess the only thing is

#558 (comment)

Do you know if h5py has an option for this as well? I looked but it seems like they only allow you to change the field names? I could be wrong though.

@musm
Copy link
Member

musm commented Aug 21, 2019

I'll go ahead and merge this since this has been open for a while without comment

@tknopp @kleinhenz do you all approve as well?

@kleinhenz
Copy link
Contributor Author

sounds good to me

@tknopp
Copy link
Contributor

tknopp commented Aug 21, 2019

looks good!

@musm musm merged commit ac3570d into JuliaIO:master Aug 21, 2019
@musm
Copy link
Member

musm commented Aug 22, 2019

@kleinhenz thanks for sticking to this PR and seeing it through. It's great to have others improve this package

@kleinhenz
Copy link
Contributor Author

@musm thanks! glad we were able to get it merged!

@lazarusA
Copy link

Why this does not work? what is the appropriate syntax ?

f = h5open("file.h5", "w")
dset_f = d_create(f, "psi", datatype(Complex{Float64}), dataspace(5, 5, 10))
for i in 1:10
    dset_f[:, :, i] = zeros(Complex{Float64}, 5,5)
end
flush(f)
close(f)

@kleinhenz
Copy link
Contributor Author

@lazarusA does #591 fix your issue?

@lazarusA
Copy link

lazarusA commented Oct 30, 2019

I did add HDF5#master and my code still doesn't work.

using HDF5
f = h5open("file.h5", "w")
dset_f = d_create(f, "psi", datatype(Complex{Float64}), dataspace(5, 5, 10))
for i in 1:10
    dset_f[:, :, i] = zeros(Complex{Float64}, 5,5)
end
flush(f)
close(f)

@kleinhenz
Copy link
Contributor Author

It's not merged yet so you have to add it with add https://github.com/kleinhenz/HDF5.jl#complex_hyperslab

@lazarusA
Copy link

Nice! now it has been merged. And it works just beautiful.

@bhawkins
Copy link

Thanks for adding this feature! I just thought I'd mention that this supports MATLAB MAT files, too:

>> x = 1:10;
>> y = x + 10;
>> z = x + 1j * y;
>> save('test.mat', 'z', '-v7.3')
julia> using HDF5

julia> HDF5.set_complex_field_names("real", "imag");

julia> z = h5read("/tmp/test.mat", "z")
1×10 Array{Complex{Float64},2}:
 1.0+11.0im  2.0+12.0im  3.0+13.0im    8.0+18.0im  9.0+19.0im  10.0+20.0im

@musm
Copy link
Member

musm commented Oct 15, 2020

@kleinhenz I just noticed this but is it really allowed for Complex{<:HDF5Scalar}, specifically the case with HDF5Reference, or should it be restricted to Complex{<:HDF5BitsKind} ?

@kleinhenz
Copy link
Contributor Author

yeah this probably doesn't make much sense for HDF5Reference. OTOH I think in general we want to move more in the direction of removing dispatch restrictions rather than tightening them since this could conceivably make sense for types not in HDF5Scalar, e.g. BigFloat.

@musm
Copy link
Member

musm commented Oct 17, 2020

Totally agreed, and actually recent commits on master have started to remove some unnecessary type restrictions for the high level interfaces

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read complex numbers from HDF5
6 participants