How should dimnames -> axes names work? #86

Tokazama · 2019-12-12T11:48:26Z

I've mentioned this in a more rambly post here, but I thought it would be good to specifically address how to propagate names to axes. Specifically, how should one get the axes (e.g., axes(NamedDimsArray)) and access the names in a concise and meaningful way?

As I see it there are 3 possible ways for this to happen:

Return a NamedTuple
Incorporate names into the individual elements of the Tuple returned by axes.
Make a unique structure like NamedTuple that permits repeated names.
Make names and axes completely separate and users sort out how to keep track of them

The first one would only work if each name is completely unique, so there would have to be defaults for wild card names. We'd also have to address a lot of the instances in base where indexing is accomplished by iterating through a tuple (e.g. to_indices).
The second one would require making a novel type for a named index.
The third is what is currently happening and would mean we do nothing for this particular issue, but is pretty obnoxious.

The text was updated successfully, but these errors were encountered:

mcabbott · 2019-12-12T12:26:05Z

I think that when last I tried, making axes or size return a NamedTuple broke all sorts of things which Base expects. And, as you say, this demand unique names. And I'm not so sure what problem it solves anyway. If you want to retrieve an axis, axes(nda, :x) already works fine, instead of axes(nda).x.

For point 2. the problem this aims to solve is the creation of new arrays which carry some of the same names? You may enjoy some experiments in NamedPlus.jl which do this for size, because I realised I had a lot of functions which took the size of their input, and made a new array to write into. (All axes were just OneTo(n).) With this modified size, the new array automatically gets the names:

julia> using NamedPlus

julia> nda = named(rand(Int8, 3,7), :x, :y)
3×7 NamedDimsArray(::Array{Int8,2}, (:x, :y)):
      → :y
↓ :x  -79  -80  111  -70  110  -122  52
      -36  -57  109   94   20    -4  21
      123    2   65  -64  -30    -2  72

julia> d,n = size(nda)
(3ᵅ, 7ᵅ)

julia> zed = zeros(2,n)
2×7 NamedDimsArray(::Array{Float64,2}, (:_, :y)):
      → :y
↓ :_  0.0  0.0  0.0  0.0  0.0  0.0  0.0
      0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> d'
NamedInt(x′ = 3)

Tokazama · 2019-12-12T12:49:10Z

I hadn't even thought of size returning names. The main reason I'm interested in this is because it's used a lot in ImageCore.jl there's a whole packages, ImageAxes that built on this feature of AxisArrays.

mcabbott · 2019-12-12T13:09:42Z

I see, but this is about AxisArrays's nontrivial "axes", not Base.axes. There was a PR to make those accessible as A.x, which I copied in AxisRanges.jl.

Tokazama · 2019-12-12T13:43:15Z

I see, but this is about AxisArrays's nontrivial "axes", not Base.axes.

To be clear, I'm not saying there shouldn't be a comparable functionality for size. I just haven't thought of it and actually think it raises a good point. That is, how far should the names be propagated through other methods? I'm assuming it wouldn't be too difficult to settle on just the "axes" and "size" type methods because those are really all that could propagate the names meaningfully.

My personal preference would be that size or axes return something that has both the size/axes information and names. I realize that might not be a realistic option though and I'm ultimately open to any solution that is performant and maintainable. I have more detailed opinions about each solution, but I wanted to see if anyone else had more solutions/thoughts first.

mcabbott · 2019-12-12T13:53:02Z

Sure. My point with the size example was mostly that it's a working implementation you can play with, of something like your point 2. (With a few rough edges!) One could surely do something similar for axes, although I guess it would be a bit more work. If you have lots of OffsetArrays, then this would be useful to make things like zeros(Int, axes(nda, 1)) also propagate names.

Tokazama · 2019-12-12T14:19:20Z

I really like the idea of approaching it from point 2, but then what does NamedDims.jl offer that couldn't be more conveniently incorporated into NamedIndices.jl? Just use a struct that wraps an array and it's accompanying axes like NamedIndices is trying to do. This would avoid rewriting all of the array interface again.

oxinabox · 2020-01-18T21:21:52Z

I forgot about this issue. I was at a conference.

Note: all the following also apply to size:

I think that when last I tried, making axes or size return a NamedTuple broke all sorts of things which Base expects.
...
I see, but this is about AxisArrays's nontrivial "axes", not Base.axes.

AxisArrays and Base both defining slightly different semantics for different functions call axes is annoying.

One fairly easy solution would be for NamedDims.jl to define named_axes(::NamedDimsArray) which returns a NamedTuple.
Something like:

function named_axes(nda::NamedDimsArray{L}) where L
    return (; (name=>axes(nda, name) for name in L)...)
end

(That should nest happily with IndexedDims overloading axes to retun iterators of the dimensions indexes)

Then if a package is transitioning from the use of AxisArrays.axes to something NamedDims based,
they can do find and replace, or const axes = named_axes

oxinabox · 2020-01-18T21:27:02Z

I really like the idea of approaching it from point 2, but then what does NamedDims.jl offer that couldn't be more conveniently incorporated into NamedIndices.jl? Just use a struct that wraps an array and it's accompanying axes like NamedIndices is trying to do. This would avoid rewriting all of the array interface again.

I have no idea what this comment is talking about sorry.
I can't find this NamedIndices.jl package.

In general NamedDims takes care of everything to do with the names.
So all the things that define that they take a dim argument, NamedDims takes care of converting those into integers before passing it on the other underlying type.
For a bunch of those functions (e.g. sum) almost all underling types have nothing to do and don't need to overload it again.
For some functions, like getindex then the backing array tyope does still have more things to do; but at least when writing it they don't have to think about name lookup anymore.

Tokazama · 2020-01-18T21:28:20Z

I implemented some glue code very similar to this elsewhere https://github.com/JuliaImages/ImageCore.jl/blob/master/src/traits.jl#L42. I think it's probably the right solution, but at some point we need to figure out how our names will by compatible with the keys of a NamedTuple. Right now we'd have to convert wild card names to something unique.

oxinabox · 2020-01-18T21:34:13Z

at some point we need to figure out how our names will by compatible with the keys of a NamedTuple. Right now we'd have to convert wild card names to something unique.

To my mind when people end up in circumstances like that, it is a sign that they should have resolved all wildcards first -- since if they want to access something by name, they had better know what that name is before they do so.
So generally, either erroring or making picking one arbitarily is kind of OK -- we can assume they never check the value for the duplicated one anyway?

With your trait based approach, could detect the duplicate named and if so say that it doesn't have names.

Tokazama · 2020-01-18T21:52:39Z

I have no idea what this comment is talking about sorry.

Sorry, I meant IndexDims. It's a fair point though that this functions could be simpler to implement separately.

since if they want to access something by name, they had better know what that name is before they do so.

I think that logically makes sense, but it would change how we do some things with images right now. Maybe I should make a PR and see if I can figure out how to connect all the dots on paper.

oxinabox added the RequiresThinking For issues / feature requests that need detailed and careful consideration label Dec 12, 2019

nalimilan mentioned this issue May 30, 2020

findfirst/findlast/nextind/prevind not working for eachcol in v0.21.0 JuliaData/DataFrames.jl#2229

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How should dimnames -> axes names work? #86

How should dimnames -> axes names work? #86

Tokazama commented Dec 12, 2019 •

edited

Loading

mcabbott commented Dec 12, 2019

Tokazama commented Dec 12, 2019

mcabbott commented Dec 12, 2019

Tokazama commented Dec 12, 2019

mcabbott commented Dec 12, 2019 •

edited

Loading

Tokazama commented Dec 12, 2019

oxinabox commented Jan 18, 2020 •

edited

Loading

oxinabox commented Jan 18, 2020

Tokazama commented Jan 18, 2020

oxinabox commented Jan 18, 2020

Tokazama commented Jan 18, 2020 •

edited

Loading

How should dimnames -> axes names work? #86

How should dimnames -> axes names work? #86

Comments

Tokazama commented Dec 12, 2019 • edited Loading

mcabbott commented Dec 12, 2019

Tokazama commented Dec 12, 2019

mcabbott commented Dec 12, 2019

Tokazama commented Dec 12, 2019

mcabbott commented Dec 12, 2019 • edited Loading

Tokazama commented Dec 12, 2019

oxinabox commented Jan 18, 2020 • edited Loading

oxinabox commented Jan 18, 2020

Tokazama commented Jan 18, 2020

oxinabox commented Jan 18, 2020

Tokazama commented Jan 18, 2020 • edited Loading

Tokazama commented Dec 12, 2019 •

edited

Loading

mcabbott commented Dec 12, 2019 •

edited

Loading

oxinabox commented Jan 18, 2020 •

edited

Loading

Tokazama commented Jan 18, 2020 •

edited

Loading