Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: support unwrapping #17

Closed
wants to merge 3 commits into from
Closed

WIP: support unwrapping #17

wants to merge 3 commits into from

Conversation

piever
Copy link

@piever piever commented Sep 20, 2018

This is an attempt at addressing #13. It is still WIP (I've only done the case with known schema, haven't added tests and have probably forgotten to added the unwrap keyword in a few places) and lacks tests but I hope is good enough to give an idea. It works as follows:

the user passes a unwrap optional argument (defaults to t -> false) that specifies what types should be unwrapped. So the default behavior is unchanged:

julia> x = [(a=(b=1,), c=2)]
1-element Array{NamedTuple{(:a, :c),Tuple{NamedTuple{(:b,),Tuple{Int64}},Int64}},1}:
 (a = (b = 1,), c = 2)

julia> Tables.columns(x)
(a = NamedTuple{(:b,),Tuple{Int64}}[(b = 1,)], c = [2])

julia> unwrap(x) = false
unwrap (generic function with 1 method)

julia> unwrap(::Type{<:NamedTuple}) = true
unwrap (generic function with 2 methods)

julia> Tables.columns(x, unwrap=unwrap)
(a = (b = [1],), c = [2])

From experimentation in StructArrays the compiler is quite good here so it figures out the value of unwrap(T) in advance (at least in general I couldn't find traces of unwrap in @code_llvm and results were inferred correctly).

In #13 I was wrong: there's actually no need to add extra setindex! or push! method provided one is a bit careful in the definition of add!

src/fallbacks.jl Outdated
@@ -44,25 +44,32 @@ haslength(L) = L isa Union{Base.HasShape, Base.HasLength}

Custom column types can override with an appropriate "scalar" element type that should dispatch to their column allocator.
"""
allocatecolumn(T, len) = Vector{T}(undef, len)
function allocatecolumn(::Type{T}, len; unwrap = t -> false) where {T}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhm, actually if this is supposed to be overloaded (say to get a DataValueArray or things like that), I should add an extra step in between and not touch this one. So allocatecolumns should call some _allocatecolumn that decides whether calling allocatecolumn or to call back allocatecolumns.

Pietro Vertechi added 2 commits September 21, 2018 10:34
@piever
Copy link
Author

piever commented Sep 21, 2018

I got stuck in the implementation of the type unstable case (in the IndexedTables scenario I think I understand how to do it in full generality but here it doesn't play particularly nicely with the updated trick. I was wondering whether this is the wrong approach (as the code in Tables is quite nice and I woulnd't want to mess it up for this one usecase). Still, I hope this WIP PR is enough to give an idea of the strategy I had in mind.

I'm curious whether this is best done at the level of the element being iterated. For example I could define a:

struct Unwrapped{F<: Function, T}
   unwrap::F
   itr::T
end

whose propertynames are the unwrapped propertynames of el (with some unique names). For example in the case of (a=(b=1, c=2), d=3) the unwrapped propertynames would be (b, c, d) (some care is needed to avoid repetition) and everything would proceed as usual, but somehow the information of the row type is kept so that one can remap the collected columns in the end to turn (b=vb, c=vc, d = vd) into (a=(b=vb, c=vc), d=vd). Does that make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant