Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support types from Base.Iterators as columns #101

Closed
Drvi opened this issue Jun 15, 2019 · 2 comments
Closed

Support types from Base.Iterators as columns #101

Drvi opened this issue Jun 15, 2019 · 2 comments

Comments

@Drvi
Copy link

Drvi commented Jun 15, 2019

I guess the problem is that Flatten, PartitionIterator, Repeated, Filter... are not AbstractArrays? But they are iterable so it seems possible to support them.

Examples:

For two ranges it works as expected.

julia> collect(Tables.rows((a = 1:4, b = 1:4)))
4-element Array{Tables.ColumnsRow{NamedTuple{(:a, :b),Tuple{UnitRange{Int64},UnitRange{Int64}}}},1}:
Tables.ColumnsRow{NamedTuple{(:a, :b),Tuple{UnitRange{Int64},UnitRange{Int64}}}}((a = 1:4, b = 1:4), 1)
Tables.ColumnsRow{NamedTuple{(:a, :b),Tuple{UnitRange{Int64},UnitRange{Int64}}}}((a = 1:4, b = 1:4), 2)
Tables.ColumnsRow{NamedTuple{(:a, :b),Tuple{UnitRange{Int64},UnitRange{Int64}}}}((a = 1:4, b = 1:4), 3)
Tables.ColumnsRow{NamedTuple{(:a, :b),Tuple{UnitRange{Int64},UnitRange{Int64}}}}((a = 1:4, b = 1:4), 4)

For a combination of "lazy inputs" it sometimes fails on covert.

julia> collect(Tables.rows((a = flatten(1:4), b = 1:4)))
ERROR: MethodError: Cannot `convert` an object of type Tables.IteratorRow{Base.Iterators.Flatten{UnitRange{Int64}}} to an object of type Tables.IteratorRow{Any}
Closest candidates are:
 convert(::Type{T}, ::T) where T at essentials.jl:154
 Tables.IteratorRow{Any}(::Any) where T at /home/drvi/.julia/packages/Tables/hAk5N/src/tofromdatavalues.jl:62
Stacktrace:
[1] setindex!(::Array{Tables.IteratorRow{Any},1}, ::Tables.IteratorRow{Base.Iterators.Flatten{UnitRange{Int64}}}, ::Int64) at ./array.jl:767
[2] copyto!(::Array{Tables.IteratorRow{Any},1}, ::Tables.IteratorWrapper{NamedTuple{(:a, :b),Tuple{Base.Iterators.Flatten{UnitRange{Int64}},UnitRange{Int64}}}}) at ./abstractarray.jl:671
[3] _collect(::UnitRange{Int64}, ::Tables.IteratorWrapper{NamedTuple{(:a, :b),Tuple{Base.Iterators.Flatten{UnitRange{Int64}},UnitRange{Int64}}}}, ::Base.HasEltype, ::Base.HasLength) at ./array.jl:550
[4] collect(::Tables.IteratorWrapper{NamedTuple{(:a, :b),Tuple{Base.Iterators.Flatten{UnitRange{Int64}},UnitRange{Int64}}}}) at ./array.jl:544
[5] top-level scope at none:0

For two flattened arrays it treats columns as rows (I guess)

julia> collect(Tables.rows((a = flatten(1:4), b = flatten(1:4))))
2-element Array{Tables.IteratorRow{Base.Iterators.Flatten{UnitRange{Int64}}},1}:
Tables.IteratorRow{Base.Iterators.Flatten{UnitRange{Int64}}}(Base.Iterators.Flatten{UnitRange{Int64}}(1:4))
Tables.IteratorRow{Base.Iterators.Flatten{UnitRange{Int64}}}(Base.Iterators.Flatten{UnitRange{Int64}}(1:4))

Please let me know if I can help. I'm not familiar with Tables.jl internals but with some guidance I might be able to help.

@quinnj
Copy link
Member

quinnj commented Jun 18, 2019

Yes, currently for the "fallback" definitions, this line assumes that the column extracted is "indexable". The alternative is the ColumnsRow struct somehow try to keep track of the current iteration state for each column as you iterate through the columns. I imagine it'll be a bit tricky to make that performant, but in theory, it should be possible and fast, if you want to give it a shot.

@quinnj
Copy link
Member

quinnj commented Feb 8, 2020

In #131, we've discussed and committed to a new requirement for columns being this: any indexable object with a known length, i.e. it supports x[i] for i = 1:length(x). Luckily, this applies to many iterators, though obviously ones like filter will need to be collected before being able to be treated like a column. A big part of the requirement here is managing complexity in implementations; there were a few places before #131 where generic fallbacks didn't actually support arbitrary iterators as columns, even though the API documentation said they should be valid. It turns out that managing type complexity and keeping things efficient is really hard when trying to scale to potentially hundreds of columns in a table. Alternatively, the vast majority of "column" objects are AbstractVectors, and by just requiring indexing and a known length, we can make generic code complexity much, much simpler.

@quinnj quinnj closed this as completed Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants