narrow_types! operation to wrangle Any columns #9
Comments
Yeah, I can see the utility of this. Here are a few thoughts:
Thank you! Yes, I believe it makes sense, although I'm a little confused about storing the new Schema and specifying a subset of columns to narrow (as opposed to all columns). I don't have a ton of experience with lazy evaluation, so this probably needs a bunch of work. I've started by omitting the ability to specify columns to narrow (i.e. just narrowing the whole table), but I imagine it is quite similar to
# Lazy wrapper: keeps the source table and the narrowed schema
struct NarrowTypes{T}
    x::T
    schema::Tables.Schema
end
# Narrowest common type of a column's elements
narrow_arr(x) = mapreduce(typeof, promote_type, x)
narrow_types(t) = NarrowTypes(t, Tables.Schema(Tables.columnnames(t), [narrow_arr(getproperty(t, nm)) for nm in Tables.columnnames(t)]))
# Columns are converted only when they are requested
function Tables.getcolumn(nt::NarrowTypes, nm::Symbol)
    # schema.types is a tuple, so look up the column position by name first
    i = findfirst(==(nm), nt.schema.names)
    return Vector{nt.schema.types[i]}(Tables.getcolumn(getfield(nt, 1), nm))
end
Tables.getcolumn(nt::NarrowTypes, i::Int) = Vector{nt.schema.types[i]}(Tables.getcolumn(getfield(nt, 1), i))
Tables.columnnames(nt::NarrowTypes) = Tables.columnnames(getfield(nt, 1)) # or nt.schema.names?
Tables.schema(nt::NarrowTypes) = nt.schema
Tables.istable(::Type{<:NarrowTypes}) = true
MWE (I believe any df could be used):
using Tables, TableOperations, CSV, DataFrames
df = CSV.read("purple_air_data.csv", DataFrame)
t = Tables.table(Matrix(df))
Tables.MatrixTable{Array{Any,2}}:
nt = narrow_types(t)
t_sch = Tables.schema(t)
nt_sch = Tables.schema(nt)
julia> Tables.getcolumn(t, 5)
13555-element Array{Any,1}:
julia> Tables.getcolumn(nt, 5)
13555-element Array{Int64,1}:
I'm not exactly certain I've done lazy evaluation correctly, so I'd appreciate it if you could let me know whether I'm on the right track. Last things I noted:
Thanks!
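On the question above about narrowing only a subset of columns: here is a minimal sketch of one way it could look, written in plain Julia rather than against Tables.jl. It assumes the table is a NamedTuple of vectors, and the names narrow_eltype and narrow_subset are hypothetical, not part of TableOperations. It is eager rather than lazy, so it only illustrates the column selection.

```julia
# Narrowest common element type of a non-empty column.
narrow_eltype(col) = mapreduce(typeof, promote_type, col)

# Narrow only the columns named in `cols`; all other columns pass through untouched.
# Assumption: `tbl` is a NamedTuple of vectors (hypothetical input shape).
function narrow_subset(tbl::NamedTuple, cols)
    selected = Set{Symbol}(cols)
    narrowed = map(keys(tbl)) do nm
        col = tbl[nm]
        if nm in selected && !isempty(col)
            convert(Vector{narrow_eltype(col)}, col)
        else
            col
        end
    end
    return NamedTuple{keys(tbl)}(narrowed)
end

tbl = (a = Any[1, 2], b = Any[1.5, 2.5])
out = narrow_subset(tbl, [:a])
eltype(out.a)  # Int (Int64 on 64-bit systems)
eltype(out.b)  # Any, because :b was not selected
```

A lazy version would presumably keep the same selection logic but defer the conversion to column access, as the wrapper above does.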
That looks pretty good so far; mind putting it in a pull request?
Implemented in #14
There have been a number of times when I have wanted to coerce a DataFrame/Table with Any or incorrect column types into something more specific. I end up with poorly written conversions using tryparse, etc. For a new user who wants their data in the correct data type without this hassle, a utility function like this would be pretty handy:
example:
This function could probably be improved to work with specified columns, or an individual column, but this is the gist of it.
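As a rough illustration of the kind of utility described above (a sketch only, not the original example from this issue), an eager version in plain Julia might look like the following. It assumes the table is a NamedTuple of vectors; the helper names narrow_eltype and narrow_column are hypothetical, and unlike the narrow_types! in the title it returns a new table instead of mutating in place.

```julia
# Narrowest common type of the elements of a non-empty column.
narrow_eltype(col) = mapreduce(typeof, promote_type, col)

# Convert one column to a vector with the narrowed element type.
# Empty columns are returned unchanged, since there is nothing to infer from.
function narrow_column(col::AbstractVector)
    isempty(col) && return col
    return convert(Vector{narrow_eltype(col)}, col)
end

# Narrow every column of a NamedTuple-of-vectors table (assumed input shape).
narrow_types(tbl::NamedTuple) = map(narrow_column, tbl)

tbl = (id = Any[1, 2, 3], score = Any[1, 2.5, 4], label = Any["a", "b", "c"])
narrowed = narrow_types(tbl)
eltype(narrowed.id)     # Int (Int64 on 64-bit systems)
eltype(narrowed.score)  # Float64, because Int and Float64 promote to Float64
eltype(narrowed.label)  # String
```

Because the narrowing relies on promote_type, a column that mixes unrelated types such as Int and String stays Any, and values stored as strings are not parsed; that would still need something like tryparse.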