diff --git a/dev/index.html b/dev/index.html
index 41679ca..878eefb 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -21,7 +21,7 @@
     # with known length (i.e. supports
     # `length(column)` and `column[i]`)
     column = Tables.getcolumn(columns, col)
-end

So we see two high-level functions here, Tables.rows, and Tables.columns.

Tables.rowsFunction
Tables.rows(x) => Row iterator

Accesses data of input table source x row-by-row by returning an AbstractRow-compatible iterator. Note that even if the input table source is column-oriented by nature, an efficient generic definition of Tables.rows is defined in Tables.jl to return an iterator of row views into the columns of the input.

The Tables.Schema of an AbstractRow iterator can be queried via Tables.schema(rows), which may return nothing if the schema is unknown. Column names can always be queried by calling Tables.columnnames(row) on an individual row, and row values can be accessed by calling Tables.getcolumn(row, i::Int) or Tables.getcolumn(row, nm::Symbol) with a column index or name, respectively.

See also rowtable and namedtupleiterator.

source
Tables.columnsFunction
Tables.columns(x) => AbstractColumns-compatible object

Accesses data of input table source x by returning an AbstractColumns-compatible object, which allows retrieving entire columns by name or index. A retrieved column is a 1-based indexable object that has a known length, i.e. supports length(col) and col[i] for any i = 1:length(col). Note that even if the input table source is row-oriented by nature, an efficient generic definition of Tables.columns is defined in Tables.jl to build an AbstractColumns-compatible object from the input rows.

The Tables.Schema of an AbstractColumns object can be queried via Tables.schema(columns), which may return nothing if the schema is unknown. Column names can always be queried by calling Tables.columnnames(columns), and individual columns can be accessed by calling Tables.getcolumn(columns, i::Int) or Tables.getcolumn(columns, nm::Symbol) with a column index or name, respectively.

Note that if x is an object in which columns are stored as vectors, the check that these vectors use 1-based indexing is not performed (it should be ensured when x is constructed).

source

Given these two powerful data access methods, let's walk through real, albeit somewhat simplified versions of how packages actually use these methods.

Tables.rows usage

First up, let's take a look at the SQLite.jl package and how it uses the Tables.jl interface to allow loading of generic table-like data into a sqlite relational table. Here's the code:

function load!(table, db::SQLite.DB, tablename)
+end

So we see two high-level functions here, Tables.rows, and Tables.columns.

Tables.rowsFunction
Tables.rows(x) => Row iterator

Accesses data of input table source x row-by-row by returning an AbstractRow-compatible iterator. Note that even if the input table source is column-oriented by nature, an efficient generic definition of Tables.rows is defined in Tables.jl to return an iterator of row views into the columns of the input.

The Tables.Schema of an AbstractRow iterator can be queried via Tables.schema(rows), which may return nothing if the schema is unknown. Column names can always be queried by calling Tables.columnnames(row) on an individual row, and row values can be accessed by calling Tables.getcolumn(row, i::Int) or Tables.getcolumn(row, nm::Symbol) with a column index or name, respectively.

See also rowtable and namedtupleiterator.

source
Tables.columnsFunction
Tables.columns(x) => AbstractColumns-compatible object

Accesses data of input table source x by returning an AbstractColumns-compatible object, which allows retrieving entire columns by name or index. A retrieved column is a 1-based indexable object that has a known length, i.e. supports length(col) and col[i] for any i = 1:length(col). Note that even if the input table source is row-oriented by nature, an efficient generic definition of Tables.columns is defined in Tables.jl to build an AbstractColumns-compatible object from the input rows.

The Tables.Schema of an AbstractColumns object can be queried via Tables.schema(columns), which may return nothing if the schema is unknown. Column names can always be queried by calling Tables.columnnames(columns), and individual columns can be accessed by calling Tables.getcolumn(columns, i::Int) or Tables.getcolumn(columns, nm::Symbol) with a column index or name, respectively.

Note that if x is an object in which columns are stored as vectors, the check that these vectors use 1-based indexing is not performed (it should be ensured when x is constructed).

source

Given these two powerful data access methods, let's walk through real, albeit somewhat simplified versions of how packages actually use these methods.

Tables.rows usage

First up, let's take a look at the SQLite.jl package and how it uses the Tables.jl interface to allow loading of generic table-like data into a sqlite relational table. Here's the code:

function load!(table, db::SQLite.DB, tablename)
     # get input table rows
     rows = Tables.rows(table)
     # query for schema of data
@@ -108,13 +108,13 @@
 end

So here we have a generic DataFrame constructor that takes a single, untyped argument, calls Tables.columns on it, then Tables.columnnames to get the column names. It then passes the Tables.AbstractColumns-compatible object to an internal function fromcolumns, which dispatches on a special kind of Tables.AbstractColumns object called a Tables.CopiedColumns. A Tables.CopiedColumns wraps any Tables.AbstractColumns-compatible object that has already had copies of its columns made, and is thus safe for the columns-consumer to assume ownership of (this matters because DataFrames.jl by default makes copies of all columns upon construction). In both cases, individual columns are collected in Vector{AbstractVector}s by calling Tables.getcolumn(x, nm) for each column name. A final note is the call to getvector on each column, which ensures each column is materialized as an AbstractVector, as is required by the DataFrame constructor.

Note in both the rows and columns usages, we didn't need to worry about the natural orientation of the input data; we just called Tables.rows or Tables.columns as was most natural for the table-specific use-case, knowing that it will Just Work™️.
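As a minimal sketch of this orientation independence (the table contents and variable names below are made up for illustration and are not part of the SQLite.jl or DataFrames.jl code), a NamedTuple of vectors and a Vector of NamedTuples can each be consumed through either access function:

```julia
using Tables

# A column-oriented source: NamedTuple of vectors
coltbl = (id = [1, 2, 3], name = ["a", "b", "c"])
# A row-oriented source: Vector of NamedTuples
rowtbl = [(id = 1, name = "a"), (id = 2, name = "b"), (id = 3, name = "c")]

# Iterate rows of the column-oriented source
ids = Int[]
for row in Tables.rows(coltbl)
    push!(ids, Tables.getcolumn(row, :id))
end

# Retrieve an entire column from the row-oriented source
cols = Tables.columns(rowtbl)
name_col = Tables.getcolumn(cols, :name)
```

In both cases the consumer code never inspects how the input stores its data.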

Tables.jl Utilities

Before moving on to implementing the Tables.jl interfaces, we take a quick break to highlight some useful utility functions provided by Tables.jl:

Tables.SchemaType
Tables.Schema(names, types)

Create a Tables.Schema object that holds the column names and types for an AbstractRow iterator returned from Tables.rows or an AbstractColumns object returned from Tables.columns. Tables.Schema is dual-purposed: provide an easy interface for users to query these properties, as well as provide a convenient "structural" type for code generation.

To get a table's schema, one can call Tables.schema on the result of Tables.rows or Tables.columns, but also note that a table may return nothing, indicating that its column names and/or column element types are unknown (usually not inferable). This is similar to the Base.EltypeUnknown() trait for iterators when Base.IteratorEltype is called. Users should account for the Tables.schema(tbl) => nothing case by using the properties of the results of Tables.rows(x) and Tables.columns(x) directly.

To access the names, one can simply call sch.names to return a collection of Symbols (Tuple or Vector). To access column element types, one can similarly call sch.types, which will return a collection of types (like (Int64, Float64, String)).
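A small sketch of querying a schema, assuming a NamedTuple-of-vectors input; the helper column_names is hypothetical and only illustrates one way to handle the nothing case described above:

```julia
using Tables

tbl = (a = [1, 2], b = [1.5, 2.5], c = ["x", "y"])
sch = Tables.schema(Tables.columns(tbl))

col_names = sch.names   # tuple of Symbols
col_types = sch.types   # tuple of column element types

# Hypothetical helper: fall back to Tables.columnnames when the schema is unknown
function column_names(table)
    cols = Tables.columns(table)
    s = Tables.schema(cols)
    return s === nothing ? Tuple(Tables.columnnames(cols)) : Tuple(s.names)
end
```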

The actual type definition is

struct Schema{names, types}
     storednames::Union{Nothing, Vector{Symbol}}
     storedtypes::Union{Nothing, Vector{Type}}
-end

Where names is a tuple of Symbols or nothing, and types is a tuple type of types (like Tuple{Int64, Float64, String}) or nothing. Encoding the names & types as type parameters allows convenient use of the type in generated functions and other optimization use-cases, but users should note that when names and/or types are the nothing value, the names and/or types are stored in the storednames and storedtypes fields. This is to account for extremely wide tables with columns in the 10s of thousands where encoding the names/types as type parameters becomes prohibitive to the compiler. So while optimizations can be written on the typed names/types type parameters, users should also consider handling the extremely wide tables by specializing on Tables.Schema{nothing, nothing}.

source
Tables.schemaFunction
Tables.schema(x) => Union{Nothing, Tables.Schema}

Attempt to retrieve the schema of the object returned by Tables.rows or Tables.columns. If the AbstractRow iterator or AbstractColumns object can't determine its schema, nothing will be returned. Otherwise, a Tables.Schema object is returned, with the column names and types available for use.

source
Tables.subsetFunction
Tables.subset(x, inds; viewhint=nothing)

Return one or more rows from table x according to the position(s) specified by inds:

  • If inds is a single non-boolean integer return a row object.
  • If inds is a vector of non-boolean integers, a vector of booleans, or a :, return a subset of the original table according to the indices. In this case, the returned type is not necessarily the same as the original table type.

If other types of inds are passed than specified above the behavior is undefined.

The viewhint argument tries to influence whether the returned object is a view of the original table or an independent copy:

  • If viewhint=nothing (the default) then the implementation for a specific table type is free to decide whether to return a copy or a view.
  • If viewhint=true then a view is returned and if viewhint=false a copy is returned. This applies both to returning a row or a table.

Any specialized implementation of subset must support the viewhint=nothing argument. Support for viewhint=true or viewhint=false is optional (i.e. implementations may ignore the keyword argument and return a view or a copy regardless of viewhint value).

source
Tables.partitionsFunction
Tables.partitions(x)

Request a "table" iterator from x. Each iterated element must be a "table" in the sense that one may call Tables.rows or Tables.columns to get a row-iterator or collection of columns. All iterated elements must have identical schema, so that users may call Tables.schema(first_element) on the first iterated element and know that each subsequent iteration will match the same schema. The default definition is:

Tables.partitions(x) = (x,)

So that any input is assumed to be a single "table". This means users should feel free to call Tables.partitions anywhere they're currently calling Tables.columns or Tables.rows, and get back an iterator of those instead. In other words, "sink" functions can use Tables.partitions whether or not the user passes a partitionable table, since the default is to treat a single input as a single, non-partitioned table.

Tables.partitioner(itr) is a convenience wrapper to provide table partitions from any table iterator; this allows for easy wrapping of a Vector or iterator of tables as valid partitions, since by default, they'd be treated as a single table.

A 2nd convenience method is provided with the definition:

Tables.partitions(x...) = x

That allows passing vararg tables and they'll be treated as separate partitions. Sink functions may allow vararg table inputs and can "splat them through" to partitions.

For convenience, Tables.partitions(x::Iterators.PartitionIterator) = x and Tables.partitions(x::Tables.Partitioner) = x are defined to handle cases where user created partitioning with the Iterators.partition or Tables.partitioner functions.

source
Tables.partitionerFunction
Tables.partitioner(f, itr)
+end

Where names is a tuple of Symbols or nothing, and types is a tuple type of types (like Tuple{Int64, Float64, String}) or nothing. Encoding the names & types as type parameters allows convenient use of the type in generated functions and other optimization use-cases, but users should note that when names and/or types are the nothing value, the names and/or types are stored in the storednames and storedtypes fields. This is to account for extremely wide tables with columns in the 10s of thousands where encoding the names/types as type parameters becomes prohibitive to the compiler. So while optimizations can be written on the typed names/types type parameters, users should also consider handling the extremely wide tables by specializing on Tables.Schema{nothing, nothing}.

source
Tables.schemaFunction
Tables.schema(x) => Union{Nothing, Tables.Schema}

Attempt to retrieve the schema of the object returned by Tables.rows or Tables.columns. If the AbstractRow iterator or AbstractColumns object can't determine its schema, nothing will be returned. Otherwise, a Tables.Schema object is returned, with the column names and types available for use.

source
Tables.subsetFunction
Tables.subset(x, inds; viewhint=nothing)

Return one or more rows from table x according to the position(s) specified by inds:

  • If inds is a single non-boolean integer return a row object.
  • If inds is a vector of non-boolean integers, a vector of booleans, or a :, return a subset of the original table according to the indices. In this case, the returned type is not necessarily the same as the original table type.

If other types of inds are passed than specified above the behavior is undefined.

The viewhint argument tries to influence whether the returned object is a view of the original table or an independent copy:

  • If viewhint=nothing (the default) then the implementation for a specific table type is free to decide whether to return a copy or a view.
  • If viewhint=true then a view is returned and if viewhint=false a copy is returned. This applies both to returning a row or a table.

Any specialized implementation of subset must support the viewhint=nothing argument. Support for viewhint=true or viewhint=false is optional (i.e. implementations may ignore the keyword argument and return a view or a copy regardless of viewhint value).
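The following sketch shows the three documented kinds of inds on a NamedTuple-of-vectors table. It assumes a Tables.jl version that provides Tables.subset, and it only inspects the results through Tables.getcolumn and Tables.columns, since the returned table type is not guaranteed to match the input:

```julia
using Tables

tbl = (id = [10, 20, 30], name = ["a", "b", "c"])

# A single integer index returns a row object
row = Tables.subset(tbl, 2)
second_id = Tables.getcolumn(row, :id)

# A vector of integer indices returns a sub-table
sub = Tables.subset(tbl, [1, 3])
sub_ids = Tables.getcolumn(Tables.columns(sub), :id)

# A vector of booleans selects rows by mask
masked = Tables.subset(tbl, [true, false, true])
masked_names = Tables.getcolumn(Tables.columns(masked), :name)
```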

source
Tables.partitionsFunction
Tables.partitions(x)

Request a "table" iterator from x. Each iterated element must be a "table" in the sense that one may call Tables.rows or Tables.columns to get a row-iterator or collection of columns. All iterated elements must have identical schema, so that users may call Tables.schema(first_element) on the first iterated element and know that each subsequent iteration will match the same schema. The default definition is:

Tables.partitions(x) = (x,)

So that any input is assumed to be a single "table". This means users should feel free to call Tables.partitions anywhere they're currently calling Tables.columns or Tables.rows, and get back an iterator of those instead. In other words, "sink" functions can use Tables.partitions whether or not the user passes a partitionable table, since the default is to treat a single input as a single, non-partitioned table.

Tables.partitioner(itr) is a convenience wrapper to provide table partitions from any table iterator; this allows for easy wrapping of a Vector or iterator of tables as valid partitions, since by default, they'd be treated as a single table.

A 2nd convenience method is provided with the definition:

Tables.partitions(x...) = x

That allows passing vararg tables and they'll be treated as separate partitions. Sink functions may allow vararg table inputs and can "splat them through" to partitions.

For convenience, Tables.partitions(x::Iterators.PartitionIterator) = x and Tables.partitions(x::Tables.Partitioner) = x are defined to handle cases where user created partitioning with the Iterators.partition or Tables.partitioner functions.
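As an illustration of the sink pattern described above, here is a sketch of a hypothetical sink function sum_a (not part of Tables.jl) that works the same way for a single table and for a vector of tables wrapped with Tables.partitioner:

```julia
using Tables

tbl1 = (a = [1, 2],)
tbl2 = (a = [3, 4],)

# Hypothetical sink: sums column :a across every partition of its input
function sum_a(source)
    total = 0
    for part in Tables.partitions(source)
        total += sum(Tables.getcolumn(Tables.columns(part), :a))
    end
    return total
end

single = sum_a(tbl1)                                # default: one partition
wrapped = sum_a(Tables.partitioner([tbl1, tbl2]))   # two partitions
```

Without the Tables.partitioner wrapper, the vector [tbl1, tbl2] would be handed to the sink as one input rather than as two partitions.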

source
Tables.partitionerFunction
Tables.partitioner(f, itr)
 Tables.partitioner(x)

Convenience methods to generate table iterators. The first method takes a "materializer" function f and an iterator itr, and will call Tables.LazyTable(f, x) for x in itr for each iteration. This allows delaying table materialization until Tables.columns or Tables.rows are called on the LazyTable object (which will call f(x)). This allows a common desired pattern of materializing and processing a table on a remote process or thread, like:

for tbl in Tables.partitions(Tables.partitioner(CSV.File, list_of_csv_files))
     Threads.@spawn begin
         cols = Tables.columns(tbl)
         # do stuff with cols
     end
-end

The second method is provided because the default behavior of Tables.partitions(x) is to treat x as a single, non-partitioned table. This method allows users to easily wrap a Vector or generator of tables as table partitions to pass to sink functions able to utilize Tables.partitions.

source
Tables.rowtableFunction
Tables.rowtable(x) => Vector{NamedTuple}

Take any input table source, and produce a Vector of NamedTuples, also known as a "row table". A "row table" is a kind of default table type of sorts, since it satisfies the Tables.jl row interface naturally, i.e. a Vector naturally iterates its elements, and NamedTuple satisfies the AbstractRow interface by default (allows indexing value by index, name, and getting all names).

For a lazy iterator over rows see rows and namedtupleiterator.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing NamedTuples that large.

source
Tables.columntableFunction
Tables.columntable(x) => NamedTuple of AbstractVectors

Takes any input table source x and returns a NamedTuple of AbstractVectors, also known as a "column table". A "column table" is a kind of default table type of sorts, since it satisfies the Tables.jl column interface naturally.

Note that if x is an object in which columns are stored as vectors, the check that these vectors use 1-based indexing is not performed (it should be ensured when x is constructed).

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing NamedTuples that large.

source
Tables.dictrowtableFunction
Tables.dictrowtable(x) => Tables.DictRowTable

Take any Tables.jl-compatible source x and return a DictRowTable, which can be thought of as a Vector of OrderedDict rows mapping column names as Symbols to values. The order of the input table columns is preserved via the Tables.schema(::DictRowTable).

For "schema-less" input tables, dictrowtable employs a "column unioning" behavior, as opposed to inferring the schema from the first row like Tables.columns. This means that as rows are iterated, each value from the row is joined into an aggregate final set of columns. This is especially useful when input table rows may not include columns if the value is missing, instead of including an actual value missing, which is common in json, for example. This results in a performance cost tracking all seen values and inferring the final unioned schemas, so it's recommended to use only when the union behavior is needed.

source
Tables.dictcolumntableFunction
Tables.dictcolumntable(x) => Tables.DictColumnTable

Take any Tables.jl-compatible source x and return a DictColumnTable, which can be thought of as an OrderedDict mapping column names as Symbols to AbstractVectors. The order of the input table columns is preserved via the Tables.schema(::DictColumnTable).

For "schema-less" input tables, dictcolumntable employs a "column unioning" behavior, as opposed to inferring the schema from the first row like Tables.columns. This means that as rows are iterated, each value from the row is joined into an aggregate final set of columns. This is especially useful when input table rows may not include columns if the value is missing, instead of including an actual value missing, which is common in json, for example. This results in a performance cost tracking all seen values and inferring the final unioned schemas, so it's recommended to use only when needed.

source
Tables.namedtupleiteratorFunction
Tables.namedtupleiterator(x)

Pass any table input source and return a NamedTuple iterator.

See also rows and rowtable.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing NamedTuples that large.

source
Tables.datavaluerowsFunction
Tables.datavaluerows(x) => NamedTuple iterator

Takes any table input x and returns a NamedTuple iterator that will replace missing values with DataValue-wrapped values; this allows any table type to satisfy the TableTraits.jl Queryverse integration interface by defining:

IteratorInterfaceExtensions.getiterator(x::MyTable) = Tables.datavaluerows(x)
source
Tables.nondatavaluerowsFunction
Tables.nondatavaluerows(x)

Takes any Queryverse-compatible NamedTuple iterator source and converts to a Tables.jl-compatible AbstractRow iterator. Will automatically unwrap any DataValues, replacing NA with missing. Useful for translating Query.jl results back to non-DataValue-based tables.

source
Tables.tableFunction
Tables.table(m::AbstractVecOrMat; [header])

Wrap an AbstractVecOrMat (Matrix, Vector, Adjoint, etc.) in a MatrixTable, which satisfies the Tables.jl interface. (An AbstractVector is treated as a 1-column matrix.) This allows accessing the matrix via Tables.rows and Tables.columns. An optional keyword argument iterator header can be passed which will be converted to a Vector{Symbol} to be used as the column names. Note that no copy of the AbstractVecOrMat is made.

source
Tables.matrixFunction
Tables.matrix(table; transpose::Bool=false)

Materialize any table source input as a new Matrix or in the case of a MatrixTable return the originally wrapped matrix. If the table column element types are not homogeneous, they will be promoted to a common type in the materialized Matrix. Note that column names are ignored in the conversion. By default, input table columns will be materialized as corresponding matrix columns; passing transpose=true will transpose the input with input columns as matrix rows or in the case of a MatrixTable apply permutedims to the originally wrapped matrix.

source
Tables.eachcolumnFunction
Tables.eachcolumn(f, sch::Tables.Schema{names, types}, x::Union{Tables.AbstractRow, Tables.AbstractColumns})
+end

The second method is provided because the default behavior of Tables.partitions(x) is to treat x as a single, non-partitioned table. This method allows users to easily wrap a Vector or generator of tables as table partitions to pass to sink functions able to utilize Tables.partitions.

source
Tables.rowtableFunction
Tables.rowtable(x) => Vector{NamedTuple}

Take any input table source, and produce a Vector of NamedTuples, also known as a "row table". A "row table" is a kind of default table type of sorts, since it satisfies the Tables.jl row interface naturally, i.e. a Vector naturally iterates its elements, and NamedTuple satisfies the AbstractRow interface by default (allows indexing value by index, name, and getting all names).

For a lazy iterator over rows see rows and namedtupleiterator.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing NamedTuples that large.

source
Tables.columntableFunction
Tables.columntable(x) => NamedTuple of AbstractVectors

Takes any input table source x and returns a NamedTuple of AbstractVectors, also known as a "column table". A "column table" is a kind of default table type of sorts, since it satisfies the Tables.jl column interface naturally.

Note that if x is an object in which columns are stored as vectors, the check that these vectors use 1-based indexing is not performed (it should be ensured when x is constructed).

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing NamedTuples that large.
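A short round-trip sketch, using made-up data, of converting between the two default table types described above:

```julia
using Tables

coltbl = (a = [1, 2], b = ["x", "y"])

# Column table -> row table (Vector of NamedTuples)
rt = Tables.rowtable(coltbl)

# Row table -> column table (NamedTuple of vectors)
ct = Tables.columntable(rt)
```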

source
Tables.dictrowtableFunction
Tables.dictrowtable(x) => Tables.DictRowTable

Take any Tables.jl-compatible source x and return a DictRowTable, which can be thought of as a Vector of OrderedDict rows mapping column names as Symbols to values. The order of the input table columns is preserved via the Tables.schema(::DictRowTable).

For "schema-less" input tables, dictrowtable employs a "column unioning" behavior, as opposed to inferring the schema from the first row like Tables.columns. This means that as rows are iterated, each value from the row is joined into an aggregate final set of columns. This is especially useful when input table rows may not include columns if the value is missing, instead of including an actual value missing, which is common in json, for example. This results in a performance cost tracking all seen values and inferring the final unioned schemas, so it's recommended to use only when the union behavior is needed.

source
Tables.dictcolumntableFunction
Tables.dictcolumntable(x) => Tables.DictColumnTable

Take any Tables.jl-compatible source x and return a DictColumnTable, which can be thought of as an OrderedDict mapping column names as Symbols to AbstractVectors. The order of the input table columns is preserved via the Tables.schema(::DictColumnTable).

For "schema-less" input tables, dictcolumntable employs a "column unioning" behavior, as opposed to inferring the schema from the first row like Tables.columns. This means that as rows are iterated, each value from the row is joined into an aggregate final set of columns. This is especially useful when input table rows may not include columns if the value is missing, instead of including an actual value missing, which is common in json, for example. This results in a performance cost tracking all seen values and inferring the final unioned schemas, so it's recommended to use only when needed.

source
Tables.namedtupleiteratorFunction
Tables.namedtupleiterator(x)

Pass any table input source and return a NamedTuple iterator.

See also rows and rowtable.

Not for use with extremely wide tables with # of columns > 67K; current fundamental compiler limits prevent constructing NamedTuples that large.

source
Tables.datavaluerowsFunction
Tables.datavaluerows(x) => NamedTuple iterator

Takes any table input x and returns a NamedTuple iterator that will replace missing values with DataValue-wrapped values; this allows any table type to satisfy the TableTraits.jl Queryverse integration interface by defining:

IteratorInterfaceExtensions.getiterator(x::MyTable) = Tables.datavaluerows(x)
source
Tables.nondatavaluerowsFunction
Tables.nondatavaluerows(x)

Takes any Queryverse-compatible NamedTuple iterator source and converts to a Tables.jl-compatible AbstractRow iterator. Will automatically unwrap any DataValues, replacing NA with missing. Useful for translating Query.jl results back to non-DataValue-based tables.

source
Tables.tableFunction
Tables.table(m::AbstractVecOrMat; [header])

Wrap an AbstractVecOrMat (Matrix, Vector, Adjoint, etc.) in a MatrixTable, which satisfies the Tables.jl interface. (An AbstractVector is treated as a 1-column matrix.) This allows accessing the matrix via Tables.rows and Tables.columns. An optional keyword argument iterator header can be passed which will be converted to a Vector{Symbol} to be used as the column names. Note that no copy of the AbstractVecOrMat is made.

source
Tables.matrixFunction
Tables.matrix(table; transpose::Bool=false)

Materialize any table source input as a new Matrix or in the case of a MatrixTable return the originally wrapped matrix. If the table column element types are not homogeneous, they will be promoted to a common type in the materialized Matrix. Note that column names are ignored in the conversion. By default, input table columns will be materialized as corresponding matrix columns; passing transpose=true will transpose the input with input columns as matrix rows or in the case of a MatrixTable apply permutedims to the originally wrapped matrix.
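The sketch below, with illustrative data and column names, wraps a matrix as a table via Tables.table and materializes tables back into matrices via Tables.matrix, including the element type promotion and transpose behavior described above:

```julia
using Tables

m = [1 2; 3 4]

# Wrap the matrix as a table with explicit column names
tbl = Tables.table(m; header = [:x, :y])
x_col = Tables.getcolumn(Tables.columns(tbl), :x)

# A MatrixTable gives back the wrapped matrix
back = Tables.matrix(tbl)

# Mixed Int/Float64 columns are promoted to a common element type
mixed = (a = [1, 2], b = [3.0, 4.0])
promoted = Tables.matrix(mixed)
flipped = Tables.matrix(mixed; transpose = true)
```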

source
Tables.eachcolumnFunction
Tables.eachcolumn(f, sch::Tables.Schema{names, types}, x::Union{Tables.AbstractRow, Tables.AbstractColumns})
 Tables.eachcolumn(f, sch::Tables.Schema{names, nothing}, x::Union{Tables.AbstractRow, Tables.AbstractColumns})

Takes a function f, table schema sch, x, which is an object that satisfies the AbstractRow or AbstractColumns interfaces; it generates calls to get the value for each column (Tables.getcolumn(x, nm)) and then calls f(val, index, name), where f is the user-provided function, val is the column value (AbstractRow) or entire column (AbstractColumns), index is the column index as an Int, and name is the column name as a Symbol.

An example using Tables.eachcolumn is:

rows = Tables.rows(tbl)
 sch = Tables.schema(rows)
 if sch === nothing
@@ -136,8 +136,8 @@
             bind!(stmt, i, val)
         end
     end
-end

Note in this example we account for the input table potentially returning nothing from Tables.schema(rows); in that case, we start iterating the rows, and build a partial schema using the column names from the first row sch = Tables.schema(Tables.columnnames(row), nothing), which is valid to pass to Tables.eachcolumn.

source
Tables.materializerFunction
Tables.materializer(x) => Callable

For a table input, return the "sink" function or "materializing" function that can take a Tables.jl-compatible table input and make an instance of the table type. This enables "transform" workflows that take table inputs, apply transformations, potentially converting the table to a different form, and end with producing a table of the same type as the original input. The default materializer is Tables.columntable, which converts any table input into a NamedTuple of Vectors.

It is recommended that for users implementing MyType, they define only materializer(::Type{<:MyType}). materializer(::MyType) will then automatically delegate to this method.

source
Tables.columnindexFunction
Tables.columnindex(table, name::Symbol)

Return the column index (1-based) of a column by name in a table with a known schema; returns 0 if name doesn't exist in table.

source

Given names and a Symbol name, compute the index (1-based) of the name in names.

source
Tables.columntypeFunction
Tables.columntype(table, name::Symbol)

Return the column element type of a column by name in a table with a known schema; returns Union{} if name doesn't exist in table.

source

Given a tuple type and a Symbol name, compute the type of the name in the tuple's types.

source
Tables.rowmergeFunction
rowmerge(row, other_rows...)
-rowmerge(row; fields_to_merge...)

Return a NamedTuple by merging row (an AbstractRow-compliant value) with other_rows (one or more AbstractRow-compliant values) via Base.merge. This function is similar to Base.merge(::NamedTuple, ::NamedTuple...), but accepts AbstractRow-compliant values instead of NamedTuples.

A convenience method rowmerge(row; fields_to_merge...) = rowmerge(row, fields_to_merge) is defined that enables the fields_to_merge to be specified as keyword arguments.
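A brief sketch of both call forms, using made-up rows; the last call merges into a row taken from Tables.rows rather than a plain NamedTuple:

```julia
using Tables

# Later rows win on duplicate names, as with Base.merge
merged = Tables.rowmerge((a = 1, b = 2), (b = 3, c = 4))

# Keyword form: fields to merge given as keyword arguments
extended = Tables.rowmerge((a = 1,); b = 2)

# Works for AbstractRow-compliant values that are not NamedTuples
first_row = first(Tables.rows((a = [1, 2], b = ["x", "y"])))
flagged = Tables.rowmerge(first_row; seen = true)
```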

source
Tables.RowType
Tables.Row(row)

Convenience type to wrap any AbstractRow interface object in a dedicated struct to provide useful default behaviors (allows any AbstractRow to be used like a NamedTuple):

  • Indexing interface defined; i.e. row[i] will return the column value at index i, row[nm] will return column value for column name nm
  • Property access interface defined; i.e. row.col1 will retrieve the value for the column named col1
  • Iteration interface defined; i.e. for x in row will iterate each column value in the row
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving column values
source
Tables.ColumnsType
Tables.Columns(tbl)

Convenience type that calls Tables.columns on an input tbl and wraps the resulting AbstractColumns interface object in a dedicated struct to provide useful default behaviors (allows any AbstractColumns to be used like a NamedTuple of Vectors):

  • Indexing interface defined; i.e. cols[i] will return the column at index i, cols[nm] will return the column for column name nm
  • Property access interface defined; i.e. cols.col1 will retrieve the column named col1
  • Iteration interface defined; i.e. for col in cols will iterate each column
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving columns

Note that Tables.Columns calls Tables.columns internally on the provided table argument. Tables.Columns can be used for dispatch if needed.

source

Implementing the Interface (i.e. becoming a Tables.jl source)

Now that we've seen how one uses the Tables.jl interface, let's walk through how to implement it; i.e. how can I make my custom type valid for Tables.jl consumers?

For a type MyTable, the interface to becoming a proper table is straightforward:

Required Methods | Default Definition | Brief Description
Tables.istable(::Type{MyTable}) | | Declare that your table type implements the interface
One of:
Tables.rowaccess(::Type{MyTable}) | | Declare that your table type defines a Tables.rows(::MyTable) method
Tables.rows(x::MyTable) | | Return a Tables.AbstractRow-compatible iterator from your table
Or:
Tables.columnaccess(::Type{MyTable}) | | Declare that your table type defines a Tables.columns(::MyTable) method
Tables.columns(x::MyTable) | | Return a Tables.AbstractColumns-compatible object from your table
Optional methods
Tables.schema(x::MyTable) | Tables.schema(x) = nothing | Return a Tables.Schema object from your Tables.AbstractRow iterator or Tables.AbstractColumns object; or nothing for unknown schema
Tables.materializer(::Type{MyTable}) | Tables.columntable | Declare a "materializer" sink function for your table type that can construct an instance of your type from any Tables.jl input
Tables.subset(x::MyTable, inds; viewhint) | | Return a row or a sub-table of the original table
DataAPI.nrow(x::MyTable) | | Return number of rows of table x
DataAPI.ncol(x::MyTable) | | Return number of columns of table x

Based on whether your table type has defined Tables.rows or Tables.columns, you then ensure that the Tables.AbstractRow iterator or Tables.AbstractColumns object satisfies the respective interface.

As an additional source of documentation, see this Discourse post, which walks through making a row-oriented table in detail.

Tables.AbstractRow

Tables.AbstractRow - Type
Tables.AbstractRow

Abstract interface type representing the expected eltype of the iterator returned from Tables.rows(table). Tables.rows must return an iterator of elements that satisfy the Tables.AbstractRow interface. While Tables.AbstractRow is an abstract type that custom "row" types may subtype for useful default behavior (indexing, iteration, property-access, etc.), users should not use it for dispatch, as Tables.jl interface objects are not required to subtype, but only implement the required interface methods.

Interface definition:

Required Methods | Default Definition | Brief Description
Tables.getcolumn(row, i::Int) | getfield(row, i) | Retrieve a column value by index
Tables.getcolumn(row, nm::Symbol) | getproperty(row, nm) | Retrieve a column value by name
Tables.columnnames(row) | propertynames(row) | Return column names for a row as a 1-based indexable collection
Optional methods
Tables.getcolumn(row, ::Type{T}, i::Int, nm::Symbol) | Tables.getcolumn(row, nm) | Given a column element type T, index i, and column name nm, retrieve the column value. Provides a type-stable or even constant-prop-able mechanism for efficiency.

Note that subtypes of Tables.AbstractRow must overload all required methods listed above instead of relying on these methods' default definitions.

While custom row types aren't required to subtype Tables.AbstractRow, benefits of doing so include:

  • Indexing interface defined (using getcolumn); i.e. row[i] will return the column value at index i
  • Property access interface defined (using columnnames and getcolumn); i.e. row.col1 will retrieve the value for the column named col1
  • Iteration interface defined; i.e. for x in row will iterate each column value in the row
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving column values
  • A default show method

This allows the custom row type to behave as close as possible to a builtin NamedTuple object.

source

Tables.AbstractColumns

Tables.AbstractColumns - Type
Tables.AbstractColumns

An interface type defined as an ordered set of columns that support retrieval of individual columns by name or index. A retrieved column must be a 1-based indexable collection with known length, i.e. an object that supports length(col) and col[i] for any i = 1:length(col). Tables.columns must return an object that satisfies the Tables.AbstractColumns interface. While Tables.AbstractColumns is an abstract type that custom "columns" types may subtype for useful default behavior (indexing, iteration, property-access, etc.), users should not use it for dispatch, as Tables.jl interface objects are not required to subtype, but only implement the required interface methods.

Interface definition:

Required Methods | Default Definition | Brief Description
Tables.getcolumn(table, i::Int) | getfield(table, i) | Retrieve a column by index
Tables.getcolumn(table, nm::Symbol) | getproperty(table, nm) | Retrieve a column by name
Tables.columnnames(table) | propertynames(table) | Return column names for a table as a 1-based indexable collection
Optional methods
Tables.getcolumn(table, ::Type{T}, i::Int, nm::Symbol) | Tables.getcolumn(table, nm) | Given a column eltype T, index i, and column name nm, retrieve the column. Provides a type-stable or even constant-prop-able mechanism for efficiency.

Note that subtypes of Tables.AbstractColumns must overload all required methods listed above instead of relying on these methods' default definitions.

While types aren't required to subtype Tables.AbstractColumns, benefits of doing so include:

  • Indexing interface defined (using getcolumn); i.e. tbl[i] will retrieve the column at index i
  • Property access interface defined (using columnnames and getcolumn); i.e. tbl.col1 will retrieve column named col1
  • Iteration interface defined; i.e. for col in table will iterate each column in the table
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving columns
  • A default show method

This allows a custom table type to behave as close as possible to a builtin NamedTuple of vectors object.

source

Implementation Example

As an extended example, let's take a look at some code defined in Tables.jl for treating AbstractVecOrMats as tables.

First, we define a special MatrixTable type that will wrap an AbstractVecOrMat, and allow easy overloading for the Tables.jl interface.

struct MatrixTable{T <: AbstractVecOrMat} <: Tables.AbstractColumns
+end

Note that in this example we account for the input table potentially returning nothing from Tables.schema(rows); in that case, we start iterating the rows and build a partial schema using the column names from the first row, sch = Tables.Schema(Tables.columnnames(row), nothing), which is valid to pass to Tables.eachcolumn.

source
Tables.materializer - Function
Tables.materializer(x) => Callable

For a table input, return the "sink" function or "materializing" function that can take a Tables.jl-compatible table input and make an instance of the table type. This enables "transform" workflows that take table inputs, apply transformations, potentially converting the table to a different form, and end with producing a table of the same type as the original input. The default materializer is Tables.columntable, which converts any table input into a NamedTuple of Vectors.

Users implementing MyType are advised to define only materializer(::Type{<:MyType}); materializer(::MyType) will then automatically delegate to that method.
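As a rough sketch of the "transform" workflow described above, the snippet below takes a row table, works on its columns, and hands the result back to the input's materializer. The table tbl and the doubling step are made up for illustration; a Vector of NamedTuples is used because Tables.jl treats it as a table out of the box.

```julia
using Tables

# Illustrative input: a row table (Vector of NamedTuples)
tbl = [(a = 1, b = "x"), (a = 2, b = "y")]

# Work column-wise: columntable gives a NamedTuple of Vectors
cols = Tables.columntable(tbl)
doubled = (a = cols.a .* 2, b = cols.b)

# Ask the original input for its sink function and rebuild the same kind of table
sink = Tables.materializer(tbl)
out = sink(doubled)
```

Here out should again be a Vector of NamedTuples, so downstream code that expected the original table shape keeps working.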

source
Tables.columnindex - Function
Tables.columnindex(table, name::Symbol)

Return the column index (1-based) of a column by name in a table with a known schema; returns 0 if name doesn't exist in the table.

source

Given names and a Symbol name, compute the 1-based index of name in names.

source
Tables.columntype - Function
Tables.columntype(table, name::Symbol)

Return the column element type of a column by name in a table with a known schema; returns Union{} if name doesn't exist in the table.

source

Given a tuple type and a Symbol name, compute the type associated with name among the tuple's types.

source
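As a small illustration, the two lookups above can be tried on a plain NamedTuple of vectors, which Tables.jl treats as a column table with a known schema. The table tbl and its column names are invented for this sketch.

```julia
using Tables

# Illustrative column table: a NamedTuple of Vectors has a known schema
tbl = (a = [1, 2], b = ["x", "y"])

idx_b       = Tables.columnindex(tbl, :b)   # position of column :b
idx_missing = Tables.columnindex(tbl, :z)   # 0 signals that the name is absent
type_a       = Tables.columntype(tbl, :a)   # element type of column :a
type_missing = Tables.columntype(tbl, :z)   # Union{} signals that the name is absent
```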
Tables.rowmerge - Function
rowmerge(row, other_rows...)
+rowmerge(row; fields_to_merge...)

Return a NamedTuple by merging row (an AbstractRow-compliant value) with other_rows (one or more AbstractRow-compliant values) via Base.merge. This function is similar to Base.merge(::NamedTuple, ::NamedTuple...), but accepts AbstractRow-compliant values instead of NamedTuples.

A convenience method rowmerge(row; fields_to_merge...) = rowmerge(row, fields_to_merge) is defined that enables the fields_to_merge to be specified as keyword arguments.
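A minimal sketch of both call forms, using NamedTuples as the AbstractRow-compliant values; the field names are arbitrary and chosen only for illustration.

```julia
using Tables

row = (a = 1, b = "x")   # a NamedTuple satisfies the AbstractRow interface

# Positional form: later rows win on name clashes, as with Base.merge
merged = Tables.rowmerge(row, (b = "y", c = 3.0))

# Keyword form: the fields to merge are passed as keyword arguments
merged_kw = Tables.rowmerge(row; c = 3.0)
```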

source
Tables.Row - Type
Tables.Row(row)

Convenience type to wrap any AbstractRow interface object in a dedicated struct to provide useful default behaviors (allows any AbstractRow to be used like a NamedTuple):

  • Indexing interface defined; i.e. row[i] will return the column value at index i, row[nm] will return column value for column name nm
  • Property access interface defined; i.e. row.col1 will retrieve the value for the column named col1
  • Iteration interface defined; i.e. for x in row will iterate each column value in the row
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving column values
source
Tables.Columns - Type
Tables.Columns(tbl)

Convenience type that calls Tables.columns on an input tbl and wraps the resulting AbstractColumns interface object in a dedicated struct to provide useful default behaviors (allows any AbstractColumns to be used like a NamedTuple of Vectors):

  • Indexing interface defined; i.e. cols[i] will return the column at index i, cols[nm] will return the column for column name nm
  • Property access interface defined; i.e. cols.col1 will retrieve the column named col1
  • Iteration interface defined; i.e. for col in cols will iterate each column
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving columns

Note that Tables.Columns calls Tables.columns internally on the provided table argument. Tables.Columns can be used for dispatch if needed.
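A short sketch of the behaviors listed above, wrapping a row table so that its columns can be reached by index, name, or property. The input rowtbl is an illustrative Vector of NamedTuples, not something defined elsewhere in this documentation.

```julia
using Tables

# Illustrative row-oriented input
rowtbl = [(a = 1, b = "x"), (a = 2, b = "y")]

# Tables.Columns calls Tables.columns on the input and wraps the result
cols = Tables.Columns(rowtbl)

col_a   = cols.a            # property access by column name
col_b   = cols[2]           # indexing by position
same_b  = cols[:b]          # indexing by name
has_a   = haskey(cols, :a)  # AbstractDict-style lookup
n_cols  = length(collect(cols))  # iteration yields one element per column
```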

source

Implementing the Interface (i.e. becoming a Tables.jl source)

Now that we've seen how one uses the Tables.jl interface, let's walk through how to implement it; i.e. how can I make my custom type valid for Tables.jl consumers?

For a type MyTable, the interface to becoming a proper table is straightforward:

Required Methods | Default Definition | Brief Description
Tables.istable(::Type{MyTable}) | | Declare that your table type implements the interface
One of:
Tables.rowaccess(::Type{MyTable}) | | Declare that your table type defines a Tables.rows(::MyTable) method
Tables.rows(x::MyTable) | | Return a Tables.AbstractRow-compatible iterator from your table
Or:
Tables.columnaccess(::Type{MyTable}) | | Declare that your table type defines a Tables.columns(::MyTable) method
Tables.columns(x::MyTable) | | Return a Tables.AbstractColumns-compatible object from your table
Optional methods
Tables.schema(x::MyTable) | Tables.schema(x) = nothing | Return a Tables.Schema object from your Tables.AbstractRow iterator or Tables.AbstractColumns object; or nothing for unknown schema
Tables.materializer(::Type{MyTable}) | Tables.columntable | Declare a "materializer" sink function for your table type that can construct an instance of your type from any Tables.jl input
Tables.subset(x::MyTable, inds; viewhint) | | Return a row or a sub-table of the original table
DataAPI.nrow(x::MyTable) | | Return number of rows of table x
DataAPI.ncol(x::MyTable) | | Return number of columns of table x

Based on whether your table type has defined Tables.rows or Tables.columns, you then ensure that the Tables.AbstractRow iterator or Tables.AbstractColumns object satisfies the respective interface.

As an additional source of documentation, see this Discourse post, which walks through making a row-oriented table in detail.
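To make the required methods concrete, here is a minimal sketch of a row-oriented source. The MyTable struct and its data field are hypothetical; the rows are stored as NamedTuples, which already satisfy the Tables.AbstractRow interface, so only the three declarations are needed. The optional Tables.schema is left at its default here.

```julia
using Tables

# Hypothetical table type for illustration: wraps a vector of NamedTuple rows
struct MyTable{T <: NamedTuple}
    data::Vector{T}
end

# Required: declare that the type implements the interface
Tables.istable(::Type{<:MyTable}) = true
# Row-oriented branch: declare row access and return a row iterator
Tables.rowaccess(::Type{<:MyTable}) = true
Tables.rows(t::MyTable) = t.data

t = MyTable([(id = 1, name = "a"), (id = 2, name = "b")])

# Consumers can now use generic functions; columns are built from the rows
ct = Tables.columntable(t)
```

Even though MyTable only defines Tables.rows, column-oriented consumers still work through the generic fallback.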

Tables.AbstractRow

Tables.AbstractRow - Type
Tables.AbstractRow

Abstract interface type representing the expected eltype of the iterator returned from Tables.rows(table). Tables.rows must return an iterator of elements that satisfy the Tables.AbstractRow interface. While Tables.AbstractRow is an abstract type that custom "row" types may subtype for useful default behavior (indexing, iteration, property-access, etc.), users should not use it for dispatch, as Tables.jl interface objects are not required to subtype, but only implement the required interface methods.

Interface definition:

Required Methods | Default Definition | Brief Description
Tables.getcolumn(row, i::Int) | getfield(row, i) | Retrieve a column value by index
Tables.getcolumn(row, nm::Symbol) | getproperty(row, nm) | Retrieve a column value by name
Tables.columnnames(row) | propertynames(row) | Return column names for a row as a 1-based indexable collection
Optional methods
Tables.getcolumn(row, ::Type{T}, i::Int, nm::Symbol) | Tables.getcolumn(row, nm) | Given a column element type T, index i, and column name nm, retrieve the column value. Provides a type-stable or even constant-prop-able mechanism for efficiency.

Note that subtypes of Tables.AbstractRow must overload all required methods listed above instead of relying on these methods' default definitions.

While custom row types aren't required to subtype Tables.AbstractRow, benefits of doing so include:

  • Indexing interface defined (using getcolumn); i.e. row[i] will return the column value at index i
  • Property access interface defined (using columnnames and getcolumn); i.e. row.col1 will retrieve the value for the column named col1
  • Iteration interface defined; i.e. for x in row will iterate each column value in the row
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving column values
  • A default show method

This allows the custom row type to behave as close as possible to a builtin NamedTuple object.
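The following sketch shows what subtyping buys in practice. MyRow is a hypothetical row type that stores names and values in two parallel vectors; because Tables.AbstractRow overloads property access, the implementation reaches its own fields with getfield. Looking up a name that is not present is not handled in this sketch and would throw.

```julia
using Tables

# Hypothetical row type for illustration
struct MyRow <: Tables.AbstractRow
    names::Vector{Symbol}
    values::Vector{Any}
end

# Required methods; getfield avoids the overloaded getproperty
Tables.columnnames(r::MyRow) = getfield(r, :names)
Tables.getcolumn(r::MyRow, i::Int) = getfield(r, :values)[i]
Tables.getcolumn(r::MyRow, nm::Symbol) =
    getfield(r, :values)[findfirst(==(nm), getfield(r, :names))]

r = MyRow([:a, :b], Any[1, "x"])

by_property = r.a            # property access, provided by AbstractRow
by_index    = r[2]           # indexing, provided by AbstractRow
has_b       = haskey(r, :b)  # AbstractDict-style method
all_values  = collect(r)     # iteration over column values
```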

source

Tables.AbstractColumns

Tables.AbstractColumns - Type
Tables.AbstractColumns

An interface type defined as an ordered set of columns that support retrieval of individual columns by name or index. A retrieved column must be a 1-based indexable collection with known length, i.e. an object that supports length(col) and col[i] for any i = 1:length(col). Tables.columns must return an object that satisfies the Tables.AbstractColumns interface. While Tables.AbstractColumns is an abstract type that custom "columns" types may subtype for useful default behavior (indexing, iteration, property-access, etc.), users should not use it for dispatch, as Tables.jl interface objects are not required to subtype, but only implement the required interface methods.

Interface definition:

Required Methods | Default Definition | Brief Description
Tables.getcolumn(table, i::Int) | getfield(table, i) | Retrieve a column by index
Tables.getcolumn(table, nm::Symbol) | getproperty(table, nm) | Retrieve a column by name
Tables.columnnames(table) | propertynames(table) | Return column names for a table as a 1-based indexable collection
Optional methods
Tables.getcolumn(table, ::Type{T}, i::Int, nm::Symbol) | Tables.getcolumn(table, nm) | Given a column eltype T, index i, and column name nm, retrieve the column. Provides a type-stable or even constant-prop-able mechanism for efficiency.

Note that subtypes of Tables.AbstractColumns must overload all required methods listed above instead of relying on these methods' default definitions.

While types aren't required to subtype Tables.AbstractColumns, benefits of doing so include:

  • Indexing interface defined (using getcolumn); i.e. tbl[i] will retrieve the column at index i
  • Property access interface defined (using columnnames and getcolumn); i.e. tbl.col1 will retrieve column named col1
  • Iteration interface defined; i.e. for col in table will iterate each column in the table
  • AbstractDict methods defined (get, haskey, etc.) for checking and retrieving columns
  • A default show method

This allows a custom table type to behave as close as possible to a builtin NamedTuple of vectors object.

source

Implementation Example

As an extended example, let's take a look at some code defined in Tables.jl for treating AbstractVecOrMats as tables.

First, we define a special MatrixTable type that will wrap an AbstractVecOrMat, and allow easy overloading for the Tables.jl interface.

struct MatrixTable{T <: AbstractVecOrMat} <: Tables.AbstractColumns
     names::Vector{Symbol}
     lookup::Dict{Symbol, Int}
     matrix::T
@@ -180,7 +180,7 @@
     getfield(getfield(m, :source), :matrix)[getfield(m, :row), i]
 getcolumn(m::MatrixRow, nm::Symbol) =
     getfield(getfield(m, :source), :matrix)[getfield(m, :row), getfield(getfield(m, :source), :lookup)[nm]]
-columnnames(m::MatrixRow) = names(getfield(m, :source))

Here we start by defining Tables.rowaccess and Tables.rows, and then the iteration interface methods, since we declared that a MatrixTable itself is an iterator of Tables.AbstractRow-compatible objects. For eltype, we say that a MatrixTable iterates our own custom row type, MatrixRow. MatrixRow subtypes Tables.AbstractRow, which provides interface implementations for several useful behaviors (indexing, iteration, property-access, etc.); essentially it makes our custom MatrixRow type more convenient to work with.

Implementing the Tables.AbstractRow interface is straightforward, and very similar to our implementation of Tables.AbstractColumns previously (i.e. the same methods for getcolumn and columnnames).

And that's it. Our MatrixTable type is now a fully fledged, valid Tables.jl source and can be used throughout the ecosystem. This is obviously not a lot of code, but actual Tables.jl interface implementations tend to be fairly simple, given the other behaviors that are already defined for table types (i.e. table types tend to already have a getcolumn-like function defined).

Tables.isrowtable

One option for certain table types is to define Tables.isrowtable to automatically satisfy the Tables.jl interface. This can be convenient for "natural" table types that already iterate rows.

Tables.isrowtable - Function
Tables.isrowtable(x) => Bool

For convenience, some table objects that are naturally "row oriented" can define Tables.isrowtable(::Type{TableType}) = true to simplify satisfying the Tables.jl interface. Requirements for defining isrowtable include:

  • Tables.rows(x) === x, i.e. the table object itself is a Row iterator
  • If the table object is mutable, it should support:
    • push!(x, row): allow pushing a single row onto table
    • append!(x, rows): allow appending set of rows onto table
  • If table object is mutable and indexable, it should support:
    • x[i] = row: allow replacing of a row with another row by index

A table object that defines Tables.isrowtable will have definitions for Tables.istable, Tables.rowaccess, and Tables.rows automatically defined.

source

Testing Tables.jl Implementations

One question that comes up is what the best strategies are for testing a Tables.jl implementation. Continuing with our MatrixTable example, let's see some useful ways to test that things are working as expected.

mat = [1 4.0 "7"; 2 5.0 "8"; 3 6.0 "9"]

First, we define a matrix literal with three columns of differently typed values.

# first, create a MatrixTable from our matrix input
+columnnames(m::MatrixRow) = names(getfield(m, :source))

Here we start by defining Tables.rowaccess and Tables.rows, and then the iteration interface methods, since we declared that a MatrixTable itself is an iterator of Tables.AbstractRow-compatible objects. For eltype, we say that a MatrixTable iterates our own custom row type, MatrixRow. MatrixRow subtypes Tables.AbstractRow, which provides interface implementations for several useful behaviors (indexing, iteration, property-access, etc.); essentially it makes our custom MatrixRow type more convenient to work with.

Implementing the Tables.AbstractRow interface is straightforward, and very similar to our implementation of Tables.AbstractColumns previously (i.e. the same methods for getcolumn and columnnames).

And that's it. Our MatrixTable type is now a fully fledged, valid Tables.jl source and can be used throughout the ecosystem. This is obviously not a lot of code, but actual Tables.jl interface implementations tend to be fairly simple, given the other behaviors that are already defined for table types (i.e. table types tend to already have a getcolumn-like function defined).

Tables.isrowtable

One option for certain table types is to define Tables.isrowtable to automatically satisfy the Tables.jl interface. This can be convenient for "natural" table types that already iterate rows.

Tables.isrowtable - Function
Tables.isrowtable(x) => Bool

For convenience, some table objects that are naturally "row oriented" can define Tables.isrowtable(::Type{TableType}) = true to simplify satisfying the Tables.jl interface. Requirements for defining isrowtable include:

  • Tables.rows(x) === x, i.e. the table object itself is a Row iterator
  • If the table object is mutable, it should support:
    • push!(x, row): allow pushing a single row onto table
    • append!(x, rows): allow appending set of rows onto table
  • If table object is mutable and indexable, it should support:
    • x[i] = row: allow replacing of a row with another row by index

A table object that defines Tables.isrowtable will have definitions for Tables.istable, Tables.rowaccess, and Tables.rows automatically defined.
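A minimal sketch of this shortcut, assuming a hypothetical RowList type that already iterates NamedTuple rows. Only the iteration methods and the isrowtable declaration are written by hand; the mutation methods mentioned above (push!, append!, setindex!) are omitted because this sketch treats the table as read-only.

```julia
using Tables

# Hypothetical row-oriented type for illustration
struct RowList{T <: NamedTuple}
    rows::Vector{T}
end

# The object itself must iterate rows
Base.iterate(t::RowList, st...) = iterate(t.rows, st...)
Base.length(t::RowList) = length(t.rows)
Base.eltype(::Type{RowList{T}}) where {T} = T

# Single declaration in place of istable, rowaccess, and rows
Tables.isrowtable(::Type{<:RowList}) = true

t = RowList([(a = 1, b = 2.0), (a = 3, b = 4.0)])

is_table   = Tables.istable(t)
row_access = Tables.rowaccess(t)
same_obj   = Tables.rows(t) === t
col_a      = Tables.columntable(t).a
```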

source

Testing Tables.jl Implementations

One question that comes up is what the best strategies are for testing a Tables.jl implementation. Continuing with our MatrixTable example, let's see some useful ways to test that things are working as expected.

mat = [1 4.0 "7"; 2 5.0 "8"; 3 6.0 "9"]

First, we define a matrix literal with three columns of differently typed values.

# first, create a MatrixTable from our matrix input
 mattbl = Tables.table(mat)
 # test that the MatrixTable `istable`
 @test Tables.istable(typeof(mattbl))
@@ -223,4 +223,4 @@
 # and same for a row table
 tbl2 = Tables.table(mat2) |> rowtable
 @test length(tbl2) == 3
-@test map(x->x.Column1, tbl2) == [1.0, 2.0, 3.0]
+@test map(x->x.Column1, tbl2) == [1.0, 2.0, 3.0]
diff --git a/dev/search/index.html b/dev/search/index.html
index 75dde7e..7d58b70 100644
--- a/dev/search/index.html
+++ b/dev/search/index.html
@@ -1,2 +1,2 @@
-Search · Tables.jl

Loading search...

    +Search · Tables.jl

    Loading search...