Commit

Add is_transformable (#50)
* Add is_transformable

* Add to documentation API

* Update docs

* Fix docs job

* Rename transforms.jl to transform.jl
nicoleepp authored Mar 17, 2021
1 parent 3956c4f commit ab6ab74
Showing 10 changed files with 80 additions and 34 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "FeatureTransforms"
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
authors = ["Invenia Technical Computing Corporation"]
-version = "0.2.2"
+version = "0.2.3"

[deps]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
2 changes: 1 addition & 1 deletion docs/Manifest.toml
@@ -70,7 +70,7 @@ version = "0.26.1"
deps = ["Dates", "Statistics", "Tables"]
path = ".."
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
-version = "0.1.0"
+version = "0.2.3"

[[Formatting]]
deps = ["Printf"]
1 change: 1 addition & 0 deletions docs/src/api.md
@@ -19,6 +19,7 @@ OneHotEncoding
```@docs
FeatureTransforms.apply
FeatureTransforms.apply!
+FeatureTransforms.is_transformable
FeatureTransforms.transform!
FeatureTransforms.transform
```
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -7,7 +7,7 @@ There are three key parts of the Transforms.jl API:

* Subtypes of [`Transform`](@ref about-transforms) define transformations of data, for example normalization or a periodic function.
* The `apply` and `apply!` methods transform data according to the given [`Transform`](@ref about-transforms), in a manner determined by the data type and specified dimensions, column names, indices, and other `Transform`-specific parameters.
-* The `transform` method should be overloaded to define feature engineering pipelines that include [`Transform`](@ref about-transforms)s.
+* The [`transform`](@ref transform-interface) method should be overloaded to define feature engineering pipelines that include [`Transform`](@ref about-transforms)s.

## Getting Started

11 changes: 11 additions & 0 deletions docs/src/transform interface.md
@@ -0,0 +1,11 @@
# [Transform Interface](@id transform-interface)

The idea behind a "transform interface" is to make feature transformations composable, i.e. the output of one `Transform` should be valid input to another.

Feature engineering pipelines, which comprise a sequence of multiple `Transform`s and other steps, should obey the same principle and one should be able to add/remove subsequent `Transform`s without the pipeline breaking.
So the output of an end-to-end transform pipeline should itself be "transformable".

We have enforced this in Transforms.jl by only supporting certain input types, i.e. AbstractArrays and Tables, which produce other AbstractArrays and Tables.
We also have specified this in the `transform` function API, which is expected to be overloaded for implementing pipelines (the exact method is an implementation detail for the user).
Our only requirement is that the return of the implemented `transform` is itself "transformable", i.e. an AbstractArray or Table.
This can be checked by calling `is_transformable` on the output.
5 changes: 3 additions & 2 deletions src/FeatureTransforms.jl
@@ -7,10 +7,11 @@ using Tables
export HoD, LinearCombination, OneHotEncoding, Periodic, Power
export IdentityScaling, MeanStdScaling, AbstractScaling
export Transform
-export transform, transform!
+export is_transformable, transform, transform!

include("utils.jl")
-include("transformers.jl")
+include("transform.jl")
+include("apply.jl")

# Transform implementations
include("linear_combination.jl")
29 changes: 0 additions & 29 deletions src/transformers.jl → src/apply.jl
@@ -1,32 +1,3 @@

"""
Transform
Abstract supertype for all feature Transforms.
"""
abstract type Transform end

# Make Transforms callable types
(t::Transform)(x; kwargs...) = apply(x, t; kwargs...)


"""
transform!(::T, data)
Defines the feature engineering pipeline for some type `T`, which comprises a collection of
[`Transform`](@ref)s to be peformed on the `data`.
`transform!` should be overloaded for custom types `T` that require feature engineering.
"""
function transform! end

"""
transform(::T, data)
Non-mutating version of [`transform!`](@ref).
"""
function transform end

"""
apply(data::T, ::Transform; kwargs...)
45 changes: 45 additions & 0 deletions src/transform.jl
@@ -0,0 +1,45 @@

"""
    Transform

Abstract supertype for all feature Transforms.
"""
abstract type Transform end

# Make Transforms callable types
(t::Transform)(x; kwargs...) = apply(x, t; kwargs...)

"""
    is_transformable(x)

Determine if `x` is both a valid input and output of any [`Transform`](@ref), i.e. that it
follows the [`transform`](@ref) interface.

Currently, all subtypes of `Table`s and `AbstractArray`s are transformable.
"""
is_transformable(::AbstractArray) = true
is_transformable(x) = Tables.istable(x)

"""
    transform(::T, data)

Defines the feature engineering pipeline for some type `T`, which comprises a collection of
[`Transform`](@ref)s and other steps to be performed on the `data`.

The idea behind a "transform interface" is to make feature transformations composable, i.e.
the output of any one `Transform` should be valid input to another.

Feature engineering pipelines should obey the same principle and it should be trivial to
add/remove `Transform` steps that compose the pipeline without it breaking.

`transform` should be overloaded for custom types `T` that require feature engineering.
The only requirement is that the return of `transform` is itself "transformable", i.e.
calling [`is_transformable`](@ref) on the output returns true.
"""
function transform end

"""
    transform!(::T, data)

Mutating version of [`transform`](@ref).
"""
function transform! end
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -15,6 +15,7 @@ using TimeZones
include("power.jl")
include("scaling.jl")
include("temporal.jl")
+include("transform.jl")

doctest(FeatureTransforms)
end
16 changes: 16 additions & 0 deletions test/transform.jl
@@ -0,0 +1,16 @@
@testset "is_transformable" begin

# Test that AbstractArrays and Tables are transformable
@test is_transformable([1, 2, 3, 4, 5])
@test is_transformable([1 2 3; 4 5 6])
@test is_transformable(AxisArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]))
@test is_transformable(KeyedArray([1 2 3; 4 5 6], foo=["a", "b"], bar=["x", "y", "z"]))
@test is_transformable((a = [1, 2, 3], b = [4, 5, 6]))
@test is_transformable(DataFrame(:a => [1, 2, 3], :b => [4, 5, 6]))

# Test types that are not transformable
@test is_transformable(1) == false
@test is_transformable("string") == false
@test is_transformable(true) == false
@test is_transformable(Dict(2 => 3)) == false
end

2 comments on commit ab6ab74

@nicoleepp
Contributor Author

@JuliaRegistrator register()

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/32196

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.2.3 -m "<description of version>" ab6ab7447213546c600c3e0b41a86e508226f910
git push origin v0.2.3
