Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define cardinality traits #79

Merged
merged 4 commits into from
Apr 16, 2021
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "FeatureTransforms"
uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
authors = ["Invenia Technical Computing Corporation"]
version = "0.3.2"
version = "0.3.3-DEV"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to follow up with TestUtils before tagging a release


[deps]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Expand Down
1 change: 1 addition & 0 deletions src/FeatureTransforms.jl
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ export Transform
export is_transformable, transform, transform!

include("utils.jl")
include("traits.jl")
include("transform.jl")
include("apply.jl")

Expand Down
2 changes: 2 additions & 0 deletions src/linear_combination.jl
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ struct LinearCombination <: Transform
coefficients::Vector{Real}
end

cardinality(::LinearCombination) = ManyToOne()

"""
apply(
::AbstractArray{<:Real, N}, ::LinearCombination; dims=1, inds=:
Expand Down
2 changes: 2 additions & 0 deletions src/one_hot_encoding.jl
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,8 @@ struct OneHotEncoding{R<:Real} <: Transform
end
end

cardinality(::OneHotEncoding) = OneToMany()

function OneHotEncoding(possible_values::AbstractVector{T}) where T
return OneHotEncoding{Bool}(possible_values)
end
Expand Down
2 changes: 2 additions & 0 deletions src/periodic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@ struct Periodic{P, S} <: Transform
end
end

cardinality(::Periodic) = OneToOne()

"""
Periodic(f, period) -> Periodic

Expand Down
2 changes: 2 additions & 0 deletions src/power.jl
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,6 @@ struct Power <: Transform
exponent::Real
end

cardinality(::Power) = OneToOne()

_apply(x, P::Power; kwargs...) = x .^ P.exponent
4 changes: 4 additions & 0 deletions src/scaling.jl
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@ Represents the no-op scaling which simply returns the `data` it is applied on.
struct IdentityScaling <: AbstractScaling end
IdentityScaling(args...) = IdentityScaling()

cardinality(::IdentityScaling) = OneToOne()

@inline _apply(x, ::IdentityScaling; kwargs...) = x

"""
Expand Down Expand Up @@ -69,6 +71,8 @@ end

compute_stats(x) = (mean(x), std(x))

cardinality(::MeanStdScaling) = OneToOne()

function _apply(A::AbstractArray, scaling::MeanStdScaling; inverse=false, eps=1e-3, kwargs...)
inverse && return scaling.μ .+ scaling.σ .* A
# Avoid division by 0
Expand Down
1 change: 1 addition & 0 deletions src/temporal.jl
Original file line number Diff line number Diff line change
Expand Up @@ -4,5 +4,6 @@
Get the hour of day corresponding to the data.
"""
struct HoD <: Transform end
cardinality(::HoD) = OneToOne()

_apply(x, ::HoD; kwargs...) = hour.(x)
48 changes: 48 additions & 0 deletions src/traits.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
"""
type Cardinality

A trait describing the cardinality of a [`Transform`]. Available cardinalities are:
[`OneToOne`](@ref), [`ManyToOne`](@ref), [`OneToMany`](@ref), and [`ManyToMany`](@ref).
"""
abstract type Cardinality end

"""
OneToOne <: Cardinality

Transforms that map each input to exactly one output: `x → y`.
Examples: [`Power`](@ref), [`Periodic`](@ref).
"""
struct OneToOne <: Cardinality end

"""
ManyToOne <: Cardinality

Transforms that map many inputs to one output: `(x_1, x_2, ..., x_n) → y`.
These are typically reduction operations.
Examples: [`LinearCombination`](@ref).
"""
struct ManyToOne <: Cardinality end

"""
OneToMany <: Cardinality

Transforms that map one input to many outputs: `x → (y_1, y_2, ..., y_n)`.
Examples: [`OneHotEncoding`](@ref).
"""
struct OneToMany <: Cardinality end

"""
ManyToMany <: Cardinality

Transforms that map many inputs to many outputs: `(x_1, x_2, ..., x_m) → (y_1, y_2, ..., y_n)`.
Examples: Principle Component Analysis (not implemented).
"""
struct ManyToMany <: Cardinality end


"""
cardinality(transform) -> Cardinality

Returns the [`Cardinality`](@ref) of the `transform`.
"""
function cardinality end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought defining cardinality on instances rather than types seemed redundant in a way. And that cardinality for each subtype of AbstractScaling seemed redundant. But I realised it's how traits are supposed to work right?

Copy link
Member Author

@glennmoy glennmoy Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand.
I could/should have defined the cardinality on AbstractScaling, which would have covered MeanStdScaling and IdentityScaling. In case another type of scaling comes along that's ManyToOne it would have to be special-cased in that instance. Is that what you're getting at?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's partly what I was thinking. For some reason thought defining cardinality on AbstractScaling wouldn't work because it was an abstract type, so I didn't comment on that directly.

The other point I was making is, why should every instance of a given Transform subtype have this trait, when the trait is the same between all instances? Conceptually, cardinality belongs more with the type than the instances.

I backed off on that because I figured traits, as a design pattern, apply to instances. But I'm not even sure about that - is it a valid point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right, well they can apply to both.

You'll see I've changed the code so that cardinality now applies to AbstractScaling.
In that case it made more sense to attach it to the type because all concrete subtypes will have the same behaviour.

But I attached it directly to the concrete types of all other transforms because they don't have abstract supertype (besides Transform) to define it for. And if I defined it for Transform outright then users won't know they have to define it themselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that looks fine. Still not sure if I got this across though:

why should every instance of a given Transform subtype have this trait, when the trait is the same between all instances? Conceptually, cardinality belongs more with the type than the instances.

This means e.g.

julia> cardinality(T) where T <: Power = OneToOne()

julia> cardinality(Power) 
OneToOne

vs.

julia> cardinality(::Power) = OneToOne()

julia> cardinality(Power(2))
OneToOne

4 changes: 3 additions & 1 deletion test/linear_combination.jl
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
@testset "linear combination" begin

@test LinearCombination([1, -1]) isa Transform
lc = LinearCombination([1, -1])
@test lc isa Transform
@test cardinality(lc) == ManyToOne()

@testset "Vector" begin

Expand Down
1 change: 1 addition & 0 deletions test/one_hot_encoding.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
categories = ["foo", "bar", "baz"]
ohe = OneHotEncoding(categories)
@test ohe isa Transform
@test cardinality(ohe) == OneToMany()

@testset "Vector" begin

Expand Down
1 change: 1 addition & 0 deletions test/periodic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -322,6 +322,7 @@
@test p isa Transform
@test p.period == Day(5)
@test p.phase_shift == Day(2)
@test cardinality(p) == OneToOne()
end

@testset "No phase_shift" begin
Expand Down
1 change: 1 addition & 0 deletions test/power.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

p = Power(3)
@test p isa Transform
@test cardinality(p) == OneToOne()

# TODO: all of these should be part of some test utils
@testset "Vector" begin
Expand Down
2 changes: 2 additions & 0 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ using Dates
using Documenter: doctest
using FeatureTransforms
using FeatureTransforms: _periodic
using FeatureTransforms: cardinality, OneToOne, OneToMany, ManyToOne, ManyToMany
using Test
using TimeZones

Expand All @@ -19,4 +20,5 @@ using TimeZones
include("scaling.jl")
include("temporal.jl")
include("transform.jl")
include("traits.jl")
end
2 changes: 2 additions & 0 deletions test/scaling.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
@testset "IdentityScaling" begin
scaling = IdentityScaling()
@test scaling isa Transform
@test cardinality(scaling) == OneToOne()

@testset "Arguments do nothing" begin
@test IdentityScaling(123) == IdentityScaling()
Expand Down Expand Up @@ -233,6 +234,7 @@
@testset "simple" for x in (M, nt)
x_copy = deepcopy(x)
scaling = MeanStdScaling(x)
@test cardinality(scaling) == OneToOne()
@test scaling isa Transform
@test x == x_copy # data is not mutated
# constructor uses all data by default
Expand Down
1 change: 1 addition & 0 deletions test/temporal.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

hod = HoD()
@test hod isa Transform
@test cardinality(hod) == OneToOne()

@testset "Vector" begin
x = collect(DateTime(2020, 1, 1, 9, 0):Hour(1):DateTime(2020, 5, 7, 9, 0))
Expand Down
5 changes: 5 additions & 0 deletions test/traits.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
@testset "traits.jl" begin
for t in (OneToOne(), OneToMany(), ManyToOne(), ManyToMany())
@test t isa FeatureTransforms.Cardinality
end
end