-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: support FeatureTransforms.jl #44
Changes from all commits
b71c197
f3bd036
80bd4b3
de3304f
525ccc1
c840873
b73a80d
f85a135
8af6a70
5c71193
230ea65
c7423b4
6a26438
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ module AxisSets | |
|
||
using AutoHashEquals | ||
using AxisKeys | ||
using FeatureTransforms | ||
using Impute | ||
using NamedDims | ||
using OrderedCollections | ||
|
@@ -88,5 +89,6 @@ include("dataset.jl") | |
include("indexing.jl") | ||
include("functions.jl") | ||
include("impute.jl") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
include("featuretransforms.jl") | ||
|
||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,154 @@ | ||
FeatureTransforms.is_transformable(::KeyedDataset) = true | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is now automatic when an https://github.com/invenia/FeatureTransforms.jl/releases/tag/v0.3.4 |
||
|
||
_transform_pattern(keys, dims) = isempty(keys) ? _transform_pattern(dims) : Pattern[keys...] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See comment in impute.jl |
||
_transform_pattern(::Colon) = Pattern[(:__,)] | ||
_transform_pattern(dims::Symbol) = Pattern[(:__, dims)] | ||
_transform_pattern(dims) = Pattern[(:__, d) for d in dims] | ||
|
||
""" | ||
FeatureTransforms.apply(ds::KeyedDataset, t::Transform, [key]; dims=:, kwargs...) | ||
|
||
Apply the `Transform` to each component of the [`KeyedDataset`](@ref). | ||
Returns a new dataset with the same constraints, but transformed components. | ||
|
||
The transform can be applied to a subselection of components via a [`Pattern`](@ref) `key`. | ||
Otherwise, components are selected by the desired `dims`. | ||
|
||
Keyword arguments including `dims` are passed to the appropriate `FeatureTransforms` method | ||
for a component. | ||
|
||
# Example | ||
```jldoctest | ||
julia> using AxisKeys, FeatureTransforms; using AxisSets: KeyedDataset, Pattern, flatten; | ||
|
||
julia> ds = KeyedDataset( | ||
flatten([ | ||
:train => [ | ||
:load => KeyedArray([7.0 7.7; 8.0 8.2; 9.0 9.9]; time=1:3, loc=[:x, :y]), | ||
:price => KeyedArray([-2.0 4.0; 3.0 2.0; -1.0 -1.0]; time=1:3, id=[:a, :b]), | ||
], | ||
:predict => [ | ||
:load => KeyedArray([7.0 7.7; 8.1 7.9; 9.0 9.9]; time=1:3, loc=[:x, :y]), | ||
:price => KeyedArray([0.5 -1.0; -5.0 -2.0; 0.0 1.0]; time=1:3, id=[:a, :b]), | ||
] | ||
])... | ||
); | ||
|
||
julia> p = Power(2); | ||
|
||
julia> r = FeatureTransforms.apply(ds, p, (:_, :price, :_)); | ||
|
||
julia> [k => parent(parent(v)) for (k, v) in r.data] | ||
4-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Float64}}}: | ||
(:train, :load) => [7.0 7.7; 8.0 8.2; 9.0 9.9] | ||
(:train, :price) => [4.0 16.0; 9.0 4.0; 1.0 1.0] | ||
(:predict, :load) => [7.0 7.7; 8.1 7.9; 9.0 9.9] | ||
(:predict, :price) => [0.25 1.0; 25.0 4.0; 0.0 1.0] | ||
``` | ||
""" | ||
function FeatureTransforms.apply( | ||
ds::KeyedDataset, t::Transform, keys...; | ||
dims=:, kwargs... | ||
) | ||
return map(ds, _transform_pattern(keys, dims)...) do a | ||
FeatureTransforms.apply(a, t; dims=dims, kwargs...) | ||
end | ||
end | ||
|
||
""" | ||
FeatureTransforms.apply!(ds::KeyedDataset, t::Transform, [key]; dims=:, kwargs...) | ||
|
||
Apply the `Transform` to each component of the [`KeyedDataset`](@ref). | ||
Returns a new dataset with the same constraints, but transformed components. | ||
|
||
The transform can be applied to a subselection of components via a [`Pattern`](@ref) `key`. | ||
Otherwise, components are selected by the desired `dims`. | ||
|
||
Keyword arguments including `dims` are passed to the appropriate `FeatureTransforms` method | ||
for a component. | ||
""" | ||
function FeatureTransforms.apply!( | ||
ds::KeyedDataset, t::Transform, keys...; | ||
dims=:, kwargs... | ||
) | ||
return map(ds, _transform_pattern(keys, dims)...) do a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this be a for-loop and the output be just |
||
FeatureTransforms.apply!(a, t; dims=dims, kwargs...) | ||
end | ||
end | ||
|
||
""" | ||
FeatureTransforms.apply_append( | ||
ds::KeyedDataset, t::Transform, [key]; | ||
dims=:, inner=false, component_name=:component, kwargs... | ||
) | ||
|
||
Apply the `Transform` to each component of the [`KeyedDataset`](@ref). | ||
|
||
The transform can be applied to a subselection of components via a [`Pattern`](@ref) `key`. | ||
Otherwise, components are selected by the desired `dims`. | ||
|
||
If `inner=true`, perform `FeatureTransforms.apply_append` on each component, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure |
||
returning a new dataset with the same constraints, but transformed components. | ||
|
||
Otherwise, transform each component using `FeatureTransforms.apply`, and append | ||
to a copy of the dataset as a new component called `component_name`. | ||
|
||
Keyword arguments including `dims` are passed to the appropriate `FeatureTransforms` method | ||
for a component. | ||
|
||
# Example | ||
```jldoctest | ||
julia> using AxisKeys, FeatureTransforms; using AxisSets: KeyedDataset, Pattern, flatten; | ||
|
||
julia> ds = KeyedDataset( | ||
flatten([ | ||
:train => [ | ||
:load => KeyedArray([7.0 7.7; 8.0 8.2; 9.0 9.9]; time=1:3, loc=[:x, :y]), | ||
:price => KeyedArray([-2.0 4.0; 3.0 2.0; -1.0 -1.0]; time=1:3, id=[:a, :b]), | ||
], | ||
:predict => [ | ||
:load => KeyedArray([7.0 7.7; 8.1 7.9; 9.0 9.9]; time=1:3, loc=[:x, :y]), | ||
:price => KeyedArray([0.5 -1.0; -5.0 -2.0; 0.0 1.0]; time=1:3, id=[:a, :b]), | ||
] | ||
])... | ||
); | ||
|
||
julia> p = Power(2); | ||
|
||
julia> r = FeatureTransforms.apply_append(ds, p, (:_, :price, :_); component_name=:price2); | ||
|
||
julia> [k => parent(parent(v)) for (k, v) in r.data] | ||
6-element Vector{Pair{Tuple{Symbol, Symbol}, Matrix{Float64}}}: | ||
(:train, :load) => [7.0 7.7; 8.0 8.2; 9.0 9.9] | ||
(:train, :price) => [-2.0 4.0; 3.0 2.0; -1.0 -1.0] | ||
(:predict, :load) => [7.0 7.7; 8.1 7.9; 9.0 9.9] | ||
(:predict, :price) => [0.5 -1.0; -5.0 -2.0; 0.0 1.0] | ||
(:train, :price2) => [4.0 16.0; 9.0 4.0; 1.0 1.0] | ||
(:predict, :price2) => [0.25 1.0; 25.0 4.0; 0.0 1.0] | ||
``` | ||
""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These docstrings are a bit verbose (ie: several duplicate sentences between them). Could we simplify the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah that's a good idea |
||
function FeatureTransforms.apply_append( | ||
ds::KeyedDataset, t::Transform, keys...; | ||
dims=:, inner=false, component_name=:component, kwargs... | ||
) | ||
patterns = _transform_pattern(keys, dims) | ||
|
||
if inner # batched apply_append on each component | ||
return map(ds, patterns...) do a | ||
FeatureTransforms.apply_append(a, t; dims=dims, kwargs...) | ||
end | ||
Comment on lines
+136
to
+139
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just thought this could be handy and worth including. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could see this being useful, but I'm not sure we have enough use-cases yet. That being said, it wouldn't be too hard to deprecated if it isn't useful. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure how useful this is right away either, and it might complicate the code and tests if we have to support both given the implementation is rather easy (just a |
||
else # merge transformed components as new components of dataset | ||
# select any components the keys match | ||
selected = unique(x[1:end-1] for x in dimpaths(ds) if any(p -> x in p, patterns)) | ||
|
||
# construct keys of new transformed components | ||
new_keys = [(k[1:end-1]..., component_name) for k in selected] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By using a single component name this assumes, but does not enforce, that there is only one kind of component being transformed e.g. But we also discussed the idea of passing in a full dimspath, which I'm open to. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be good to throw an argument error if that assumption doesn't hold for now. Passing multiple dimpaths does seem noisy, and hard to justify without use-cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So the condition would be that the last part of each dimpath is the same? Off the top of my head:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, though that could probably be simplified to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this assumption is fine IMO There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe shouldn't call it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I'd probably use a variable like |
||
|
||
# pair new keys with transformed components | ||
pairs = map(new_keys, selected) do new_k, k | ||
new_k => FeatureTransforms.apply(ds.data[k], t; dims=dims, kwargs...) | ||
end | ||
|
||
return merge(ds, KeyedDataset(pairs...)) | ||
end | ||
end |
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -126,12 +126,12 @@ julia> [k => parent(parent(v)) for (k, v) in Impute.filter(ds; dims=:loc).data] | |||||||||||||
""" | ||||||||||||||
Impute.apply(ds::KeyedDataset, f::Filter; dims) = Impute.apply!(deepcopy(ds), f; dims=dims) | ||||||||||||||
|
||||||||||||||
_pattern(dims::Pattern) = dims | ||||||||||||||
_pattern(dims::Tuple) = Pattern(dims) | ||||||||||||||
_pattern(dims) = Pattern(:__, dims) | ||||||||||||||
_impute_pattern(dims::Pattern) = dims | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having to distinguish
Lines 79 to 84 in d39b647
Ideally we'd standardise how to handle patterns/dims everywhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, maybe add a comment to invenia/Impute.jl#66? If we're gonna change that in Impute.jl it'd be good to do that before a 1.0 release? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||
_impute_pattern(dims::Tuple) = Pattern(dims) | ||||||||||||||
_impute_pattern(dims) = Pattern(:__, dims) | ||||||||||||||
|
||||||||||||||
function Impute.apply!(ds::KeyedDataset, f::Filter; dims) | ||||||||||||||
pattern = _pattern(dims) | ||||||||||||||
pattern = _impute_pattern(dims) | ||||||||||||||
dim = pattern.segments[end] | ||||||||||||||
|
||||||||||||||
dim in (:_, :__) && throw(ArgumentError( | ||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comply with FeatureTransforms compat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of annoying that FeatureTransforms only supports 1.5, but I guess all our packages should support it anyways?