Implement apply method that appends result to data and / or forces promotion #38

glennmoy · 2021-03-02T13:42:17Z

Related to #12

We have two methods for applying a transform to data:

apply takes the transform but preserves the original data
apply! takes the transform and mutates the original data in-place

While apply is universally supported, apply! is only supported for transforms that can directly replace the input.

In one case, this means it needs the output to be the same type:

julia> p = Power(1.2);

julia> x = Int64[1, 2, 3];

julia> FeatureTransforms.apply(x, p)
3-element Array{Float64,1}:
 1.0
 2.2973967099940698
 3.7371928188465517

julia> FeatureTransforms.apply!(x, p)
ERROR: InexactError: Int64(2.2973967099940698)

In this example, we might just want to force the type promotion.
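
For illustration, the same failure can be reproduced in base Julia without the package. This is only a sketch of the underlying issue, not how apply! is implemented: an Int64 buffer cannot hold the Float64 results, so "forcing promotion" has to allocate a new array.

```julia
x = Int64[1, 2, 3]

# Writing the result back into the Int64 array fails, because
# 2^1.2 is not exactly representable as an integer.
failed = try
    x .= x .^ 1.2
    false
catch err
    err isa InexactError
end

# Promotion cannot reuse the Int64 buffer: a new Float64 array is
# allocated, and the caller would have to rebind the name to it.
y = float.(x) .^ 1.2
```

So a "forced" apply! on an Array could not be a true in-place mutation; it would have to return the promoted result.
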
But another simple case arises when the output is a different shape to the input.
Consider LinearCombination, which typically takes more than 1 input but produces just 1 output:

julia> lc = LinearCombination([1, -1]);

julia> A = [1 2; 5 9];

julia> FeatureTransforms.apply(A, lc);  # works

julia> FeatureTransforms.apply!(A, lc)
ERROR: DimensionMismatch("tried to assign 2 elements to 4 destinations")

Note that this kind of transform is Many-to-One, so we would expect similar problems for One-to-Many and Many-to-Many transforms.
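
A base-Julia sketch of the shape problem, assuming (purely for illustration) that the combination is taken across the rows of A; it does not call the package and the variable names are made up:

```julia
A = [1 2; 5 9]
coeffs = [1, -1]

# Many-to-One: combine the two rows into a single 2-element result
# (the choice of dimension here is an assumption).
result = coeffs[1] .* A[1, :] .+ coeffs[2] .* A[2, :]

# The 2-element result cannot overwrite the 4-element input...
mismatch = try
    A[:] = result
    false
catch err
    err isa DimensionMismatch
end

# ...but it can be appended, at the cost of allocating a larger array.
appended = vcat(A, result')
```

Appending sidesteps the mismatch, but for an Array it necessarily returns a new object rather than mutating A.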

We therefore might want some apply-like methods that would:

1. Force mutation of the input (where possible) for One-to-One transforms by converting the underlying types.
2. Append the result to the input (where possible) for Many-to-One, One-to-Many, or Many-to-Many transforms.

Given the types of problems these are solving, it might be desirable to have them handled by separate methods.
But note that it's possible to solve both problems using (2), which would give consistent behaviour.

Here are some ideas for how we might approach the solution:

1. Special keyword args: apply!(...; force=true), apply!(...; append=true).
2. Special methods: apply_force!, apply_and_append!, also apply!! (cf. https://github.com/JuliaFolds/BangBang.jl)
3. Define traits based on the transform cardinality with special rules in place for, e.g., apply!!(x, ::OnetoOne; kwargs...), apply!!(x, ::ManytoOne; kwargs...), apply!!(x, ::ManytoMany; kwargs...).
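
To make the trait idea concrete, here is a minimal sketch. Every name in it (Cardinality, OneToOne, ManyToOne, ToyPower, ToyLinearCombination, cardinality, compute) is hypothetical and only stands in for whatever the package would define; the data is a plain vector of columns. For simplicity it always returns a new collection, whereas a real apply!! would mutate whenever the types allow.

```julia
# Hypothetical cardinality traits; none of these names come from the package.
abstract type Cardinality end
struct OneToOne <: Cardinality end
struct ManyToOne <: Cardinality end

# Toy transforms standing in for Power and LinearCombination.
struct ToyPower
    exponent::Float64
end
struct ToyLinearCombination
    coefficients::Vector{Float64}
end

cardinality(::ToyPower) = OneToOne()
cardinality(::ToyLinearCombination) = ManyToOne()

# The actual computation, independent of how the result is stored.
compute(columns, t::ToyPower) = [c .^ t.exponent for c in columns]
compute(columns, t::ToyLinearCombination) = sum(t.coefficients .* columns)

# Dispatch on the trait to choose between replacing and appending.
apply!!(columns, t) = apply!!(columns, t, cardinality(t))
# One-to-One: replace every column, letting the element type promote.
apply!!(columns, t, ::OneToOne) = compute(columns, t)
# Many-to-One: keep the inputs and append the single result column.
apply!!(columns, t, ::ManyToOne) = vcat(columns, [compute(columns, t)])

promoted = apply!!([[1, 2, 3]], ToyPower(2.0))
appended = apply!!([[1.0, 5.0], [2.0, 9.0]], ToyLinearCombination([1.0, -1.0]))
```

The caller only ever writes apply!!(data, transform); whether the result replaces or extends the input is decided by the transform's cardinality.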

This also raises the question of how to name the columns of the appended data for a Table.
Should the names be provided by the user, or generated automatically?


glennmoy · 2021-03-02T13:55:44Z

Note that one example where all this will fail regardless is in some dimension-reducing transforms like PCA, for which it would be impossible to append to the input data. But this is a hard limitation no matter what we do with the above.

bencottier · 2021-03-02T14:25:02Z

> Note that this kind of transform is Many-to-One, so would expect similar problems for One-to-Many and Many-to-Many, although neither of these kinds of transforms have been implemented yet.

Is OneHotEncoding One-to-Many? Just thinking about the problems for that one, it could be appended to a table in some cases but it's weird to have multiple columns for one transform result.

bencottier · 2021-03-02T14:33:48Z

I'll note I had similar thoughts while writing an example for the docs. These are not my all-things-considered thoughts.

Wanting apply! to force type promotion. I had a HoD transform, but it became convoluted as an example, when I wanted to apply! MeanStdScaling to the result of HoD (which is Int type). I don't know if that made sense in terms of feature engineering, but there are surely similar cases.
Wanting (at least the option) to append the result. I wanted to get a DataFrame out (or mutated) from a DataFrame in.
For many-to-one transforms, we could make the results column name an optional argument. The default could join the input column names. It doesn't seem so bad - the behaviour is defined clearly, and if the user isn't happy with that they can use their preferred column names.
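
A rough sketch of that default, using a NamedTuple of vectors as the table so no table package is assumed. The helper append_column and its name keyword are hypothetical, not part of any existing API:

```julia
# Column table as a NamedTuple of vectors.
table = (a = [1, 5], b = [2, 9])

# Hypothetical helper: append `result` under `name`, which defaults to
# the input column names joined with underscores.
function append_column(table::NamedTuple, result, inputs;
                       name = Symbol(join(inputs, "_")))
    return merge(table, NamedTuple{(name,)}((result,)))
end

result = table.a .- table.b
default_named = append_column(table, result, (:a, :b))
user_named = append_column(table, result, (:a, :b); name = :difference)
```

With the default, the new column is called a_b; passing name overrides it.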

glennmoy · 2021-03-02T14:48:21Z

> Is OneHotEncoding One-to-Many?

Yeah I guess it is given the output type. I forgot to consider it. Will update the text.

glennmoy · 2021-03-17T18:50:16Z

It was suggested that instead of an apply method, this could be another (binary) Transform.

bencottier · 2021-03-18T11:58:11Z

> It was suggested that instead of an apply method, this could be another (binary) Transform.

Thinking about initialising an append Transform, it seems weird to me, but maybe it helps composability.

glennmoy · 2021-03-18T12:33:09Z

> Thinking about initialising an append Transform, it seems weird to me, but maybe it helps composability.

TBH I'm not sure I like it. We'll have to see after trying out a few ideas.

glennmoy added the design label Mar 2, 2021

glennmoy mentioned this issue Mar 2, 2021

Implement Chained Transform #35

Closed

glennmoy mentioned this issue Mar 2, 2021

RFC: Implement appending apply function #40

Closed

glennmoy self-assigned this Mar 22, 2021

glennmoy mentioned this issue Mar 30, 2021

Define apply_append method #69

Merged

glennmoy closed this as completed in #69 Mar 31, 2021
