Define apply_append method #69

glennmoy · 2021-03-30T15:44:43Z

Closes #38

Supersedes #40

julia> df = DataFrame(:a=>[1, 2, 3, 4, 5], :b=>[5, 4, 3, 2, 1]);

julia> p = Power(3);

julia> FeatureTransforms.apply_append(df, p; cols=[:a, :b], header=[:a3, :b3])
5×4 DataFrame
 Row │ a      b      a3     b3    
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      5      1    125
   2 │     2      4      8     64
   3 │     3      3     27     27
   4 │     4      2     64      8
   5 │     5      1    125      1

julia> M = reshape(1:9, 3, 3)
3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
 1  4  7
 2  5  8
 3  6  9

julia> FeatureTransforms.apply_append(M, p; dims=1, inds=1, append_dim=1)
4×3 Array{Int64,2}:
 1   4    7
 2   5    8
 3   6    9
 1  64  343

glennmoy · 2021-03-30T15:47:23Z

src/linear_combination.jl

+    new_size = collect(size(A))
+    setindex!(new_size, 1, dim(A, append_dim))


Unfortunately, this made it necessary to introduce NamedDims as a dependency.

Without NamedDims.dim there was no way to map the dimname to the integer dim, and so no way to know how to reshape the output of LinearCombination so that it could be concatenated (cat) onto the input.

This comes down to setindex! though, right? Because it has to use an integer index. Just wondering if there is an alternative outside of setindex!.

> This comes down to setindex! though, right? Because it has to use an integer index.

Yeah, exactly. I'm not sure what alternative exists, though. I don't think there's anything in Base that assumes a dim can be a symbol.
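
To make the `setindex!` question concrete, here is a minimal sketch in Base Julia only. It resolves a dimension name to an integer and builds the collapsed size as a tuple rather than mutating a vector. `dim_index` and `collapsed_size` are hypothetical names introduced for illustration and are not part of the PR. The tuple of names stands in for the lookup that `NamedDims.dim` performs, and a plain `sum` stands in for the transform.

```julia
# Hypothetical helper: resolve a dimension given as an Integer or a Symbol.
# `names` stands in for the dimension names a named array would carry.
dim_index(names::Tuple, d::Integer) = d
function dim_index(names::Tuple, d::Symbol)
    i = findfirst(==(d), names)
    i === nothing && throw(ArgumentError("no dimension named $d in $names"))
    return i
end

# Size of `A` with dimension `d` collapsed to length 1, built without `setindex!`.
collapsed_size(A::AbstractArray, d::Integer) =
    ntuple(i -> i == d ? 1 : size(A, i), ndims(A))

A = reshape(1:12, 3, 4)
d = dim_index((:time, :feature), :time)    # integer index of the :time dimension
new_size = collapsed_size(A, d)
row = reshape(sum(A; dims=d), new_size)    # stand-in for the reduced transform output
result = cat(A, row; dims=d)               # append the reduced slice to the input
```

Either way an integer index is needed at some point; the tuple version only avoids the mutable intermediate.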

codecov · 2021-03-30T15:49:52Z

Codecov Report

Merging #69 (9602516) into main (178a442) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 9602516 differs from pull request most recent head 0a5252d. Consider uploading reports for the commit 0a5252d to get more accurate results

@@            Coverage Diff            @@
##              main       #69   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          103       113   +10     
=========================================
+ Hits           103       113   +10

| Impacted Files | Coverage Δ |
|---|---|
| src/FeatureTransforms.jl | `100.00% <ø> (ø)` |
| src/apply.jl | `100.00% <100.00%> (ø)` |
| src/linear_combination.jl | `100.00% <100.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 178a442...0a5252d.

morris25 · 2021-03-30T16:47:45Z

test/linear_combination.jl

+            lc = LinearCombination([1, 1, 1])
+
+            expected1 = [1 1 1; 2 2 2; 3 3 3; 6 6 6]
+            @test FeatureTransforms.apply_append(M, lc; dims=1, append_dim=1) == expected1


Does apply_append only work when dims == append_dim or can we add a test for when they are different?

It's technically possible but for linear combination it only makes sense to append to the dim you reduced over. It's not guaranteed to work any other way so I don't think testing for it is needed.

"not guaranteed to work" not because the code won't allow it, but because the structure of your data won't.

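The point about data structure can be illustrated with a small Base-only sketch. `lincomb_append` is a hypothetical stand-in, not the package's implementation. It assumes the combined slice is reshaped to have length 1 along `append_dim` before concatenation, which is what the `new_size` lines in this PR suggest.

```julia
# Hypothetical stand-in for the behaviour under discussion (Base only).
function lincomb_append(A::AbstractMatrix, w::AbstractVector; dims::Int, append_dim::Int)
    # Weighted sum of the slices of `A` along `dims`.
    combined = sum(w[i] .* selectdim(A, dims, i) for i in axes(A, dims))
    # Give the result length 1 along `append_dim` so it can be concatenated.
    new_size = ntuple(i -> i == append_dim ? 1 : size(A, i), ndims(A))
    return cat(A, reshape(combined, new_size); dims=append_dim)
end

M = [1 1 1; 2 2 2; 3 3 3]

# Appending along the reduced dimension: the case covered by the test above.
same = lincomb_append(M, [1, 1, 1]; dims=1, append_dim=1)

# A square input lets mismatched dims run, although the combined row is now
# attached as a column, which rarely matches the meaning of the data.
mixed = lincomb_append(M, [1, 1, 1]; dims=1, append_dim=2)

# A non-square input cannot be reshaped, so the call fails.
N = [1 1; 2 2; 3 3]
failed = try
    lincomb_append(N, [1, 1, 1]; dims=1, append_dim=2)
    false
catch err
    err isa DimensionMismatch
end
```

In this sketch the mismatched call succeeds only because `M` happens to be square, which is one way to read "the structure of your data won't".
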
bencottier · 2021-03-30T16:13:32Z

src/apply.jl

+the usual [`Transform`](@ref) being invoked.
+"""
+function apply_append(A::AbstractArray, t; append_dim, kwargs...)::AbstractArray
+    return cat(A, apply(A, t; kwargs...); dims=append_dim)


I guess this copies the axiskeys of the chunk being applied to, so there isn't the equivalent of header?

I can understand not needing this, given axis keys are more like metadata for each data point. For example even if you take Power(3) of the data, it still corresponds to time=t.

What's your thinking on it?

> I guess this copies the axiskeys of the chunk being applied to, so there isn't the equivalent of header?

Correct. I went looking in the AxisKeys code for something like it, but new_keys is set internally and can't be overwritten externally.

> What's your thinking on it?

For transforms that affect the whole (2D) Array, it makes sense to append on the 3rd dimension and create a new dim that way. Otherwise you end up with redundant keys in one dimension.

For linear combination, or transforms on single columns, it seems unavoidable for now that you have to rekey on the other side. We could do that processing here, but it would require loading AxisKeys explicitly to allow new keys to be provided. For now I'm happy to leave it and see how much of an issue it becomes.

bencottier · 2021-03-30T16:23:12Z

src/linear_combination.jl

+    new_size = collect(size(A))
+    setindex!(new_size, 1, dim(A, append_dim))


This comes down to setindex! though, right? Because it has to use an integer index. Just wondering if there is an alternative outside of setindex!.

bencottier · 2021-03-30T16:31:07Z

src/apply.jl

+is appended to `A` along the `append_dim` dimension. The remaining `kwargs` correspond to
+the usual [`Transform`](@ref) being invoked.
+"""
+function apply_append(A::AbstractArray, t; append_dim, kwargs...)::AbstractArray


Seeing some tests repeat dims=x, append_dim=x made me wonder: would it make sense to default append_dim to dims inside the function? Maybe that only works when dims is a single dimension, as is currently the case.

It definitely does for linear combination... I'll have a look and see if it makes sense in general and make the change if so.

I think I'll leave this for another PR - I have an idea or two of how to make this work but it'll require another review

glennmoy mentioned this pull request Mar 30, 2021

RFC: Implement appending apply function #40

Closed

glennmoy commented Mar 30, 2021

morris25 approved these changes Mar 30, 2021

bencottier approved these changes Mar 30, 2021

glennmoy mentioned this pull request Mar 31, 2021

Set default value for append_dim as dims inside apply_append #70

Open

Add apply_append methods

0a5252d

glennmoy force-pushed the gm/apply_append branch from 9602516 to 0a5252d Compare March 31, 2021 12:43

glennmoy enabled auto-merge (squash) March 31, 2021 12:43

glennmoy merged commit 35befa8 into main Mar 31, 2021

glennmoy deleted the gm/apply_append branch March 31, 2021 12:49
