Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define apply_append method #69

Merged
merged 1 commit into from
Mar 31, 2021
Merged

Define apply_append method #69

merged 1 commit into from
Mar 31, 2021

Conversation

glennmoy
Copy link
Member

Closes #38

Supersedes #40

julia> df = DataFrame(:a=>[1, 2, 3, 4, 5], :b=>[5, 4, 3, 2, 1]);

julia> p = Power(3);

julia> FeatureTransforms.apply_append(df, p; cols=[:a, :b], header=[:a3, :b3])
5×4 DataFrame
 Row │ a      b      a3     b3    
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   11      5      1    125
   22      4      8     64
   33      3     27     27
   44      2     64      8
   55      1    125      1

julia> M = reshape(1:9, 3, 3)
3×3 reshape(::UnitRange{Int64}, 3, 3) with eltype Int64:
 1  4  7
 2  5  8
 3  6  9

julia> FeatureTransforms.apply_append(M, p; dims=1, inds=1, append_dim=1)
4×3 Array{Int64,2}:
 1   4    7
 2   5    8
 3   6    9
 1  64  343

Comment on lines +56 to +57
new_size = collect(size(A))
setindex!(new_size, 1, dim(A, append_dim))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was unfortunately necessary to introduce NamedDims as a dependency.

Without NamedDims.dim there was no way to map the dimname to the dim - and so no way to know how to reshape the output of LinearCombination so it could be cat to the input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes down to setindex! though, right? Because it has to use an integer index. Just wondering if there is an alternative outside of setindex!.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes down to setindex! though, right? Because it has to use an integer index.

yeah exactly. I'm not sure what alternative exists though. I don't think there's anything in base that assumes a dim can be a symbol.

@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #69 (9602516) into main (178a442) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 9602516 differs from pull request most recent head 0a5252d. Consider uploading reports for the commit 0a5252d to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##              main       #69   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          103       113   +10     
=========================================
+ Hits           103       113   +10     
Impacted Files Coverage Δ
src/FeatureTransforms.jl 100.00% <ø> (ø)
src/apply.jl 100.00% <100.00%> (ø)
src/linear_combination.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 178a442...0a5252d. Read the comment docs.

lc = LinearCombination([1, 1, 1])

expected1 = [1 1 1; 2 2 2; 3 3 3; 6 6 6]
@test FeatureTransforms.apply_append(M, lc; dims=1, append_dim=1) == expected1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does apply_append only work when dims == append_dim or can we add a test for when they are different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically possible but for linear combination it only makes sense to append to the dim you reduced over. It's not guaranteed to work any other way so I don't think testing for it is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"not guaranteed to work" not because the code won't allow it, but because the structure of your data won't.

the usual [`Transform`](@ref) being invoked.
"""
function apply_append(A::AbstractArray, t; append_dim, kwargs...)::AbstractArray
return cat(A, apply(A, t; kwargs...); dims=append_dim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this copies the axiskeys of the chunk being applied to, so there isn't the equivalent of header?

I can understand not needing this, given axis keys are more like metadata for each data point. For example even if you take Power(3) of the data, it still corresponds to time=t.

What's your thinking on it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this copies the axiskeys of the chunk being applied to, so there isn't the equivalent of header?

Correct, I went looking in the AxisKeys code for something like it but new_keys is set internally and can't be over-written externally.

What's your thinking on it?

For transforms that affect the whole (2D) Array, it makes sense to append on the 3rd dimension and create a new dim that way. Otherwise you end up with redundant keys in one dimension.

For linear combination, or transforms on single columns it seems unavoidable for now that you have to rekey on the other side. We could do that processing here? But it would require loading AxisKeys explicitly to allow new keys be provided. For now I'm happy to leave it and see how much of an issue it becomes.

Comment on lines +56 to +57
new_size = collect(size(A))
setindex!(new_size, 1, dim(A, append_dim))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes down to setindex! though, right? Because it has to use an integer index. Just wondering if there is an alternative outside of setindex!.

is appended to `A` along the `append_dim` dimension. The remaining `kwargs` correspond to
the usual [`Transform`](@ref) being invoked.
"""
function apply_append(A::AbstractArray, t; append_dim, kwargs...)::AbstractArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at some tests repeating dims=x, append_dim=x made me think, would it make sense to set the default value for append_dim as dims inside the function? Maybe that only works when dims is single, as is currently the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely does for linear combination... I'll have a look and see if it makes sense in general and make the change if so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll leave this for another PR - I have an idea or two of how to make this work but it'll require another review

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#70

@glennmoy glennmoy enabled auto-merge (squash) March 31, 2021 12:43
@glennmoy glennmoy merged commit 35befa8 into main Mar 31, 2021
@glennmoy glennmoy deleted the gm/apply_append branch March 31, 2021 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement apply method that appends result to data and / or forces promotion
3 participants