Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LinearCombination transform #8

Merged
merged 6 commits into from
Feb 8, 2021
Merged

Add LinearCombination transform #8

merged 6 commits into from
Feb 8, 2021

Conversation

nicoleepp
Copy link
Contributor

Closes #5

Note that I couldn't get a linear combination using Statistics.mean to work. A lot of operations were returning NaNs. Also, that mean function expects a whole matrix of weights and not just the weights for each column. For these reasons, I opted not to define the CustomTransform instead of doing this explicitly.

From the issue description, I am unclear what the role of dims kwarg is or how it should work.

@nicoleepp nicoleepp requested a review from glennmoy February 2, 2021 00:08
Base automatically changed from gm/wip_interface to main February 2, 2021 11:45
@glennmoy
Copy link
Member

glennmoy commented Feb 2, 2021

From the issue description, I am unclear what the role of dims kwarg is or how it should work.

I think dims would apply when you want the linear combination to only operate on a subset of columns.

e.g.

df = DataFrame(
    :time = Day(1):Day(1):Day(5),
    :a = [1, 2, 3, 4, 5],
    :b = [1, 1, 2, 3, 5],
)

lc = LinearCombination([1, 5])

Transforms.apply(df, lc; cols=[:a, :b])

similarly for dims on arrays.
Does this make sense?

@bencottier
Copy link
Contributor

similarly for dims on arrays.

If it's selecting a subset of columns, then dims for arrays seems misleading - like it will apply the linear combination along a different axis. I think it should always be called cols for this usage.

Copy link
Contributor

@bencottier bencottier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle dims/cols as Glenn explained, but otherwise this looks good to me.

test/linear_combination.jl Outdated Show resolved Hide resolved
@nicoleepp
Copy link
Contributor Author

nicoleepp commented Feb 2, 2021

similarly for dims on arrays.

If it's selecting a subset of columns, then dims for arrays seems misleading - like it will apply the linear combination along a different axis. I think it should always be called cols for this usage.

I agree using dims would make me think it chooses only certain rows, not a subset of the array cols

@codecov
Copy link

codecov bot commented Feb 2, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@e7ba638). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##             main        #8   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         5           
  Lines           ?        38           
  Branches        ?         0           
========================================
  Hits            ?        38           
  Misses          ?         0           
  Partials        ?         0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7ba638...1311db1. Read the comment docs.

@nicoleepp
Copy link
Contributor Author

I have implemented cols as Ben suggested and left the dims tests commented out while we decide what dims should mean. I may just not understand what dims means for an Array but having both does seem confusing.

Copy link
Contributor

@bencottier bencottier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I don't think there is a use for dims as long as this transform is only applied on columns.

@glennmoy
Copy link
Member

glennmoy commented Feb 4, 2021

If it's selecting a subset of columns, then dims for arrays seems misleading - like it will apply the linear combination along a different axis. I think it should always be called cols for this usage.

Looks good to me. I don't think there is a use for dims as long as this transform is only applied on columns.

Sorry, that's dead right, in my first response I completely confused dims / cols.

But I think there is still a place for dims because we might not always take the linear combination column-wise.

e.g. consider just taking a sum over some matrix - this resembles a LinearCombination(ones(3)) in this case.

julia> A = [1 2 3; 4 5 6; 7 8 9];

julia> sum(A, dims=2)
3×1 Array{Int64,2}:
  6
 15
 24

julia> sum(A, dims=1)
1×3 Array{Int64,2}:
 12  15  18

I think all that needs to change is to change eachrow to eachslice(...; dims=dims)?

This means we would also have to change cols to inds/index/idx (or some better term).

src/linear_combination.jl Outdated Show resolved Hide resolved
src/linear_combination.jl Outdated Show resolved Hide resolved
src/linear_combination.jl Show resolved Hide resolved
src/linear_combination.jl Outdated Show resolved Hide resolved
@glennmoy glennmoy mentioned this pull request Feb 4, 2021
src/linear_combination.jl Show resolved Hide resolved
src/linear_combination.jl Outdated Show resolved Hide resolved
test/linear_combination.jl Outdated Show resolved Hide resolved
src/linear_combination.jl Show resolved Hide resolved
src/linear_combination.jl Show resolved Hide resolved
src/linear_combination.jl Outdated Show resolved Hide resolved
src/linear_combination.jl Show resolved Hide resolved
src/linear_combination.jl Outdated Show resolved Hide resolved
test/linear_combination.jl Outdated Show resolved Hide resolved
src/linear_combination.jl Outdated Show resolved Hide resolved
d = Colon()
@test_throws ArgumentError Transforms.apply(A, lc; dims=d)
end

@testset "dims = 1" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this also work if passing in the axis key? e.g. foo, bar?

Copy link
Member

@glennmoy glennmoy Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you do the same with AxisArrays though? If I use (Colon(), :foo, :bar) like for KeyedArray in test/power.jl I get

dims = foo: Error During Test at /Users/bencottier/JuliaEnvs/Transform.jl/test/power.jl:65
  Got exception outside of a @test
  MethodError: no method matching length(::Symbol)
  Closest candidates are:
    length(!Matched::Base.Iterators.Flatten{Tuple{}}) at iterators.jl:1061
    length(!Matched::Base.MethodList) at reflection.jl:872
    length(!Matched::DataStructures.EnumerateAll) at /Users/bencottier/.julia/packages/DataStructures/ixwFs/src/multi_dict.jl:96
    ...
  Stacktrace:
   [1] #eachslice#196 at ./abstractarraymath.jl:496 [inlined]
   [2] apply!(::AxisArray{Int64,2,Array{Int64,2},Tuple{Axis{:foo,Array{String,1}},Axis{:bar,Array{String,1}}}}, ::Power; dims::Symbol, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/bencottier/JuliaEnvs/Transform.jl/src/transformers.jl:57
   [3] apply(::AxisArray{Int64,2,Array{Int64,2},Tuple{Axis{:foo,Array{String,1}},Axis{:bar,Array{String,1}}}}, ::Power; kwargs::Base.Iterators.Pairs{Symbol,Symbol,Tuple{Symbol},NamedTuple{(:dims,),Tuple{Symbol}}}) at /Users/bencottier/JuliaEnvs/Transform.jl/src/transformers.jl:64
   [4] macro expansion at /Users/bencottier/JuliaEnvs/Transform.jl/test/power.jl:66 [inlined]
   [5] macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190 [inlined]
   [6] macro expansion at /Users/bencottier/JuliaEnvs/Transform.jl/test/power.jl:65 [inlined]
   [7] macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115 [inlined]
   [8] macro expansion at /Users/bencottier/JuliaEnvs/Transform.jl/test/power.jl:62 [inlined]
   [9] macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115 [inlined]
   [10] top-level scope at /Users/bencottier/JuliaEnvs/Transform.jl/test/power.jl:3
   [11] include(::String) at ./client.jl:457
   [12] top-level scope at /Users/bencottier/JuliaEnvs/Transform.jl/test/runtests.jl:12
   [13] top-level scope at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
   [14] top-level scope at /Users/bencottier/JuliaEnvs/Transform.jl/test/runtests.jl:11
   [15] include(::String) at ./client.jl:457
   [16] top-level scope at none:6
   [17] eval(::Module, ::Any) at ./boot.jl:331
   [18] exec_options(::Base.JLOptions) at ./client.jl:272
   [19] _start() at ./client.jl:506

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like eachslice doesn't work on AxisArrays maybe. Nevermind it was only a passing thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it doesn't work without some extra voodoo

julia> eachslice(A, dims=axisdim(A, Axis{:foo}))
Base.Generator{Base.OneTo{Int64},Base.var"#199#202"{AxisArray{Int64,2,Array{Int64,2},Tuple{Axis{:foo,Array{String,1}},Axis{:bar,Array{String,1}}}},Tuple{},Tuple{Colon}}}(Base.var"#199#202"{AxisArray{Int64,2,Array{Int64,2},Tuple{Axis{:foo,Array{String,1}},Axis{:bar,Array{String,1}}}},Tuple{},Tuple{Colon}}([1 2 3; 4 5 6], (), (Colon(),)), Base.OneTo(2))

And it's been on their todo list for 5 years 🙄
JuliaArrays/AxisArrays.jl#7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make this work if we want to, by checking if it's an AxisArray. I might prefer to do this in a follow up MR just to keep diffs smaller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13

test/linear_combination.jl Show resolved Hide resolved
@nicoleepp nicoleepp merged commit 57cccbe into main Feb 8, 2021
@nicoleepp nicoleepp deleted the ne/linear-combination branch February 8, 2021 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linear Combination transform
4 participants