Apply scaling to certain slices of an array #39

Closed
bencottier opened this issue Mar 2, 2021 · 3 comments · Fixed by #42
Labels
enhancement New feature or request


@bencottier
Contributor

bencottier commented Mar 2, 2021

MWE:

julia> M = reshape(collect(1:6), 3, 2)
3×2 Array{Int64,2}:
 1  4
 2  5
 3  6

We can apply Power to the second column using inds:

julia> FeatureTransforms.apply(M, Power(2); dims=2, inds=[2])
3×1 Array{Int64,2}:
 16
 25
 36

This works because dims=2 iterates over each row and inds=[2] selects the second element of each row.
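The stack trace later in this thread suggests that apply is built on mapslices over dims followed by a view at inds. Under that assumption, the behaviour above can be mimicked in plain Julia. This is a sketch for illustration, not FeatureTransforms' actual implementation:

```julia
M = reshape(collect(1:6), 3, 2)

# Assumed model of apply: dims=2 passes each row of M to the function,
# and inds=[2] picks the second element of that row before squaring it.
squared_col2 = mapslices(row -> view(row, [2]) .^ 2, M; dims=2)
```

Each row contributes a one-element result, so the output is a 3×1 matrix, matching the shape printed above.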

However, to normalize the second column with MeanStdScaling, we need MeanStdScaling(M; dims=1) (i.e. take the mean and std of each column). I then see no way to normalize only the second column, because dims needs to be consistent between constructing the scaling and applying it.

julia> scaling = MeanStdScaling(M; dims=1)
MeanStdScaling((1 = 2.0, 2 = 5.0), (1 = 1.0, 2 = 1.0))

julia> FeatureTransforms.apply(M, scaling; dims=1, inds=[2])
1×2 Array{Float64,2}:
 0.0  0.0

Maybe we need a slices argument so that a transform is applied only to certain slices?

bencottier added the bug (Something isn't working) label Mar 2, 2021
bencottier changed the title from "No way to apply scaling to certain slices of an array" to "Apply scaling to certain slices of an array" Mar 2, 2021
bencottier added the enhancement (New feature or request) label and removed the bug (Something isn't working) label Mar 2, 2021
@glennmoy
Member

glennmoy commented Mar 2, 2021

I'm not sure what the problem is; perhaps you can elaborate?

If I use the same dims I get consistent behaviour? Or is the problem that you need to provide both the dims and the inds to get the desired behaviour?

julia> scaling = MeanStdScaling(M; dims=2)
MeanStdScaling((1 = 2.5, 2 = 3.5, 3 = 4.5), (1 = 2.1213203435596424, 2 = 2.1213203435596424, 3 = 2.1213203435596424))

julia> FeatureTransforms.apply(M, scaling; dims=2, inds=[2])
3×1 Array{Float64,2}:
 0.7071067811865476
 0.7071067811865476
 0.7071067811865476

@bencottier
Contributor Author

bencottier commented Mar 3, 2021

> If I use the same dims I get consistent behaviour? Or is the problem that you need to provide both the dims and the inds to get the desired behaviour?

The example you give is normalizing over the rows (i.e. each row ends up with mean=0 and std=1) and selecting one column.
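To make the distinction concrete, here is what that example computes, written with the Statistics standard library rather than FeatureTransforms. It assumes MeanStdScaling uses the sample standard deviation, which matches the printed 2.1213...:

```julia
using Statistics

M = reshape(collect(1:6), 3, 2)

# Standardize across each row: every row ends up with mean 0 and std 1.
Z = (M .- mean(M; dims=2)) ./ std(M; dims=2)

# Selecting column 2 afterwards gives the values printed above (≈ 0.7071 each).
col2 = Z[:, 2]
```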

The problem is that, for an array, you can't both normalize a column (using that column's own mean and std) and select that same column.

Consider the analogy with table data: when constructing MeanStdScaling, you can specify which columns to normalize (i.e. each column ends up with mean=0 and std=1). You can then apply the scaling to those same columns. I'm saying you should be able to do the same for arrays.
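For comparison, here is a minimal sketch of the requested array behaviour, in which only the selected columns are standardized, each by its own mean and standard deviation. scale_columns is a hypothetical helper for illustration, not part of FeatureTransforms:

```julia
using Statistics

# Hypothetical helper (not a FeatureTransforms API): standardize only the
# columns listed in `cols`, each using that column's own mean and std,
# and leave every other column unchanged.
function scale_columns(A::AbstractMatrix{<:Real}, cols)
    out = float.(A)                 # work on a floating-point copy
    for j in cols
        col = view(A, :, j)
        out[:, j] .= (col .- mean(col)) ./ std(col)
    end
    return out
end

M = reshape(collect(1:6), 3, 2)
scaled = scale_columns(M, [2])      # column 2 becomes [-1.0, 0.0, 1.0]
```

Column 1 is returned unchanged (as floats), which mirrors selecting columns in a table.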

@bencottier
Contributor Author

bencottier commented Mar 3, 2021

I think this is related to LinearCombination still using eachslice. IIRC the original purpose of inds was to generalise the idea of applying to specific columns or rows.

Example:

julia> M = [2 4; 1 5; 3 6]
3×2 Array{Int64,2}:
 2  4
 1  5
 3  6

I can add the first and third rows as follows:

julia> lc_rows = LinearCombination([1, 1]);

julia> FeatureTransforms.apply(M, lc_rows; dims=2, inds=[1, 3])
2-element Array{Int64,1}:
  5
 10

But this meaning of inds has changed, along with the meaning of dims, in the general apply methods. The example below does not square the first and third rows; it throws a BoundsError instead:

julia> p = Power(2)

julia> FeatureTransforms.apply(M, p; dims=2, inds=[1, 3])
ERROR: BoundsError: attempt to access 2-element Array{Int64,1} at index [[1, 3]]
Stacktrace:
 [1] throw_boundserror(::Array{Int64,1}, ::Tuple{Array{Int64,1}}) at ./abstractarray.jl:541
 [2] checkbounds at ./abstractarray.jl:506 [inlined]
 [3] view at ./subarray.jl:158 [inlined]
 [4] maybeview at ./views.jl:133 [inlined]
 [5] (::FeatureTransforms.var"#3#4"{Array{Int64,1},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},Power})(::Array{Int64,1}) at /Users/bencottier/JuliaEnvs/Transform.jl/src/transformers.jl:82
 [6] mapslices(::FeatureTransforms.var"#3#4"{Array{Int64,1},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},Power}, ::Array{Int64,2}; dims::Int64) at ./abstractarray.jl:2083
 [7] #apply#2 at /Users/bencottier/JuliaEnvs/Transform.jl/src/transformers.jl:80 [inlined]
 [8] top-level scope at REPL[29]:1

I think it's because a value i in inds now means "the ith index of the row", rather than "the ith index along the rows" (refer back to my Power example in the original post).
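The two readings of inds can be reproduced with plain Julia indexing. This sketch models apply as mapslices followed by a view at inds, as the stack trace suggests; it is not FeatureTransforms code:

```julia
M = [2 4; 1 5; 3 6]

# Current meaning: dims=2 passes each row (2 elements) to the function, and
# inds=[1, 3] indexes *within* that row, so index 3 is out of bounds.
current = try
    mapslices(row -> view(row, [1, 3]) .^ 2, M; dims=2)
catch err
    err
end

# Intended meaning: inds=[1, 3] picks rows 1 and 3 along dims, then squares them.
intended = M[[1, 3], :] .^ 2   # [4 16; 9 36]
```

Here current holds the caught BoundsError, matching the error in the transcript above.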


2 participants