Use std of 0 for singleton vectors #90

Open: wants to merge 5 commits into main
Project.toml (1 addition, 1 deletion)
@@ -1,7 +1,7 @@
 name = "FeatureTransforms"
 uuid = "8fd68953-04b8-4117-ac19-158bf6de9782"
 authors = ["Invenia Technical Computing Corporation"]
-version = "0.3.6"
+version = "0.3.7"

 [deps]
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
src/scaling.jl (10 additions, 8 deletions)
@@ -31,8 +31,8 @@ struct MeanStdScaling <: AbstractScaling
 σ::Real

 """
-MeanStdScaling(A::AbstractArray; dims=:, inds=:) -> MeanStdScaling
-MeanStdScaling(table, [cols]) -> MeanStdScaling
+MeanStdScaling(A::AbstractArray; dims=:, inds=:, corrected=true) -> MeanStdScaling

Review comment (Member), on the line above: Can we just pass generic kwargs...? Or does it get confused about what to do with the dims kwarg?

+MeanStdScaling(table, [cols], corrected=true) -> MeanStdScaling

 Construct a [`MeanStdScaling`](@ref) transform from the statistics of the given data.
 By default _all the data_ is considered when computing the mean and standard deviation.
@@ -46,29 +46,31 @@ struct MeanStdScaling <: AbstractScaling
 # `AbstractArray` keyword arguments
 * `dims=:`: the dimension along which to take the `inds` slices. Default uses all dims.
 * `inds=:`: the indices to use in computing the statistics. Default uses all indices.
+* `corrected=true`: passed to `Statistics.std`.

 # `Table` keyword arguments
 * `cols`: the columns to use in computing the statistics. Default uses all columns.
+* `corrected=true`: passed to `Statistics.std`.

 !!! note
 If you want the `MeanStdScaling` to transform your data consistently you should use
 the same `inds`, `dims`, or `cols` keywords when calling `apply`. Otherwise, `apply`
 might rescale the wrong data or throw an error.
 """
-function MeanStdScaling(A::AbstractArray; dims=:, inds=:)
-dims == Colon() && return new(compute_stats(A)...)
-return new(compute_stats(selectdim(A, dims, inds))...)
+function MeanStdScaling(A::AbstractArray; dims=:, inds=:, corrected=true)
+dims == Colon() && return new(compute_stats(A; corrected=corrected)...)
+return new(compute_stats(selectdim(A, dims, inds); corrected=corrected)...)
 end

-function MeanStdScaling(table; cols=_get_cols(table))
+function MeanStdScaling(table; cols=_get_cols(table), corrected=true)
 Tables.istable(table) || throw(MethodError(MeanStdScaling, table))
 columntable = Tables.columns(table)
 data = reduce(vcat, [getproperty(columntable, c) for c in _to_vec(cols)])
-return new(compute_stats(data)...)
+return new(compute_stats(data; corrected=corrected)...)
 end
 end

-compute_stats(x) = (mean(x), std(x))
+compute_stats(x; corrected) = (mean(x), std(x; corrected=corrected))

 function _apply(A::AbstractArray, scaling::MeanStdScaling; inverse=false, eps=1e-3, kwargs...)
 inverse && return scaling.μ .+ scaling.σ .* A
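The reviewer's question about passing generic kwargs can be illustrated with a minimal sketch. `SketchScaling` below is a hypothetical stand-in, not the FeatureTransforms `MeanStdScaling` type. It captures `dims` and `inds` explicitly and forwards every remaining keyword (such as `corrected`) to `Statistics.std`. If `dims` were forwarded too, `std` would reduce along that dimension and return an array instead of a scalar, which is one way the constructor could "get confused".

```julia
using Statistics

# Hypothetical stand-in for MeanStdScaling, used only to illustrate the
# "generic kwargs" idea; it is not the FeatureTransforms type.
struct SketchScaling
    μ::Real
    σ::Real
end

# `dims` and `inds` are consumed here and never reach `std`; all other
# keywords (e.g. `corrected=false`) are passed through unchanged.
function SketchScaling(A::AbstractArray; dims=:, inds=:, kwargs...)
    data = dims == Colon() ? A : selectdim(A, dims, inds)
    return SketchScaling(mean(data), std(data; kwargs...))
end

# A singleton with the default Bessel correction divides 0 by n - 1 = 0.
println(isnan(SketchScaling([2.0]).σ))

# Without the correction the singleton's standard deviation is 0.
println(SketchScaling([2.0]; corrected=false).σ)
```

This design would let any keyword that `std` accepts flow through, at the cost of a less self-documenting signature than the explicit `corrected=true` keyword this PR adds.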
test/scaling.jl (20 additions, 0 deletions)
@@ -255,6 +255,26 @@
 @test scaling.σ == 0.5
 end
 end
+
+@testset "std correction" begin
+@testset "singleton" begin
+x = [2.]
+
+scaling = MeanStdScaling(x)
+@test scaling.μ == 2.
+@test isnan(scaling.σ)
+
+scaling = MeanStdScaling(x; corrected=false)
+@test scaling.μ == 2.
+@test scaling.σ == 0.
+end
+
+@testset "Array" begin
+scaling = MeanStdScaling(M; corrected=false)
+@test scaling.μ == 0.5
+@test scaling.σ ≈ 0.81650 atol=1e-5
+end
+end
 end

 @testset "Vector" begin
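The expected values in the singleton tests follow from the standard-deviation formula behind `Statistics.std`, where `corrected=true` scales the sum of squared deviations by $n - 1$ (Bessel's correction) and `corrected=false` scales it by $n$:

$$
\sigma = \sqrt{\frac{1}{n - c}\sum_{i=1}^{n} (x_i - \mu)^2},
\qquad
c = \begin{cases} 1 & \text{if corrected=true} \\ 0 & \text{if corrected=false} \end{cases}
$$

For `x = [2.]`, $n = 1$, $\mu = 2$ and the sum of squared deviations is $0$, so the corrected estimate is $\sqrt{0/0}$ (`NaN`) and the uncorrected one is $\sqrt{0/1} = 0$.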