
Clean up apply! code #25

Merged: 7 commits merged into main on Feb 24, 2021
Conversation

glennmoy (Member)

Closes #18

There is an unfortunate conflict in how the dims kwarg is implemented in eachslice vs mapslices (#18).

mapslices is more intuitive to our researchers, but it cannot be used to mutate data directly because it doesn't take a view of the underlying data (there is no such thing as a mapview; see JuliaLang/julia#29146).

This is important for having a mutating apply! method, which until now was only possible with eachslice plus some workarounds.

This PR attempts to simplify the interface by delegating to apply inside apply!, avoiding the complications of using eachslice in the first place. (The only place eachslice is still used is LinearCombination.)
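To illustrate the conflict with a minimal, hypothetical sketch (not the package's code): mapslices allocates a new output array, while eachslice yields views that can be mutated in place. Note the mismatched conventions: mapslices with dims=1 operates on the same column slices that eachslice yields with dims=2.

```julia
A = [1.0 2.0; 3.0 4.0]

# mapslices applies the function along dims=1 (each column) and
# allocates a new array; A itself is untouched.
B = mapslices(col -> col .* 2, A; dims=1)

# eachslice with dims=2 iterates the same columns, but as views,
# so broadcasting into a slice mutates A in place.
for col in eachslice(A; dims=2)
    col .*= 2
end

# After the loop, A has been mutated to match B: [2.0 4.0; 6.0 8.0]
```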

And since apply! delegates to apply, which delegates to _apply, there is also no longer any need for the _apply! method.
So now users only need to implement one method (_apply) and Transforms takes care of the rest.

With this change I think the code is much simpler to follow and more "honest" about what it uses under the hood (mapslices).
However, I haven't taken steps to reduce allocations; it's probably about 2x more expensive on that front, but we can make those improvements as well.
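The delegation described above can be sketched as follows. This is a hypothetical illustration of the pattern: only the apply! body mirrors this PR's diff; the other names and signatures are assumptions for the sake of a runnable example.

```julia
abstract type Transform end

# Transform authors implement only _apply; Transforms takes care of the rest.
function _apply end

# apply delegates to _apply, using mapslices when a dims kwarg is given.
function apply(A::AbstractArray, t::Transform; dims=:, kwargs...)
    dims === Colon() && return _apply(A, t; kwargs...)
    return mapslices(x -> _apply(x, t; kwargs...), A; dims=dims)
end

# apply! simply writes apply's result back into A, so no _apply! is needed.
function apply!(A::AbstractArray, t::Transform; dims=:, kwargs...)
    A[:] = apply(A, t; dims=dims, kwargs...)
    return A
end

# Example transform: raise each element to a power.
struct Power <: Transform
    exponent::Int
end
_apply(x, P::Power; kwargs...) = x .^ P.exponent
```

With this sketch, apply!([1.0, 2.0, 3.0], Power(3)) mutates the input to [1.0, 8.0, 27.0], and authors of a new transform write only the _apply method.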

codecov bot commented Feb 22, 2021

Codecov Report

Merging #25 (6d438f1) into main (a814357) will decrease coverage by 4.22%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
- Coverage   99.22%   95.00%   -4.23%     
==========================================
  Files           9        9              
  Lines         129      100      -29     
==========================================
- Hits          128       95      -33     
- Misses          1        5       +4     
Impacted Files Coverage Δ
src/periodic.jl 100.00% <ø> (ø)
src/power.jl 100.00% <ø> (ø)
src/temporal.jl 100.00% <ø> (ø)
src/utils.jl 0.00% <ø> (-88.89%) ⬇️
src/one_hot_encoding.jl 100.00% <100.00%> (ø)
src/scaling.jl 96.15% <100.00%> (-3.85%) ⬇️
src/transformers.jl 100.00% <100.00%> (ø)
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

nicoleepp (Contributor)

Probably worth going through the transforms and deleting the unnecessary _apply! methods

glennmoy (Member Author)

> Probably worth going through the transforms and deleting the unnecessary _apply! methods

Yeah, I think I got them all 👍

bencottier (Contributor) left a comment:
I think this is a better way forward, because of the design issues it avoids.

to each row.
"""
function apply!(A::AbstractArray, t::Transform; dims=:, kwargs...)
    A[:] = apply(A, t; dims=dims, kwargs...)
bencottier (Contributor):
This was true before, but I noticed that A[:] = doesn't work for a UnitRange. I think that's to be expected (even on the user end) because UnitRange is immutable.

glennmoy (Member Author):
As in, if A = 1:3, this kind of assignment won't work?

bencottier (Contributor) commented Feb 23, 2021:

Yep, simply because:

julia> x = 1:3
1:3

julia> x isa AbstractArray
true

julia> x[:] = 4:6
ERROR: setindex! not defined for UnitRange{Int64}
Stacktrace:
 [1] error(::String, ::Type{T} where T) at ./error.jl:42
 [2] error_if_canonical_setindex(::IndexLinear, ::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:1161
 [3] setindex! at ./abstractarray.jl:1152 [inlined]
 [4] macro expansion at ./multidimensional.jl:802 [inlined]
 [5] macro expansion at ./cartesian.jl:64 [inlined]
 [6] macro expansion at ./multidimensional.jl:797 [inlined]
 [7] _unsafe_setindex! at ./multidimensional.jl:789 [inlined]
 [8] _setindex! at ./multidimensional.jl:785 [inlined]
 [9] setindex!(::UnitRange{Int64}, ::UnitRange{Int64}, ::Function) at ./abstractarray.jl:1153
 [10] top-level scope at REPL[39]:1

Also for Floats, in case you're wondering:

julia> x = 1.:3.
1.0:1.0:3.0

julia> x isa AbstractArray
true

julia> x[:] = 4.:6.
ERROR: setindex! not defined for StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
Stacktrace:
 [1] error(::String, ::Type{T} where T) at ./error.jl:42
 [2] error_if_canonical_setindex(::IndexLinear, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Int64) at ./abstractarray.jl:1161
 [3] setindex! at ./abstractarray.jl:1152 [inlined]
 [4] macro expansion at ./multidimensional.jl:802 [inlined]
 [5] macro expansion at ./cartesian.jl:64 [inlined]
 [6] macro expansion at ./multidimensional.jl:797 [inlined]
 [7] _unsafe_setindex! at ./multidimensional.jl:789 [inlined]
 [8] _setindex! at ./multidimensional.jl:785 [inlined]
 [9] setindex!(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Function) at ./abstractarray.jl:1153
 [10] top-level scope at REPL[43]:1

@@ -43,11 +43,6 @@ function _apply(x, P::Periodic{T}; kwargs...) where T <: Period
    map(xi -> _periodic(P.f, xi, P.period, P.phase_shift), x)
end

function _apply!(x::AbstractArray{T}, P::Periodic; kwargs...) where T <: Real
bencottier (Contributor):

Not sure if this is informative benchmarking, but I was curious about the effect.

This branch:

julia> using BenchmarkTools, Transforms

julia> x = collect(1.:1_000_000.);

julia> p = Periodic(sin, 7);

julia> @benchmark Transforms.apply!(x, p)
BenchmarkTools.Trial: 
  memory estimate:  22.89 MiB
  allocs estimate:  9
  --------------
  minimum time:     6.602 ms (0.00% GC)
  median time:      10.553 ms (19.06% GC)
  mean time:        10.830 ms (33.86% GC)
  maximum time:     20.315 ms (63.28% GC)
  --------------
  samples:          462
  evals/sample:     1

main branch:

julia> @benchmark Transforms.apply!(x, p)
BenchmarkTools.Trial: 
  memory estimate:  22.89 MiB
  allocs estimate:  7
  --------------
  minimum time:     6.271 ms (0.00% GC)
  median time:      10.960 ms (17.62% GC)
  mean time:        10.793 ms (35.26% GC)
  maximum time:     21.596 ms (59.10% GC)
  --------------
  samples:          464
  evals/sample:     1

Why are there more allocations? What are approaches to reducing allocations? I'm not familiar with it.

glennmoy (Member Author):

> Why are there more allocations? What are approaches to reducing allocations? I'm not familiar with it.

I'm not sure either, I gotta look into it.

One thing to note, however: when using BenchmarkTools I think you have to interpolate the function args, as this can affect the result (see https://github.com/JuliaCI/BenchmarkTools.jl#quick-start; note that this doesn't apply to @time).
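For reference, the interpolation the Quick Start describes looks like this (a generic sketch; sum and x are just placeholders, not anything from this package):

```julia
using BenchmarkTools

x = rand(1000)

# Here x is a non-constant global, so the benchmarked expression pays for
# dynamic dispatch, which can inflate time and allocation estimates.
@benchmark sum(x)

# Interpolating with $ splices the value into the benchmark expression,
# so the benchmark measures sum on a concretely-typed argument.
@benchmark sum($x)
```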

glennmoy (Member Author):

Just to note that with @views the allocation count goes back to 7:

julia> @benchmark Transforms.apply!($x, $p)
BenchmarkTools.Trial: 
  memory estimate:  22.89 MiB
  allocs estimate:  7
  --------------
  minimum time:     5.029 ms (0.00% GC)
  median time:      6.940 ms (16.92% GC)
  mean time:        6.427 ms (19.47% GC)
  maximum time:     9.906 ms (32.37% GC)
  --------------
  samples:          778
  evals/sample:     1

glennmoy (Member Author) commented Feb 23, 2021

Some basic performance checks after implementing @views:

Note that apply and apply! are now equally performant, and roughly the average of what they each were before (apply got better and apply! got worse).

using Transforms
using Random
Random.seed!(1)

X = rand(1.0:10.0, 100, 100)

transforms = (
    Periodic(sin, 7),
    Power(3),
    MeanStdScaling(X; dims=1),
    LinearCombination(ones(100)),
    OneHotEncoding(1:10),
)

for t in transforms
    @time Transforms.apply(X, t; dims=1);
    @time Transforms.apply(X, t; dims=1);

    # Not every transform implements a mutating apply!
    # (e.g. LinearCombination), so skip any that throw.
    try
        @time Transforms.apply!(X, t; dims=1);
        X = rand(1.0:10.0, 100, 100)
        @time Transforms.apply!(X, t; dims=1);
    catch
    end
end

"""
This branch:

Periodic(sin, 7)
apply
0.780758 seconds (2.84 M allocations: 146.639 MiB, 5.11% gc time)
0.000626 seconds (1.64 k allocations: 380.922 KiB)
apply!
0.027169 seconds (71.10 k allocations: 4.024 MiB)
0.000667 seconds (1.64 k allocations: 380.922 KiB)

Power(3)
apply
0.160279 seconds (347.52 k allocations: 18.862 MiB)
0.000445 seconds (1.74 k allocations: 221.500 KiB)
apply!
0.007062 seconds (5.40 k allocations: 417.305 KiB)
0.000387 seconds (1.74 k allocations: 221.500 KiB)

MeanStdScaling(X; dims=1)
apply
0.184251 seconds (419.53 k allocations: 22.622 MiB, 5.67% gc time)
0.000619 seconds (2.04 k allocations: 234.000 KiB)
apply!
0.007365 seconds (5.70 k allocations: 430.008 KiB)
0.000334 seconds (2.04 k allocations: 234.000 KiB)

LinearCombination(ones(100))
apply
0.329633 seconds (1.26 M allocations: 64.869 MiB, 3.28% gc time)
0.000814 seconds (30.61 k allocations: 812.000 KiB)

OneHotEncoding
apply
0.021401 seconds (31.42 k allocations: 1.939 MiB)
0.000913 seconds (11.98 k allocations: 968.531 KiB)
"""

"""
Main:

Periodic(sin, 7)
apply
0.675031 seconds (2.59 M allocations: 133.186 MiB, 4.47% gc time)
0.000668 seconds (1.64 k allocations: 463.734 KiB)
apply!
0.342571 seconds (1.51 M allocations: 78.363 MiB, 6.02% gc time)
0.000292 seconds (1.21 k allocations: 300.781 KiB)

Power(3)
apply
0.153669 seconds (338.18 k allocations: 18.510 MiB)
0.000476 seconds (1.54 k allocations: 291.812 KiB)
apply!
0.075057 seconds (210.79 k allocations: 10.918 MiB)
0.000199 seconds (1.61 k allocations: 155.469 KiB)

MeanStdScaling(X; dims=1)
apply
0.214341 seconds (575.51 k allocations: 31.068 MiB, 4.94% gc time)
0.000420 seconds (2.04 k allocations: 390.250 KiB)
apply!
0.093857 seconds (278.64 k allocations: 14.865 MiB)
0.000199 seconds (1.91 k allocations: 167.969 KiB)

LinearCombination(ones(100))
apply
0.308854 seconds (1.04 M allocations: 53.751 MiB, 6.56% gc time)
0.001061 seconds (30.61 k allocations: 812.000 KiB)

OneHotEncoding
apply
0.020939 seconds (31.30 k allocations: 1.934 MiB)
0.000875 seconds (11.98 k allocations: 968.531 KiB)
"""

@@ -32,7 +32,7 @@ function _apply(x, encoding::OneHotEncoding; kwargs...)

results = zeros(Int, length(x), n_categories)

for (i, value) in enumerate(x)
@views for (i, value) in enumerate(x)
bencottier (Contributor):

Why @views at this level? Is it because it should be done whenever iterating the data, and this is the only internal _apply that does so?

glennmoy (Member Author):

Not necessarily every time, but since we are indexing into results and assigning new values, I figured using @views might help here. I could be wrong though.
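For context, a generic sketch of what @views does (independent of this package's internals): it rewrites slicing expressions in the annotated code to use view, so right-hand-side slices stop allocating copies. Plain scalar assignments like results[i, j] = 1 are unaffected; the gain comes from slice expressions that would otherwise copy.

```julia
x = collect(1.0:6.0)

# Without @views, x[1:3] allocates a copy; writing to it leaves x alone.
y = x[1:3]
y .= 0.0
# x is still [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# With @views, the slice is a SubArray backed by x, so writes go through.
z = @views x[1:3]
z .= 0.0
# x is now [0.0, 0.0, 0.0, 4.0, 5.0, 6.0]
```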

bencottier (Contributor)

> Note that now apply and apply! are equally performant and are roughly the average of what they each were before (where apply got better and apply! got worse).

Hmm, true for the first timing, but for the second timing apply looks slightly worse on average. It's well within the same order of magnitude though, so it looks pretty good.

glennmoy changed the title from "POC: Clean up apply! code" to "Clean up apply! code" on Feb 23, 2021
glennmoy (Member Author)

>> Note that now apply and apply! are equally performant and are roughly the average of what they each were before (where apply got better and apply! got worse).
>
> Hmm, true for the first timing, but for the second timing apply looks slightly worse on average. It's well within the same order of magnitude though, so it looks pretty good.

Maybe I'm blind, but can you point out where apply is doing worse in this branch?

glennmoy merged commit 9249440 into main on Feb 24, 2021
glennmoy deleted the gm/apply!_cleanup branch on March 9, 2021
Linked issue: Use a consistent convention for dims
3 participants