Clean up apply! code #25
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
- Coverage   99.22%   95.00%   -4.23%
==========================================
  Files           9        9
  Lines         129      100      -29
==========================================
- Hits          128       95      -33
- Misses          1        5       +4
Continue to review full report at Codecov.
Probably worth going through the transforms and deleting the unnecessary ones.
Yeah, I think I got them all 👍
I think this is a better way forward, because of the design issues it avoids.
src/transformers.jl (outdated)
to each row.
"""
function apply!(A::AbstractArray, t::Transform; dims=:, kwargs...)
    A[:] = apply(A, t; dims=dims, kwargs...)
This was true before, but I noticed that `A[:] = ...` doesn't work for a `UnitRange`. I think that's to be expected (even on the user's end) because `UnitRange` is immutable.
As in, if `A = 1:3`, this kind of assignment won't work?
Yep, simply because:
julia> x = 1:3
1:3
julia> x isa AbstractArray
true
julia> x[:] = 4:6
ERROR: setindex! not defined for UnitRange{Int64}
Stacktrace:
[1] error(::String, ::Type{T} where T) at ./error.jl:42
[2] error_if_canonical_setindex(::IndexLinear, ::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:1161
[3] setindex! at ./abstractarray.jl:1152 [inlined]
[4] macro expansion at ./multidimensional.jl:802 [inlined]
[5] macro expansion at ./cartesian.jl:64 [inlined]
[6] macro expansion at ./multidimensional.jl:797 [inlined]
[7] _unsafe_setindex! at ./multidimensional.jl:789 [inlined]
[8] _setindex! at ./multidimensional.jl:785 [inlined]
[9] setindex!(::UnitRange{Int64}, ::UnitRange{Int64}, ::Function) at ./abstractarray.jl:1153
[10] top-level scope at REPL[39]:1
Also floats, in case you're wondering:
julia> x = 1.:3.
1.0:1.0:3.0
julia> x isa AbstractArray
true
julia> x[:] = 4.:6.
ERROR: setindex! not defined for StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
Stacktrace:
[1] error(::String, ::Type{T} where T) at ./error.jl:42
[2] error_if_canonical_setindex(::IndexLinear, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Int64) at ./abstractarray.jl:1161
[3] setindex! at ./abstractarray.jl:1152 [inlined]
[4] macro expansion at ./multidimensional.jl:802 [inlined]
[5] macro expansion at ./cartesian.jl:64 [inlined]
[6] macro expansion at ./multidimensional.jl:797 [inlined]
[7] _unsafe_setindex! at ./multidimensional.jl:789 [inlined]
[8] _setindex! at ./multidimensional.jl:785 [inlined]
[9] setindex!(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Function) at ./abstractarray.jl:1153
[10] top-level scope at REPL[43]:1
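(Not from the thread, just a minimal sketch of the obvious workaround: collecting the range into a `Vector` first gives something mutable, so the same assignment then works.)

```julia
# Sketch: ranges are immutable, but collecting one gives a mutable Vector
x = collect(1:3)    # Vector{Int64}: [1, 2, 3]
x[:] = 4:6          # in-place assignment now works
@assert x == [4, 5, 6]
```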
@@ -43,11 +43,6 @@ function _apply(x, P::Periodic{T}; kwargs...) where T <: Period
     map(xi -> _periodic(P.f, xi, P.period, P.phase_shift), x)
 end

-function _apply!(x::AbstractArray{T}, P::Periodic; kwargs...) where T <: Real
Not sure if this is informative benchmarking, but I was curious about the effect.
This branch:
julia> using BenchmarkTools, Transforms
julia> x = collect(1.:1_000_000.);
julia> p = Periodic(sin, 7);
julia> @benchmark Transforms.apply!(x, p)
BenchmarkTools.Trial:
memory estimate: 22.89 MiB
allocs estimate: 9
--------------
minimum time: 6.602 ms (0.00% GC)
median time: 10.553 ms (19.06% GC)
mean time: 10.830 ms (33.86% GC)
maximum time: 20.315 ms (63.28% GC)
--------------
samples: 462
evals/sample: 1
main branch:
julia> @benchmark Transforms.apply!(x, p)
BenchmarkTools.Trial:
memory estimate: 22.89 MiB
allocs estimate: 7
--------------
minimum time: 6.271 ms (0.00% GC)
median time: 10.960 ms (17.62% GC)
mean time: 10.793 ms (35.26% GC)
maximum time: 21.596 ms (59.10% GC)
--------------
samples: 464
evals/sample: 1
Why are there more allocations? What are the approaches to reducing them? I'm not familiar with this.
> Why are there more allocations? What are the approaches to reducing them? I'm not familiar with this.

I'm not sure either, I'll have to look into it.

One thing to note, however: when using BenchmarkTools I think you have to interpolate the function arguments, as this can affect the result (see https://github.com/JuliaCI/BenchmarkTools.jl#quick-start; note that this doesn't apply to @time).
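(For illustration, a hedged sketch of what the interpolation looks like here; the `$` splices `x` and `p` into the benchmarked expression so they aren't looked up as non-constant globals on every evaluation.)

```julia
using BenchmarkTools, Transforms

x = collect(1.0:1_000_000.0)
p = Periodic(sin, 7)

# Without interpolation: x and p are treated as untyped globals, which can
# inflate the reported time and allocation counts.
@benchmark Transforms.apply!(x, p)

# With interpolation: the values themselves are benchmarked.
@benchmark Transforms.apply!($x, $p)
```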
Just to note that with `@views` this goes back to 7 allocations:
julia> @benchmark Transforms.apply!($x, $p)
BenchmarkTools.Trial:
memory estimate: 22.89 MiB
allocs estimate: 7
--------------
minimum time: 5.029 ms (0.00% GC)
median time: 6.940 ms (16.92% GC)
mean time: 6.427 ms (19.47% GC)
maximum time: 9.906 ms (32.37% GC)
--------------
samples: 778
evals/sample: 1
Some basic performance checks after implementing this change (the try/catch is there because some transforms don't support apply!):

using Transforms
using Random

Random.seed!(1)
X = rand(1.0:10.0, 100, 100)

transforms = (
    Periodic(sin, 7),
    Power(3),
    MeanStdScaling(X; dims=1),
    LinearCombination(ones(100)),
    OneHotEncoding(1:10),
)

for t in transforms
    @time Transforms.apply(X, t; dims=1);
    @time Transforms.apply(X, t; dims=1);
    try
        @time Transforms.apply!(X, t; dims=1);
        X = rand(1.0:10.0, 100, 100)
        @time Transforms.apply!(X, t; dims=1);
    catch e
    end
end
"""
This branch:
Periodic(sin, 7)
apply
0.780758 seconds (2.84 M allocations: 146.639 MiB, 5.11% gc time)
0.000626 seconds (1.64 k allocations: 380.922 KiB)
apply!
0.027169 seconds (71.10 k allocations: 4.024 MiB)
0.000667 seconds (1.64 k allocations: 380.922 KiB)
Power(3)
apply
0.160279 seconds (347.52 k allocations: 18.862 MiB)
0.000445 seconds (1.74 k allocations: 221.500 KiB)
apply!
0.007062 seconds (5.40 k allocations: 417.305 KiB)
0.000387 seconds (1.74 k allocations: 221.500 KiB)
MeanStdScaling(X; dims=1)
apply
0.184251 seconds (419.53 k allocations: 22.622 MiB, 5.67% gc time)
0.000619 seconds (2.04 k allocations: 234.000 KiB)
apply!
0.007365 seconds (5.70 k allocations: 430.008 KiB)
0.000334 seconds (2.04 k allocations: 234.000 KiB)
LinearCombination(ones(100))
apply
0.329633 seconds (1.26 M allocations: 64.869 MiB, 3.28% gc time)
0.000814 seconds (30.61 k allocations: 812.000 KiB)
OneHotEncoding
apply
0.021401 seconds (31.42 k allocations: 1.939 MiB)
0.000913 seconds (11.98 k allocations: 968.531 KiB)
"""
"""
Main:
Periodic(sin, 7)
apply
0.675031 seconds (2.59 M allocations: 133.186 MiB, 4.47% gc time)
0.000668 seconds (1.64 k allocations: 463.734 KiB)
apply!
0.342571 seconds (1.51 M allocations: 78.363 MiB, 6.02% gc time)
0.000292 seconds (1.21 k allocations: 300.781 KiB)
Power(3)
apply
0.153669 seconds (338.18 k allocations: 18.510 MiB)
0.000476 seconds (1.54 k allocations: 291.812 KiB)
apply!
0.075057 seconds (210.79 k allocations: 10.918 MiB)
0.000199 seconds (1.61 k allocations: 155.469 KiB)
MeanStdScaling(X; dims=1)
apply
0.214341 seconds (575.51 k allocations: 31.068 MiB, 4.94% gc time)
0.000420 seconds (2.04 k allocations: 390.250 KiB)
apply!
0.093857 seconds (278.64 k allocations: 14.865 MiB)
0.000199 seconds (1.91 k allocations: 167.969 KiB)
LinearCombination(ones(100))
apply
0.308854 seconds (1.04 M allocations: 53.751 MiB, 6.56% gc time)
0.001061 seconds (30.61 k allocations: 812.000 KiB)
OneHotEncoding
apply
0.020939 seconds (31.30 k allocations: 1.934 MiB)
0.000875 seconds (11.98 k allocations: 968.531 KiB)
""" |
@@ -32,7 +32,7 @@ function _apply(x, encoding::OneHotEncoding; kwargs...)
     results = zeros(Int, length(x), n_categories)

-    for (i, value) in enumerate(x)
+    @views for (i, value) in enumerate(x)
Why @views at this level? Is it because it should be done whenever iterating the data, and this is the only internal _apply that does so?
Not necessarily every time, but since we are indexing into `results` and assigning new values, I figured using `@views` might help here. I could be wrong though.
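(For context, a minimal sketch of what `@views` does in general, not the exact `_apply` body: inside the annotated expression or block, slicing like `A[1, :]` returns a view instead of allocating a copy.)

```julia
A = rand(3, 4)

s_copy = sum(A[1, :])           # A[1, :] allocates a copy of the row
s_view = @views sum(A[1, :])    # here A[1, :] is a non-allocating view

@assert s_copy ≈ s_view
```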
Hmm, true for the first timing, but for the second timing…
Maybe I'm blind - can you point out where…
Closes #18

There is an unfortunate conflict in how the `dims` kwarg is implemented in `eachslice` vs `mapslices` (#18). `mapslices` is more intuitive to our researchers, but it cannot be used to mutate data directly because it doesn't take a `view` of the underlying data (there is no such thing as a `mapview`, see JuliaLang/julia#29146). This is important for having a mutating `apply!` method, which is now only possible with `eachslice` + some workarounds.

This PR attempts to simplify the interface by just delegating to `apply` inside `apply!`, avoiding the complications involved in using `eachslice` in the first place. (The only place it is used now is `LinearCombination`.)

And since `apply!` delegates to `apply`, which delegates to `_apply`, there is also no longer any need for the `_apply!` methods.

So now users only need to implement one method (`_apply`) and `Transforms` takes care of the rest.

With this change I think the code is much simpler to follow and more "honest" about what it is using under the hood (`mapslices`).

However, I haven't taken steps to reduce allocations, and it's probably 2x more expensive on that front, but we can probably make those improvements as well.
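(To make the delegation concrete, a rough sketch of the pattern described above; the names are simplified and this is not the literal package code.)

```julia
abstract type Transform end

# Users implement only `_apply` for their transform type.
_apply(x, t::Transform; kwargs...) = error("_apply not implemented for $(typeof(t))")

# `apply` handles the `dims` keyword, delegating slice-wise work to mapslices.
function apply(A::AbstractArray, t::Transform; dims=:, kwargs...)
    dims isa Colon && return _apply(A, t; kwargs...)
    return mapslices(x -> _apply(x, t; kwargs...), A; dims=dims)
end

# `apply!` just writes the result of `apply` back into `A`, so no separate
# `_apply!` methods are needed (and `A` must be mutable, per the discussion above).
function apply!(A::AbstractArray, t::Transform; dims=:, kwargs...)
    A[:] = apply(A, t; dims=dims, kwargs...)
    return A
end
```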