Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mapslices performance #26868

Open
gideonsimpson opened this issue Apr 20, 2018 · 5 comments
Open

mapslices performance #26868

gideonsimpson opened this issue Apr 20, 2018 · 5 comments
Labels
arrays [a, r, r, a, y, s] performance Must go faster

Comments

@gideonsimpson
Copy link

Consider the following piece of code:

n=10^6;
X = randn(1,n);
V(x) = 0.5 * dot(x,x);
f(V, X) = [ V((@view X[:,n])) for n = 1:size(X,2) ]

Timing on this reveals:

julia> @time mapslices(V, X, 1);
  1.177690 seconds (12.00 M allocations: 267.024 MiB, 6.91% gc time)
julia> @time f(V,X);
  0.047745 seconds (1.00 M allocations: 53.406 MiB, 5.58% gc time)

which is substantially different.

Note that in this simple example, my data could be a column vector; however, this is a surrogate for problems where I have time series data and X is d x n with 1< d << n.

@mbauman mbauman added arrays [a, r, r, a, y, s] performance Must go faster labels Apr 20, 2018
@KristofferC
Copy link
Member

KristofferC commented Apr 20, 2018

Better on 0.7, but not parity:

julia> @time mapslices(V, X, 1);
  0.202644 seconds (5.00 M allocations: 114.428 MiB, 5.10% gc time)

julia> @time f(V,X);
  0.032617 seconds (1.00 M allocations: 53.406 MiB, 16.69% gc time)

Edit:

Both idx and ridx at:

julia/base/abstractarray.jl

Lines 1940 to 1958 in 8f46450

@noinline function inner_mapslices!(safe_for_reuse, indices, nidx, idx, otherdims, ridx, Aslice, A, f, R)
if safe_for_reuse
# when f returns an array, R[ridx...] = f(Aslice) line copies elements,
# so we can reuse Aslice
for I in indices # skip the first element, we already handled it
replace_tuples!(nidx, idx, ridx, otherdims, I)
_unsafe_getindex!(Aslice, A, idx...)
R[ridx...] = f(Aslice)
end
else
# we can't guarantee safety (#18524), so allocate new storage for each slice
for I in indices
replace_tuples!(nidx, idx, ridx, otherdims, I)
R[ridx...] = f(A[idx...])
end
end
return R
end

are Vector{Any} and seems to be used in quite tight loops.

@zsoerenm
Copy link
Contributor

zsoerenm commented Jun 1, 2018

JuliennedArrays might help in this case:

julia> @benchmark mapslices(V, X, 1)
BenchmarkTools.Trial: 
  memory estimate:  267.02 MiB
  allocs estimate:  11999527
  --------------
  minimum time:     750.878 ms (2.31% GC)
  median time:      782.824 ms (2.33% GC)
  mean time:        803.105 ms (3.38% GC)
  maximum time:     905.073 ms (2.32% GC)
  --------------
  samples:          7
  evals/sample:     1

julia> @benchmark f(V, X)
BenchmarkTools.Trial: 
  memory estimate:  53.41 MiB
  allocs estimate:  1000006
  --------------
  minimum time:     23.018 ms (5.64% GC)
  median time:      25.374 ms (5.97% GC)
  mean time:        37.715 ms (32.41% GC)
  maximum time:     101.140 ms (67.24% GC)
  --------------
  samples:          133
  evals/sample:     1

julia> using JuliennedArrays
julia> @benchmark map(V, julienne(X, (:,*)))
BenchmarkTools.Trial: 
  memory estimate:  53.41 MiB
  allocs estimate:  1000006
  --------------
  minimum time:     26.012 ms (5.17% GC)
  median time:      28.425 ms (5.71% GC)
  mean time:        41.413 ms (30.04% GC)
  maximum time:     109.614 ms (65.14% GC)
  --------------
  samples:          121
  evals/sample:     1

There are plans to migrate it into Base: #23645 (comment)

@stillyslalom
Copy link
Contributor

stillyslalom commented Oct 13, 2020

Updated timings on latest nightly:

julia> @btime mapslices($V, $X, dims=1);
  377.917 ms (10998504 allocations: 251.75 MiB)

julia> @btime f($V,$X);
  14.712 ms (2 allocations: 7.63 MiB)

julia> @btime map(V, Slices($X, 1));
  14.832 ms (6 allocations: 7.63 MiB)

mapslices is still struggling, but eachslice has improved ~5x since v1.5.2, and is now tied with hand-written & julienned versions:

julia> @btime map(V, eachslice($X, dims=2));
  76.877 ms (4000009 allocations: 129.70 MiB) # v1.5.2
julia> @btime map(V, eachslice($X, dims=2));
  14.937 ms (7 allocations: 7.63 MiB) # v1.6.0-DEV.1208

@stillyslalom
Copy link
Contributor

stillyslalom commented Oct 13, 2020

...and just for fun,

using LoopVectorization, Tullio
g(X) = @tullio y[j] := 0.5*X[i, j]^2
julia> @btime g($X)
  754.101 μs (172 allocations: 7.64 MiB)

@mcabbott
Copy link
Contributor

mcabbott commented Aug 12, 2021

Timing with #40996:

julia> @btime mapslices($V, $X, dims=1);
  251.758 ms (10998504 allocations: 251.75 MiB) # master
  7.804 ms (22 allocations: 7.63 MiB)           # with PR 40996

julia> @btime f($V,$X);
  5.087 ms (2 allocations: 7.63 MiB)

julia> @btime map(V, eachslice($X, dims=2));
  5.082 ms (7 allocations: 7.63 MiB)

This example is one for which you only need half of mapslices, since f returns a scalar. The second half deals with re-assembling slices returned by f to make another big array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants