Use a consistent convention for dims
#18
Comments
julia> M = [1 2; 3 4];
julia> function add_one!(x)
x[:] = x .+ 1
end
add_one! (generic function with 1 method)
julia> mapslices(x -> add_one!(x), M; dims=1)
2×2 Array{Int64,2}:
2 3
4 5
julia> M
2×2 Array{Int64,2}:
1 2
3 4
Compare:
julia> map(x -> add_one!(x), eachslice(M; dims=2))
2-element Array{Array{Int64,1},1}:
[2, 4]
[3, 5]
julia> M
2×2 Array{Int64,2}:
2 3
4 5 |
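The two transcripts above differ because of how the slices are produced. The sketch below is only an illustration of that point: as far as I can tell, mapslices hands the function a copy of each slice, while eachslice yields views into the parent array. The variable names are made up for the example.

```julia
# eachslice yields views: writing into a slice writes into the parent array.
M = [1 2; 3 4]
cols = collect(eachslice(M; dims=2))
cols[1] .+= 1                 # mutates the first column of M in place

# mapslices works on copies: mutation inside the function is not
# visible in the original array afterwards.
N = [1 2; 3 4]
mapslices(x -> (x .+= 1; sum(x)), N; dims=1)
```

After running this, the first column of M has changed and N is untouched, which matches the behaviour shown in the REPL output above.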
SplitApplyCombine.jl
Discovered from this post
julia> using SplitApplyCombine
julia> f_ms(A) = mapslices(x -> sum(x), A; dims=(2, 3))
f_ms (generic function with 1 method)
julia> f_es(A) = map(x -> sum(x), eachslice(A; dims=1))
f_es (generic function with 1 method)
julia> f_sac(A) = map(x -> sum(x), splitdimsview(A, 1))
f_sac (generic function with 1 method)
julia> A = rand(150, 50, 100);
julia> isapprox(f_es(A), f_sac(A); atol=1e-15)
true
julia> isapprox(f_es(A), f_ms(A); atol=1e-12)
false
julia> isapprox(f_es(A), f_ms(A); atol=1e-11) # quite a difference in precision...
true
julia> @btime f_es(A)
765.021 μs (6 allocations: 1.45 KiB)
julia> @btime f_ms(A);
1.640 ms (1988 allocations: 89.39 KiB)
julia> @btime f_sac(A);
811.317 μs (1 allocation: 1.33 KiB)
also supports mutation:
julia> map(splitdimsview(M, 2)) do x
add_one!(x)
end
2-element Array{Array{Int64,1},1}:
[2, 4]
[3, 5]
julia> M
2×2 Array{Int64,2}:
2 3
4 5
EDIT: |
Note converting from one |
Overall, I'm favouring Upsides:
Downsides:
|
Yeah, |
I did run into issues trying to use |
Yeah I'm getting different results with
|
Different results are to be expected, given that this changes which dims things operate on. If that's the only issue, it shouldn't be a problem. But I do think |
If you're mutating the values of the slice, shouldn't that preserve the dimensionality of the input array, though? I haven't run into that issue with Impute.jl |
If you are mutating, it does. If you are constructing a new type to return, it doesn't, but mapslices does preserve the shape when not mutating |
Are there links to parts of the code where we can't do the mutating form? I was thinking that even for cases where the type needs to change, so the base API can't be mutating, that we'd still have enough info to pre-allocate the output? Then we'd just iterate over the slices of the inputs and outputs and mutate the pre-allocated outputs? |
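The pre-allocation idea in the comment above could look roughly like the sketch below. It is an illustration only: apply_prealloc! and its arguments are hypothetical names, not part of Transforms.jl, and it assumes the transform returns a slice of the same length as its input, so the output shape can be known up front.

```julia
# Hypothetical helper, not part of Transforms.jl: apply `f` to each slice of
# `input` along dimension `dims` and write the result into the matching
# slice of a pre-allocated `output`.
function apply_prealloc!(f, output::AbstractArray, input::AbstractArray; dims::Int)
    size(output) == size(input) ||
        throw(DimensionMismatch("output and input must have the same size"))
    # eachslice yields views, so assigning into `dst` mutates `output` in place.
    for (dst, src) in zip(eachslice(output; dims=dims), eachslice(input; dims=dims))
        dst .= f(src)
    end
    return output
end

M = [1 2; 3 4]
# The element type changes (Int -> Float64), so the input cannot be mutated
# directly; allocate an output of the new type instead.
out = similar(M, Float64)
apply_prealloc!(x -> x ./ sum(x), out, M; dims=2)
```

Here the non-mutating API would only need to pick the output element type, and the per-slice work stays in the mutating form. Transforms that change the length of a slice would need a different output size and are not covered by this sketch.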
https://github.com/invenia/Transforms.jl/blob/main/src/transformers.jl#L75 The output of |
Polling result from asking 7 people in Research (3 ML, 1 PS, 1 DS, 2 RSE): all say the
However, that raises an interesting point: consider
julia> M = NamedDimsArray{(:features, :observations)}([1 3; 5 7])
2×2 NamedDimsArray(::Array{Int64,2}, (:features, :observations)):
→ observations
↓ features 1 3
5 7
julia> mean(M; dims=2)
2×1 NamedDimsArray(::Array{Float64,2}, (:features, :observations)):
→ observations
↓ features 2.0
6.0
julia> mean(eachslice(M; dims=2))
2-element NamedDimsArray(::Array{Float64,1}, (:features,)):
↓ features 2.0
6.0
julia> map(mean, eachslice(M; dims=2))
2-element Array{Float64,1}:
3.0
5.0
Notice that the convention using
This raises the question: can the
It could, but after the polling and discussion with people, I still think we should be consistent and use the
If you accept all of that, a lingering problem is that the general
|
In retrospect, I'm not a fan of using |
Can you elaborate more on why you aren't a fan anymore? |
Just that if our default behaviour is basically just running
NOTE: This might be a relevant discussion on the differences: JuliaLang/julia#29146 |
I think @bencottier's survey provides the strongest reason(s) for favouring
It would also be interesting to check our currently implemented Transforms (which I will do later) for examples where
I suspect there won't be many examples... |
What happens? Can you open an issue in NamedDims.jl and reference it here for later? |
SplitApplyCombine.jl restricts
julia> using SplitApplyCombine, NamedDims
julia> M = NamedDimsArray{(:features, :observations)}([1 3; 5 7])
julia> map(x -> sum(x), splitdimsview(M, :features))
ERROR: MethodError: no method matching splitdimsview(::NamedDimsArray{(:features, :observations),Int64,2,Array{Int64,2}}, ::Symbol)
Closest candidates are:
splitdimsview(::AbstractArray) at /Users/bencottier/.julia/packages/SplitApplyCombine/RS5bI/src/splitdims.jl:127
splitdimsview(::AbstractArray, ::Int64) at /Users/bencottier/.julia/packages/SplitApplyCombine/RS5bI/src/splitdims.jl:129
splitdimsview(::AbstractArray{var"#s46",N} where var"#s46", ::Tuple{Vararg{Int64,M}}) where {N, M} at /Users/bencottier/.julia/packages/SplitApplyCombine/RS5bI/src/splitdims.jl:130
Stacktrace:
[1] top-level scope at REPL[4]:1
julia> map(x -> sum(x), splitdimsview(M, 1))
2-element Array{Int64,1}:
4
12
julia> map(x -> sum(x), eachslice(M; dims=:features))
2-element Array{Int64,1}:
4
12 |
Some more relevant links:
Some relevant packages: |
The apply method for AbstractArray in transformers.jl uses mapslices to apply a transform on each slice of the array, along a given dimension dims. Meanwhile, the equivalent apply! method uses eachslice.

The problem is that mapslices and eachslice have opposite notions of dims. In higher dimensions, for example, dims=3 in eachslice is equivalent to dims=[1, 2] in mapslices (and note that eachslice only supports a single dimension in dims).

Note also that Statistics.mean and Statistics.std use the same notion of dims as mapslices.

We should adopt a consistent convention for the meaning of dims, and explain this clearly in documentation.
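To make the two notions of dims concrete, here is a small sketch on a 3-dimensional array. The array and variable names are made up for illustration; only Base and Statistics functions are used.

```julia
using Statistics

A = reshape(collect(1.0:24.0), 2, 3, 4)

# eachslice: `dims` names the dimension that is iterated over.
# Each slice is a 2×3 matrix, and there are size(A, 3) == 4 of them.
es = map(sum, eachslice(A; dims=3))

# mapslices: `dims` names the dimensions that make up each slice.
# Singleton dimensions are kept, so the result has size (1, 1, 4).
ms = mapslices(sum, A; dims=[1, 2])

# Reductions such as sum and mean follow the mapslices notion:
# `dims` lists the dimensions that are reduced over.
rd = sum(A; dims=(1, 2))
mn = mean(A; dims=(1, 2))

# All three approaches compute the same numbers, in different shapes.
vec(ms) == es && vec(rd) == es
```

The values agree, but eachslice with map returns a plain vector while mapslices and the reductions keep the reduced dimensions with length 1. Whichever convention is adopted, the documentation would need to state both what dims means and what shape comes back.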