Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in broadcasting with CartesianIndices on v1.6.0-DEV #38086

Closed
kimikage opened this issue Oct 18, 2020 · 9 comments · Fixed by #39333
Closed

Performance regression in broadcasting with CartesianIndices on v1.6.0-DEV #38086

kimikage opened this issue Oct 18, 2020 · 9 comments · Fixed by #39333
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@kimikage
Copy link
Contributor

kimikage commented Oct 18, 2020

Although I haven't identified the cause, I've noticed ~10x slowdown in simple broadcasting operations on nightly.

julia> versioninfo() # just the last version I had cached, not bisected
Julia Version 1.6.0-DEV.1117
Commit 36effbe10a (2020-10-02 17:38 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.1 (ORCJIT, skylake)

julia> xu8 = rand(UInt8, 1000, 1000); yu8 = rand(UInt8, 1000, 1000);

julia> @btime $xu8 .+ $yu8;
  85.700 μs (2 allocations: 976.70 KiB)
julia> versioninfo()
Julia Version 1.6.0-DEV.1274
Commit 444aa87348 (2020-10-17 22:11 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, skylake)

julia> @btime $xu8 .+ $yu8;
  795.400 μs (2 allocations: 976.70 KiB)

julia> xu8a = rand(UInt8, 1000 * 10, 1000); yu8a = rand(UInt8, 1000 * 10, 1000);

julia> @btime $xu8a .+ $yu8a; # roughly proportional to the number of elements
  11.190 ms (2 allocations: 9.54 MiB)

julia> xu8b = rand(UInt8, 1000 ÷ 10, 1000); yu8b = rand(UInt8, 1000 ÷ 10, 1000);

julia> @btime $xu8b .+ $yu8b;
  86.100 μs (2 allocations: 97.77 KiB)
`@code_native` result for "1-D" arrays (It's somewhat misleading. See comments below.)
julia> @code_native broadcast(+, UInt8[], UInt8[]) # (**Edit:** FOR 1-D ARRAYS) at least it's SIMD vectorized.
...snip...
; ││││┌ @ simdloop.jl:77 within `macro expansion' @ broadcast.jl:932
; │││││┌ @ broadcast.jl:575 within `getindex'
; ││││││┌ @ broadcast.jl:620 within `_broadcast_getindex'
; │││││││┌ @ broadcast.jl:644 within `_getindex' @ broadcast.jl:645
; ││││││││┌ @ broadcast.jl:614 within `_broadcast_getindex'
; │││││││││┌ @ array.jl:809 within `getindex'
L1216:
        vmovdqu (%rcx,%rbx), %ymm0
        vmovdqu 32(%rcx,%rbx), %ymm1
        vmovdqu 64(%rcx,%rbx), %ymm2
        vmovdqu 96(%rcx,%rbx), %ymm3
; ││││││└└└└
; ││││││┌ @ broadcast.jl:621 within `_broadcast_getindex'
; │││││││┌ @ broadcast.jl:648 within `_broadcast_getindex_evalf'
; ││││││││┌ @ int.jl:87 within `+'
        vpaddb  (%r9,%rbx), %ymm0, %ymm0
        vpaddb  32(%r9,%rbx), %ymm1, %ymm1
        vpaddb  64(%r9,%rbx), %ymm2, %ymm2
        vpaddb  96(%r9,%rbx), %ymm3, %ymm3
; │││││└└└└
; │││││┌ @ array.jl:847 within `setindex!'
        vmovdqu %ymm0, (%rdx,%rbx)
        vmovdqu %ymm1, 32(%rdx,%rbx)
        vmovdqu %ymm2, 64(%rdx,%rbx)
        vmovdqu %ymm3, 96(%rdx,%rbx)
; ││││└└
; ││││┌ @ simdloop.jl:78 within `macro expansion'
; │││││┌ @ int.jl:87 within `+'
        subq    $-128, %rbx
        cmpq    %rbx, %rdi
        jne     L1216
; ││││└└
...snip...

The most noticeable difference is the LLVM version (i.e. 10 vs 11), but I have no evidence that the LLVM 11 is the cause at the moment.

@kimikage

This comment has been minimized.

@KristofferC KristofferC added performance Must go faster regression Regression in behavior compared to a previous version labels Oct 18, 2020
@kimikage
Copy link
Contributor Author

kimikage commented Oct 18, 2020

I just checked the code of the "innermost" loop with @code_native and didn't find anything wrong with it. In other words, this problem should not be perceptible with 1-D arrays.

julia> xu8d1 = rand(UInt8, 1000 * 1000); yu8d1 = rand(UInt8, 1000 * 1000);

julia> @btime $xu8d1 .+ $yu8d1;
  65.900 μs (2 allocations: 976.70 KiB)

Edit:
Wait! The notion that the innermost loop is independent of the dimensions of the array seems to have become a thing of the past. 😅

Perhaps this is not a problem with the LLVM backend, as shown by @code_llvm.

@kimikage
Copy link
Contributor Author

BTW, I've learned a bit about the Julia language features and the x86 native codes, but I'm not familiar with the process of generating the LLVM IR codes. So I would appreciate it if someone could help me with this cause analysis and fix.

@chriselrod
Copy link
Contributor

chriselrod commented Oct 23, 2020

FWIW, this fixes it for me:

@inline function Base.copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{Nothing})
    axes(dest) == axes(bc) || Base.Broadcast.throwdm(axes(dest), axes(bc))
    # Performance optimization: broadcast!(identity, dest, A) is equivalent to copyto!(dest, A) if indices match
    if bc.f === identity && bc.args isa Tuple{AbstractArray} # only a single input argument to broadcast!
        A = bc.args[1]
        if axes(dest) == axes(A)
            return copyto!(dest, A)
        end
    end
    bc′ = Base.Broadcast.preprocess(dest, bc)
    @inbounds @simd for I in eachindex(bc′)
        @inbounds dest[I] = bc′[I]
    end
    return dest
end

the diff is adding an @inbounds in front of the @simd. Not sure why that should matter; could a check in eachindex(bc′) be interfering with SIMD?
The code_llvm showed that the function wasn't SIMD before this change, but is SIMD after, fixing the performance problem:

julia> xu8 = rand(UInt8, 1000, 1000); yu8 = rand(UInt8, 1000, 1000);

julia> @btime $xu8 .+ $yu8;
  829.650 μs (2 allocations: 976.70 KiB)

julia> @inline function Base.copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{Nothing})
           axes(dest) == axes(bc) || Base.Broadcast.throwdm(axes(dest), axes(bc))
           # Performance optimization: broadcast!(identity, dest, A) is equivalent to copyto!(dest, A) if indices match
           if bc.f === identity && bc.args isa Tuple{AbstractArray} # only a single input argument to broadcast!
               A = bc.args[1]
               if axes(dest) == axes(A)
                   return copyto!(dest, A)
               end
           end
           bc′ = Base.Broadcast.preprocess(dest, bc)
           @inbounds @simd for I in eachindex(bc′)
               @inbounds dest[I] = bc′[I]
           end
           return dest
       end

julia> @btime $xu8 .+ $yu8;
  162.681 μs (2 allocations: 976.70 KiB)

julia> @btime $xu8 .+ $yu8;
  145.871 μs (2 allocations: 976.70 KiB)

Maybe the check hits a heuristic limit and decides not to emit all the different SIMD loop versions (for different possible broadcasted dimensions), instead doing none at all?

@kimikage
Copy link
Contributor Author

I'm slogging on the next minor version release of FixedPointNumbers, but its benchmarks are useless due to this problem. 😭
I hope to have this issue resolved before Julia v1.6 is released.

After all this time I have checked the details of @chriselrod's workaround.
The @inbounds in front of @simd suppresses the boundary check of getindex in simd_index.

@propagate_inbounds function simd_index(iter::CartesianIndices, Ilast::CartesianIndex, I1::Int)
CartesianIndex(getindex(iter.indices[1], I1+first(Base.axes1(iter.indices[1]))), Ilast.I...)
end

Looking at #37829 (comment), it seems that the default boundary check is intentional.
However, if the length of the inner iterator (iter.indices[1]) is implemented correctly, it seems that getindex will not cause boundary errors as long as it is via @simd.

cc: @johnnychen94, @timholy

@kimikage kimikage changed the title Performance regression in simple broadcasting operations on v1.6.0-DEV during Oct. 2020 Performance regression in broadcasting with CartesianIndices on v1.6.0-DEV Jan 20, 2021
@kimikage
Copy link
Contributor Author

BTW, I am wondering if first(Base.axes1(iter.indices[1])) can be replaced by the more general function firstindex.

@KristofferC
Copy link
Member

I thnk it is fine to just move out the @inbounds in front of the @simd, at least as a first step to fix the reggression?

@kimikage
Copy link
Contributor Author

I grep'd in this repository and found that @simd is mostly used with ranges, and in only few cases it can be used with CartesianIndices.

For this reason, I think @inbounds @simd is a good first step. My concern is that I don't know how @simd is used in packages or user codes, and the second step may be forgotten. 😅

I would like to submit a PR and discuss it there.

@KristofferC
Copy link
Member

My concern is that I don't know how @simd is used in packages or user codes, and the second step may be forgotten.

I agree, but let's split it into two parts. One is the immediate fix here (moving the @inbounds) and the second is the discussion of how to structure things so that it might not be needed in the future.

N5N3 added a commit to N5N3/julia that referenced this issue Nov 21, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 21, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 21, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 22, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 28, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 28, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 28, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 28, 2021
move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Nov 30, 2021
revert changes in reshapedarray.jl
use Iterators.rest

Update broadcast.jl

move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Dec 1, 2021
revert changes in reshapedarray.jl
use Iterators.rest

Update broadcast.jl

move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Dec 2, 2021
revert changes in reshapedarray.jl
use Iterators.rest

Update broadcast.jl

move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Dec 4, 2021
revert changes in reshapedarray.jl
use Iterators.rest

Update broadcast.jl

move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Dec 4, 2021
revert changes in reshapedarray.jl
use Iterators.rest

Update broadcast.jl

move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Dec 5, 2021
revert changes in reshapedarray.jl
use Iterators.rest

Update broadcast.jl

move `@inbounds` outside the loop body. see JuliaLang#38086
N5N3 added a commit to N5N3/julia that referenced this issue Feb 5, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 5, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 6, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 9, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 10, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 11, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 11, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 11, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 11, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 12, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 12, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 12, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 13, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 13, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 13, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 13, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 13, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 14, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 14, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 15, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 15, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 15, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
N5N3 added a commit to N5N3/julia that referenced this issue Feb 16, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl

Update multidimensional.jl
KristofferC pushed a commit to N5N3/julia that referenced this issue Feb 25, 2022
move `@inbounds` outside the loop body. see JuliaLang#38086
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants