
Sparse matrix: fix fast implementation of findnext and findprev for cartesian coordinates #32007

Merged: 5 commits, May 23, 2019

Conversation

tkluck
Contributor

@tkluck tkluck commented May 12, 2019

#23317 introduced specialized findnext methods for sparse vectors and matrices. The matrix case was inadvertently broken when CartesianIndex was introduced. This pull request fixes that.

In addition, a recent pull request (#31354) identified the need for a sparse implementation of findprev(f, m), not only for the case f == !iszero, but for any f with f(zero(eltype(m))) equal to false. In particular, the hash function makes heavy use of findprev(!isequal(x), m) for different values of x, including x == 0. That is part of this branch as well.
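To illustrate why the f(zero(eltype(m))) == false condition matters, here is a hedged sketch (a hypothetical helper, not the PR's actual code): when the predicate is guaranteed false on the zero element, findprev only ever needs to visit the stored entries, because every implicit zero cannot match.

```julia
using SparseArrays

# Hypothetical sketch of the fast-path idea, not the PR's implementation:
# skip all implicit zeros, scanning only stored entries.
function findprev_stored(f, m::SparseMatrixCSC, i::CartesianIndex{2})
    @assert !f(zero(eltype(m)))  # precondition that makes skipping implicit zeros sound
    rows, vals = rowvals(m), nonzeros(m)
    best = nothing
    for j in 1:size(m, 2), k in nzrange(m, j)
        idx = CartesianIndex(rows[k], j)
        # isless on CartesianIndex follows column-major (storage) order
        if !isless(i, idx) && f(vals[k]) && (best === nothing || isless(best, idx))
            best = idx
        end
    end
    return best
end
```

This O(nnz) scan is only meant to show the predicate condition; a real implementation would additionally avoid visiting columns past the starting index.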

I feel incredibly bad about submitting this pull request here as one of its commits is a revert of #31354 . These are my reasons for taking that step:

I did make a point of including the new test cases from the other pull request. For example, this ensures that these functions work with

w = [ "a" ""; "" "b"]
w_sp = sparse(w)

as their argument.
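For reference, assuming the standard column-major findnext/findprev semantics, the expected results on that example look like this (empty strings play the role of the zero element):

```julia
using SparseArrays

w = ["a" ""; "" "b"]
w_sp = sparse(w)

# Column-major traversal order is (1,1), (2,1), (1,2), (2,2).
findnext(!isequal(""), w_sp, CartesianIndex(1, 1))  # CartesianIndex(1, 1)
findnext(!isequal(""), w_sp, CartesianIndex(2, 1))  # CartesianIndex(2, 2)
findprev(!isequal(""), w_sp, CartesianIndex(1, 2))  # CartesianIndex(1, 1)
```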

tkluck added 2 commits May 12, 2019 11:43
…#31354)"

This seems to duplicate work from JuliaLang#23317 and it causes performance
degradation in the cases that one was designed for. See
JuliaLang#31354 (comment)

This reverts commit e0bef65.
@ViralBShah
Member

Pinging @mbauman @andreasnoack for review.

@ViralBShah ViralBShah requested a review from mbauman May 13, 2019 05:01
@KristofferC KristofferC added this to the 1.2 milestone May 13, 2019
@KristofferC KristofferC added the triage This should be discussed on a triage call label May 13, 2019
@KristofferC
Member

We need to decide what to do for 1.2 here.

Either leave things as is, revert #31354, or merge this and backport to the 1.2 branch.

@ViralBShah
Member

ViralBShah commented May 13, 2019

Does this maintain the performance improvements for the common case described in #31354? If so, let's merge this, and also add test cases to prevent regression. My preference is then to backport.

@ViralBShah ViralBShah added the sparse Sparse arrays label May 13, 2019
@tkluck
Contributor Author

tkluck commented May 13, 2019

Yes, it maintains the same behaviour of hash time scaling with fill rate.
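A rough micro-check of that scaling claim (a hypothetical benchmark, not taken from the PR): with the sparse fast path, hashing cost tracks the number of stored entries rather than the full length(A) = 10^8 cells.

```julia
using SparseArrays

A = spzeros(10_000, 10_000)
A[1:100, 1] .= 1.0   # only 100 stored entries out of 10^8 cells
@time hash(A)        # should complete quickly if hashing skips implicit zeros
```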

@mbauman
Member

mbauman commented May 13, 2019

Ah, ok, this is indeed much simpler. Just to wrap my own head around what happened here:

  • We initially just had one findnext(f::typeof(!iszero), v::AbstractSparseArray, i::Integer), which used _sparse_findnextnz to do its work. This was sub-optimal for two reasons: it didn't work for CartesianIndices (after the find-to-keys-ification), and it only handled !iszero.
  • #31354 ("sparse findnext findprev hash performance improved") fixed this by adding an additional findnext(pred::Function, A::SparseArrays.SparseMatrixCSC, ij::CartesianIndex{2}) method and essentially a parallel _sparse_findnextnz system that worked on CartesianIndices instead of Integers.
  • This PR fixes up the original system, making _sparse_findnextnz work with CartesianIndices. I thought it should be quicker, too, but it looks like that's not the case in this simple spot-check:
julia> using Revise, SparseArrays, BenchmarkTools

julia> Revise.track(SparseArrays)

julia> A = spzeros(10000,10000);

julia> A[end,end]= 1
1

julia> @benchmark findnext(isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  240 bytes
  allocs estimate:  5
  --------------
  minimum time:     700.908 ns (0.00% GC)
  median time:      752.756 ns (0.00% GC)
  mean time:        812.077 ns (9.16% GC)
  maximum time:     448.030 μs (99.81% GC)
  --------------
  samples:          10000
  evals/sample:     119

julia> @benchmark findnext(!isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     27.390 ns (0.00% GC)
  median time:      29.360 ns (0.00% GC)
  mean time:        28.993 ns (0.00% GC)
  maximum time:     47.038 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

shell> git merge --no-commit pr/32007
Automatic merge went well; stopped before committing as requested

julia> @benchmark findnext(isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  240 bytes
  allocs estimate:  5
  --------------
  minimum time:     683.640 ns (0.00% GC)
  median time:      736.847 ns (0.00% GC)
  mean time:        791.763 ns (8.20% GC)
  maximum time:     376.179 μs (99.77% GC)
  --------------
  samples:          10000
  evals/sample:     150

julia> @benchmark findnext(!isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.755 μs (0.00% GC)
  median time:      6.774 μs (0.00% GC)
  mean time:        6.955 μs (0.00% GC)
  maximum time:     13.100 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5
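The fast numbers above come from jumping directly between stored entries rather than scanning every cell. A hedged sketch of that idea with Cartesian coordinates (a hypothetical helper, not the actual _sparse_findnextnz): binary-search the stored rows of the starting column for the first entry at or below the starting row, then fall through to later columns.

```julia
using SparseArrays

# Hypothetical sketch: find the first stored entry at or after index i in
# column-major order. Note it returns stored entries, which may include
# explicitly stored zeros; a real findnext for !iszero would also skip those.
function next_stored(m::SparseMatrixCSC, i::CartesianIndex{2})
    rows = rowvals(m)
    r, c = Tuple(i)
    for j in c:size(m, 2)
        rng = nzrange(m, j)                 # range into rowvals for column j
        # In the starting column, skip stored rows above r; rows are sorted.
        lo = j == c ? searchsortedfirst(view(rows, rng), r) : 1
        lo <= length(rng) && return CartesianIndex(rows[rng[lo]], j)
    end
    return nothing
end
```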

@JeffBezanson
Member

This could use extra test cases to cover what broke.

@KristofferC
Member

I think it was only performance?

@tkluck
Contributor Author

tkluck commented May 18, 2019

@mbauman yes, your summary is accurate. As for your spot benchmarks -- I identified the reason for the slower performance and am about to push a fix for it to this branch.

@tkluck tkluck force-pushed the sparse-find-next-v2 branch from 6a0d49b to 9c3a15b Compare May 18, 2019 16:00
@tkluck
Contributor Author

tkluck commented May 18, 2019

@JeffBezanson and @KristofferC , yes it was only performance.

@JeffBezanson JeffBezanson merged commit ec797ef into JuliaLang:master May 23, 2019
@KristofferC KristofferC removed the triage This should be discussed on a triage call label Jul 16, 2019
KristofferC pushed a commit that referenced this pull request Jul 16, 2019
…artesian coordinates (#32007)

Revert "sparse findnext findprev hash performance improved (#31354)"

This seems to duplicate work from #23317 and it causes performance
degradation in the cases that one was designed for. See
#31354 (comment)

This reverts commit e0bef65.

Thanks to @mbauman for spotting this issue in
#32007 (comment).

(cherry picked from commit ec797ef)
@KristofferC KristofferC mentioned this pull request Jul 16, 2019
14 tasks
@KristofferC
Member

This wasn't backported to RC2 because it still had the triage label on it. I assume we just go with this PR, though.

KristofferC pushed a commit to JuliaSparse/SparseArrays.jl that referenced this pull request Nov 2, 2021
…artesian coordinates (#32007)

Revert "sparse findnext findprev hash performance improved (#31354)"

This seems to duplicate work from #23317 and it causes performance
degradation in the cases that one was designed for. See
JuliaLang/julia#31354 (comment)

This reverts commit 8623d9a.

Thanks to @mbauman for spotting this issue in
JuliaLang/julia#32007 (comment).
Labels: sparse Sparse arrays
6 participants