
Sparse matrix: fix fast implementation of findnext and findprev for cartesian coordinates #32007

Merged: 5 commits, May 23, 2019

Conversation

tkluck
Contributor

@tkluck tkluck commented May 12, 2019

#23317 introduced specialized findnext methods for sparse vectors and matrices. The matrix case was inadvertently broken when CartesianIndex was introduced. This pull request fixes that.

In addition, a recent pull request (#31354) identified the need for a sparse implementation of findprev(f, m), not only for the case f == !iszero, but for any f with f(zero(eltype(m))) equal to false. In particular, the hash function makes heavy use of findprev(!isequal(x), m) for different values of x, including x == 0. That is part of this branch as well.
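To illustrate why the f(zero(eltype(m))) == false condition matters, here is a hedged sketch (a hypothetical helper, not the PR's actual code): when the predicate is guaranteed false on the zero element, findprev only ever needs to visit the stored entries, because every implicit zero cannot match.

```julia
using SparseArrays

# Hypothetical sketch of the fast-path idea, not the PR's implementation:
# skip all implicit zeros, scanning only stored entries.
function findprev_stored(f, m::SparseMatrixCSC, i::CartesianIndex{2})
    @assert !f(zero(eltype(m)))  # precondition that makes skipping implicit zeros sound
    rows, vals = rowvals(m), nonzeros(m)
    best = nothing
    for j in 1:size(m, 2), k in nzrange(m, j)
        idx = CartesianIndex(rows[k], j)
        # isless on CartesianIndex follows column-major (storage) order
        if !isless(i, idx) && f(vals[k]) && (best === nothing || isless(best, idx))
            best = idx
        end
    end
    return best
end
```

This O(nnz) scan is only meant to show the predicate condition; a real implementation would additionally avoid visiting columns past the starting index.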

I feel incredibly bad about submitting this pull request here as one of its commits is a revert of #31354 . These are my reasons for taking that step:

I did make a point of including the new test cases from the other pull request. For example, this ensures that these functions work with

w = [ "a" ""; "" "b"]
w_sp = sparse(w)

as their argument.
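For reference, assuming the standard column-major findnext/findprev semantics, the expected results on that example look like this (empty strings play the role of the zero element):

```julia
using SparseArrays

w = ["a" ""; "" "b"]
w_sp = sparse(w)

# Column-major traversal order is (1,1), (2,1), (1,2), (2,2).
findnext(!isequal(""), w_sp, CartesianIndex(1, 1))  # CartesianIndex(1, 1)
findnext(!isequal(""), w_sp, CartesianIndex(2, 1))  # CartesianIndex(2, 2)
findprev(!isequal(""), w_sp, CartesianIndex(1, 2))  # CartesianIndex(1, 1)
```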

tkluck added 2 commits May 12, 2019 11:43
…#31354)"

This seems to duplicate work from JuliaLang#23317 and it causes performance
degradation in the cases that one was designed for. See
JuliaLang#31354 (comment)

This reverts commit e0bef65.
@ViralBShah
Member

Pinging @mbauman @andreasnoack for review.

@ViralBShah ViralBShah requested a review from mbauman May 13, 2019 05:01
@KristofferC KristofferC added this to the 1.2 milestone May 13, 2019
@KristofferC KristofferC added the triage This should be discussed on a triage call label May 13, 2019
@KristofferC
Member

We need to decide what to do for 1.2 here.

Either leave things as is, revert #31354, or merge this and backport to the 1.2 branch.

@ViralBShah
Member

ViralBShah commented May 13, 2019

Does this maintain the performance improvements for the common case described in #31354? If so, let's merge this, and also add test cases to prevent regression. My preference is then to backport.

@ViralBShah ViralBShah added the sparse Sparse arrays label May 13, 2019
@tkluck
Contributor Author

tkluck commented May 13, 2019

Yes, it maintains the same behaviour of hash time scaling with fill rate.
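A rough micro-check of that scaling claim (a hypothetical benchmark, not taken from the PR): with the sparse fast path, hashing cost tracks the number of stored entries rather than the full length(A) = 10^8 cells.

```julia
using SparseArrays

A = spzeros(10_000, 10_000)
A[1:100, 1] .= 1.0   # only 100 stored entries out of 10^8 cells
@time hash(A)        # should complete quickly if hashing skips implicit zeros
```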

@mbauman
Member

mbauman commented May 13, 2019

Ah, ok, this is indeed much simpler. Just to wrap my own head around what happened here:

  • We initially just had one findnext(f::typeof(!iszero), v::AbstractSparseArray, i::Integer), which used _sparse_findnextnz to do its work. This was sub-optimal for two reasons: it didn't work for CartesianIndices (after the find-to-keys-ification), and it only handled !iszero.
  • #31354 ("sparse findnext findprev hash performance improved") fixed this by adding an additional findnext(pred::Function, A::SparseArrays.SparseMatrixCSC, ij::CartesianIndex{2}) method and essentially a parallel _sparse_findnextnz system that worked on CartesianIndices instead of Integers.
  • This PR fixes up the original system, making _sparse_findnextnz work with CartesianIndices. I thought it should be quicker, too, but it looks like that's not the case in this simple spot-check:
julia> using Revise, SparseArrays, BenchmarkTools

julia> Revise.track(SparseArrays)

julia> A = spzeros(10000,10000);

julia> A[end,end]= 1
1

julia> @benchmark findnext(isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  240 bytes
  allocs estimate:  5
  --------------
  minimum time:     700.908 ns (0.00% GC)
  median time:      752.756 ns (0.00% GC)
  mean time:        812.077 ns (9.16% GC)
  maximum time:     448.030 μs (99.81% GC)
  --------------
  samples:          10000
  evals/sample:     119

julia> @benchmark findnext(!isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     27.390 ns (0.00% GC)
  median time:      29.360 ns (0.00% GC)
  mean time:        28.993 ns (0.00% GC)
  maximum time:     47.038 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

shell> git merge --no-commit pr/32007
Automatic merge went well; stopped before committing as requested

julia> @benchmark findnext(isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  240 bytes
  allocs estimate:  5
  --------------
  minimum time:     683.640 ns (0.00% GC)
  median time:      736.847 ns (0.00% GC)
  mean time:        791.763 ns (8.20% GC)
  maximum time:     376.179 μs (99.77% GC)
  --------------
  samples:          10000
  evals/sample:     150

julia> @benchmark findnext(!isequal(0), $A, CartesianIndex(1,1))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.755 μs (0.00% GC)
  median time:      6.774 μs (0.00% GC)
  mean time:        6.955 μs (0.00% GC)
  maximum time:     13.100 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5
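The fast numbers above come from jumping directly between stored entries rather than scanning every cell. A hedged sketch of that idea with Cartesian coordinates (a hypothetical helper, not the actual _sparse_findnextnz): binary-search the stored rows of the starting column for the first entry at or below the starting row, then fall through to later columns.

```julia
using SparseArrays

# Hypothetical sketch: find the first stored entry at or after index i in
# column-major order. Note it returns stored entries, which may include
# explicitly stored zeros; a real findnext for !iszero would also skip those.
function next_stored(m::SparseMatrixCSC, i::CartesianIndex{2})
    rows = rowvals(m)
    r, c = Tuple(i)
    for j in c:size(m, 2)
        rng = nzrange(m, j)                 # range into rowvals for column j
        # In the starting column, skip stored rows above r; rows are sorted.
        lo = j == c ? searchsortedfirst(view(rows, rng), r) : 1
        lo <= length(rng) && return CartesianIndex(rows[rng[lo]], j)
    end
    return nothing
end
```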

@JeffBezanson
Member

This could use extra test cases to cover what broke.

@KristofferC
Member

I think it was only performance?

@tkluck
Contributor Author

tkluck commented May 18, 2019

@mbauman yes, your summary is accurate. As for your spot benchmarks -- I identified the reason for the slower performance and am about to push a fix for it to this branch.

@tkluck tkluck force-pushed the sparse-find-next-v2 branch from 6a0d49b to 9c3a15b Compare May 18, 2019 16:00
@tkluck
Contributor Author

tkluck commented May 18, 2019

@JeffBezanson and @KristofferC , yes it was only performance.

@JeffBezanson JeffBezanson merged commit ec797ef into JuliaLang:master May 23, 2019
@KristofferC KristofferC removed the triage This should be discussed on a triage call label Jul 16, 2019
KristofferC pushed a commit that referenced this pull request Jul 16, 2019
…artesian coordinates (#32007)

Revert "sparse findnext findprev hash performance improved (#31354)"

This seems to duplicate work from #23317 and it causes performance
degradation in the cases that one was designed for. See
#31354 (comment)

This reverts commit e0bef65.

Thanks to @mbauman for spotting this issue in
#32007 (comment).

(cherry picked from commit ec797ef)
@KristofferC KristofferC mentioned this pull request Jul 16, 2019
14 tasks
@KristofferC
Member

This wasn't backported to RC2 because it still had the triage label on it. I assume we just go with this PR, though.

KristofferC pushed a commit to JuliaSparse/SparseArrays.jl that referenced this pull request Nov 2, 2021
…artesian coordinates (#32007)

Revert "sparse findnext findprev hash performance improved (#31354)"

This seems to duplicate work from #23317 and it causes performance
degradation in the cases that one was designed for. See
JuliaLang/julia#31354 (comment)

This reverts commit 8623d9a.

Thanks to @mbauman for spotting this issue in
JuliaLang/julia#32007 (comment).
Labels: sparse Sparse arrays
6 participants