
fast inplace broadcasting multiplication of SparseMatrixCSC and a Vector #47

Open
oxinabox opened this issue Oct 5, 2018 · 3 comments

Comments

@oxinabox
Contributor

oxinabox commented Oct 5, 2018

I would like to know if this would be a good PR for the SparseArrays stdlib.
I can't work out what this operation is called.
It does A .* b in place, for A::SparseMatrixCSC and b::AbstractVector.

It seems loosely relevant to JuliaLang/julia#26561 (in that I was thinking about that problem when I wrote it).

In-place operations avoid allocating memory, so they are faster.

using SparseArrays

# In-place `A .* x`: scales row i of A by x[i], mutating A's stored values.
# Caveat: if x[i] is Inf or NaN, the true result of `A .* x` is dense in row i
# (structural zeros become NaN, since 0 * Inf is NaN). We cannot densify in
# place, so structural zeros stay zero; the stored entries do pick up Inf/NaN.
function sparse_column_vecmul!(A::SparseMatrixCSC, x::AbstractVector)
    size(A, 1) == length(x) || throw(DimensionMismatch("length(x) must equal size(A, 1)"))
    rows = rowvals(A)   # row index of each stored entry, in column-major order
    vals = nonzeros(A)  # the stored values themselves
    @inbounds for k in eachindex(vals)
        vals[k] *= x[rows[k]]
    end
    return A
end

Benchmarks

using BenchmarkTools
A = sprand(100, 10, 0.1)
x = rand(100)

  • @btime A .* x;                       7.920 μs (17 allocations: 22.58 KiB)
  • @btime sparse_column_vecmul!(A, x);  1.044 μs (4 allocations: 2.47 KiB)

Not a perfectly fair comparison, since A was mutated between runs, but I doubt that changed the timings.

A more than 7x speedup is nothing to be sneered at, given how large sparse matrices can get.
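For a cleaner timing of a mutating function, BenchmarkTools can hand every sample a fresh copy of A via `setup`. A sketch: `rowscale!` below is a hypothetical stand-in for the in-place kernel, written as a broadcast over the stored values only.

```julia
using BenchmarkTools, SparseArrays

# Hypothetical stand-in for the in-place kernel: scale row i of A by x[i]
# by broadcasting over the stored values only.
function rowscale!(A::SparseMatrixCSC, x::AbstractVector)
    @views nonzeros(A) .*= x[rowvals(A)]
    return A
end

A = sprand(100, 10, 0.1)
x = rand(100)

@btime $A .* $x
# `setup` gives every sample a fresh copy; `evals=1` stops the copy being reused.
@btime rowscale!(B, $x) setup=(B = copy($A)) evals=1
```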

@mbauman
Contributor

mbauman commented Oct 16, 2018

I'm not sure about what to call this function, but could we possibly do a pigeon-hole sort of optimization within the broadcast! definition itself when we detect this case?

@KristofferC KristofferC transferred this issue from JuliaLang/julia Jan 14, 2022
@bcsj

bcsj commented Apr 13, 2022

I guess I found a similar bottleneck today. But I think the improved computation can potentially be done in far fewer lines of code:
https://discourse.julialang.org/t/speeding-up-elementwise-vector-sparsematrixcsc-multiplication-broadcasting/79437

@stevengj
Contributor

> I can't work out what this operation is called.

mul!(A, Diagonal(b), A)

?
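For reference, `Diagonal(b) * A` scales row i of A by b[i], i.e. it computes the same values as `A .* b`. A quick check of that equivalence, deliberately using the out-of-place form (the aliased `mul!(A, Diagonal(b), A)` call is as suggested in the comment above):

```julia
using SparseArrays, LinearAlgebra

A = sprand(100, 10, 0.1)
b = rand(100)

# Left-multiplying by a Diagonal scales row i by b[i], matching A .* b.
B = Diagonal(b) * A
```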
