
fast inplace broadcasting multiplication of SparseMatrixCSC and a Vector #47

Open
oxinabox opened this issue Oct 5, 2018 · 3 comments

Comments

@oxinabox
Contributor

oxinabox commented Oct 5, 2018

I would like to know if this would be a good PR for the SparseArrays stdlib.
I can't work out what this operation is called.
It does A .* b in place, for A::SparseMatrixCSC and b::AbstractVector.

It seems loosely relevant to JuliaLang/julia#26561 (in that I was thinking about that problem when I wrote it).

In-place operations avoid allocating memory, so they are faster.

using SparseArrays

# In-place `A .* x`: scales row i of A by x[i], mutating A's stored values.
# Caveat: if x[i] is Inf or NaN, the true result of `A .* x` is dense in row i
# (structural zeros become NaN, since 0 * Inf is NaN). We cannot densify in
# place, so structural zeros stay zero; the stored entries do pick up Inf/NaN.
function sparse_column_vecmul!(A::SparseMatrixCSC, x::AbstractVector)
    size(A, 1) == length(x) || throw(DimensionMismatch("length(x) must equal size(A, 1)"))
    rows = rowvals(A)   # row index of each stored entry, in column-major order
    vals = nonzeros(A)  # the stored values themselves
    @inbounds for k in eachindex(vals)
        vals[k] *= x[rows[k]]
    end
    return A
end

Benchmarks

using BenchmarkTools
A = sprand(100, 10, 0.1)
x = rand(100)

  • @btime A .* x;                       7.920 μs (17 allocations: 22.58 KiB)
  • @btime sparse_column_vecmul!(A, x);  1.044 μs (4 allocations: 2.47 KiB)

Not a perfectly fair comparison, since A was mutated between runs, but I doubt that changed the timings.

A more than 7x speedup is nothing to be sneered at, given how large sparse matrices can get.
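For a cleaner timing of a mutating function, BenchmarkTools can hand every sample a fresh copy of A via `setup`. A sketch: `rowscale!` below is a hypothetical stand-in for the in-place kernel, written as a broadcast over the stored values only.

```julia
using BenchmarkTools, SparseArrays

# Hypothetical stand-in for the in-place kernel: scale row i of A by x[i]
# by broadcasting over the stored values only.
function rowscale!(A::SparseMatrixCSC, x::AbstractVector)
    @views nonzeros(A) .*= x[rowvals(A)]
    return A
end

A = sprand(100, 10, 0.1)
x = rand(100)

@btime $A .* $x
# `setup` gives every sample a fresh copy; `evals=1` stops the copy being reused.
@btime rowscale!(B, $x) setup=(B = copy($A)) evals=1
```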

@mbauman
Contributor

mbauman commented Oct 16, 2018

I'm not sure about what to call this function, but could we possibly do a pigeon-hole sort of optimization within the broadcast! definition itself when we detect this case?

@KristofferC KristofferC transferred this issue from JuliaLang/julia Jan 14, 2022
@bcsj

bcsj commented Apr 13, 2022

I guess I found a similar bottleneck today. But I think the improved computation can potentially be done in far fewer lines of code:
https://discourse.julialang.org/t/speeding-up-elementwise-vector-sparsematrixcsc-multiplication-broadcasting/79437

@stevengj
Contributor

> I can't work out what this operation is called.

mul!(A, Diagonal(b), A)

?
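For reference, `Diagonal(b) * A` scales row i of A by b[i], i.e. it computes the same values as `A .* b`. A quick check of that equivalence, deliberately using the out-of-place form (the aliased `mul!(A, Diagonal(b), A)` call is as suggested in the comment above):

```julia
using SparseArrays, LinearAlgebra

A = sprand(100, 10, 0.1)
b = rand(100)

# Left-multiplying by a Diagonal scales row i by b[i], matching A .* b.
B = Diagonal(b) * A
```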
