Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrix Multiplication API #473

Closed
jebej opened this issue Sep 28, 2017 · 208 comments
Closed

Matrix Multiplication API #473

jebej opened this issue Sep 28, 2017 · 208 comments

Comments

@jebej
Copy link
Contributor

jebej commented Sep 28, 2017

Currently, there is the following line in the sparse matmult code:

https://github.com/JuliaLang/julia/blob/056b374919e11977d5a8d57b446ad1c72f3e6b1d/base/sparse/linalg.jl#L94-L95

I am assuming that this means we want to have the more general A_mul_B*!(α,A,B,β,C) = αAB + βC methods which overwrite C (the BLAS gemm API) for dense arrays. Is that still the case? (It also seems possible to keep both APIs, i.e. keep the A_mul_B*!(C,A,B) methods, which would simply be defined as A_mul_B*!(C,A,B) = A_mul_B*!(1,A,B,0,C).)

I would personally like to have the gemm API defined for all array types (this has been expressed elsewhere). Implementing these methods for dense arrays seems fairly simple, since they would just directly call gemm et al. The sparse case is already implemented. The only real modification would be to the pure julia generic matmult, which does not accept α and β arguments.

This would lead to generic code that works with any type of array/number. I currently have a simple implementation of expm (after having done the modification to _generic_matmatmult!) which works with bigfloats and sparse arrays.

@Sacha0
Copy link
Member

Sacha0 commented Sep 28, 2017

Ref. #160, JuliaLang/julia#20053, #464. Best!

@jebej
Copy link
Contributor Author

jebej commented Sep 28, 2017

Thanks for the references. I suppose this issue has more to do with adding the gemm-style methods than an API revamp, but it can be closed if we think it still is too similar to #160.

@jebej
Copy link
Contributor Author

jebej commented Sep 30, 2017

As a starting point, would there be support for having _generic_matmatmul! have the gemm API? It's a fairly simple change, and purely additive/nonbreaking, since the current method would simply be implemented by having α=1 and β=0. I can make the PR. I'd probably go similarly to this version (in that version I cut all the transpose stuff because I didn't need it, I wouldn't do that here).

@andreasnoack
Copy link
Member

Yes. That would be a good start. We need to consider the ordering of the arguments though. Originally, I thought it was more natural to follow the BLAS ordering but we fairly consistent about having output arguments first which is also the case for the current three-argument A_mul_B!. Furthermore and as you have already pointed out, the three-argument version would correspond to the five-argument version with α=1 and β=0 and default value arguments are last. Of course, we don't necessarily have to use the default value syntax for this but it would make sense to use it here.

@StefanKarpinski
Copy link
Member

Why not just introduce a generic gemm function?

@jebej
Copy link
Contributor Author

jebej commented Oct 3, 2017

Yes. That would be a good start. We need to consider the ordering of the arguments though. Originally, I thought it was more natural to follow the BLAS ordering but we fairly consistent about having output arguments first which is also the case for the current three-argument A_mul_B!. Furthermore and as you have already pointed out, the three-argument version would correspond to the five-argument version with α=1 and β=0 and default value arguments are last. Of course, we don't necessarily have to use the default value syntax for this but it would make sense to use it here.

Sounds good. We can continue the discussion about the actual argument ordering and method renaming in #160. This is more about just having the five-argument version available, so I'll keep the current Ax_mul_Bx!(α,A,B,β,C) interface.

Why not just introduce a generic gemm function?

Are you suggesting renaming _generic_matmatmul! to gemm! in addition to the changes above?

@jebej
Copy link
Contributor Author

jebej commented Oct 3, 2017

To be clearer, I do think we should end up having a single method mul(C,A,B,α=1,β=0), along with lazy transpose/adjoint types to dispatch on, but that will be an other PR.

@andreasnoack
Copy link
Member

Why not just introduce a generic gemm function?

I think gemm is a misnomer in Julia. In BLAS, the ge part indicates that the matrices are general, i.e. have no special structure, the first m is multiply and the list m is matrix. In Julia, the ge (general) part is encoded in the signature as is the last m (matrix) so I think we should just call it mul!.

@dlfivefifty
Copy link
Contributor

Is the notation mul!(α, A, B, β, C) from SparseArrays.jl

https://github.com/JuliaLang/julia/blob/160a46704fd1b349b5425f104a4ac8b323ea85af/stdlib/SparseArrays/src/linalg.jl#L32

"finalised" as the official syntax? And this would be in addition to mul!(C, A, B), lmul!(A, B), and rmul!(A,B)?

I'm not too big a fan of having A, B, and C in different orders.

@andreasnoack
Copy link
Member

andreasnoack commented Feb 14, 2018

Is the notation mul!(α, A, B, β, C) from SparseArrays.jl "finalised" as the official syntax?

I'd say no. Originally, I liked the idea of following BLAS (and the order also matched how this is usually written mathematically) but now I think it makes sense to just add the scaling arguments as optional fourth and fifth arguments.

@dlfivefifty
Copy link
Contributor

So just to clarify, you'd like optional arguments in the sense

function mul!(C, A, B, α=1, β=0)
 ...
end

The other option would be optional keyword arguments

function mul!(C, A, B; α=1, β=0)
...
end

but I'm not sure people are too happy with unicode.

@andreasnoack
Copy link
Member

I'm very happy with unicode but it is true that we try always have an ascii option and it would be possible here. The names α and β are also not super intuitive unless you know BLAS so I think using positional arguments the better solution here.

@perrutquist
Copy link
Contributor

In my opinion, a more logical nomenclature would be to let muladd!(A, B, C, α=1, β=1) map to the various BLAS routines that do multiplication and addition. (gemm as above, but also e.g. axpy when A is a scalar.)

The mul! function could then have an interface like mul!(Y, A, B, ...) taking an arbitrary number of arguments (just like * does) as long as all intermediate results can be stored in Y. (An optional kwarg could specify the order of multiplication, with a reasonable default)

mul!(Y, A::AbstractVecOrMat, B:AbstractVecOrMat, α::Number) would have the default implementation muladd!(A, B, Y, α=α, β=0), as would the other permutations of two matrix/vectors and a scalar.

@matthieugomez
Copy link

matthieugomez commented Jul 8, 2018

Another vote to have mul!(C, A, B, α, β) defined for dense matrices. This would allow to write generic code for dense and sparse matrices. I wanted to define such a function in my non linear least squares package but I guess this is type piracy.

@dmbates
Copy link
Member

dmbates commented Oct 3, 2018

I also have been tempted to write mul!(C, A, B, α, β) methods for the MixedModels package and engage in a bit of type piracy, but it would be much better if such methods were in the LinearAlgebra package. Having methods for a muladd! generic for this operation would be fine with me too.

@simonbyrne
Copy link
Contributor

I'm in favour, though I think it should probably have a different name than just mul!. muladd! seems reasonable, but am certainly open to suggestions.

@simonbyrne
Copy link
Contributor

Maybe mulinc! for multiply-increment?

@YingboMa
Copy link

YingboMa commented Oct 3, 2018

Maybe we can have something like addmul!(C, A, B, α=1, β=1)?

@StefanKarpinski
Copy link
Member

Isn’t this a form of muladd!? Or is the idea behind calling it addmul! that it mutates the add argument rather than the multiply argument? Would one ever mutate the multiply argument?

@simonbyrne
Copy link
Contributor

Note that we do mutate non-first elements in some cases, e.g. lmul! and ldiv!, so we could do them in the usual "muladd" order (i.e. muladd!(A,B,C)). The question is what order should α and β go? One option would be to make the keyword arguments?

@tkf
Copy link
Member

tkf commented Oct 4, 2018

Wouldn't it be nice if you leave an option to the implementers to dispatch on the types of the scalars α and β? It's easy to add sugars for end-users.

@andreasnoack
Copy link
Member

I thought we already settled on mul!(C, A, B, α, β) with default values for α, β. We are using this version in https://github.com/JuliaLang/julia/blob/b8ca1a499ff4044b9cb1ba3881d8c6fbb1f3c03b/stdlib/SparseArrays/src/linalg.jl#L32-L50. I think some packages are also using this form but I don't remember which on top of my head.

@tkf
Copy link
Member

tkf commented Oct 4, 2018

Thanks! It would be nice if that's documented.

@simonbyrne
Copy link
Contributor

I thought we already settled on mul!(C, A, B, α, β) with default values for α, β.

SparseArrays uses it, but I don't recall it being discussed anywhere.

@dmbates
Copy link
Member

dmbates commented Oct 4, 2018

In some ways the muladd! name is more natural because it is a multiplication followed by an addition. However, the default values of α and β, muladd!(C, A, B, α=1, β=0) (note that the default for β is zero, not one), turn it back into mul!(C, A, B).

It seems to be a case of pedantry vs consistency whether to call this mul! or muladd! and I think the case of the existing method in SparseArrays would argue for mul!. I find myself in the curious case of arguing for consistency despite my favorite Oscar Wilde quote, "Consistency is the last refuge of the unimaginative."

@simonbyrne
Copy link
Contributor

simonbyrne commented Oct 4, 2018

In some ways the muladd! name is more natural because it is a multiplication followed by an addition. However, the default values of α and β, muladd!(C, A, B, α=1, β=0) (note that the default for β is zero, not one), turn it back into mul!(C, A, B).

There is an interesting exception to this when C contains Infs or NaNs: theoretically, if β==0, the result should still be NaNs. This doesn't happen in practice because BLAS and our sparse matrix code explicitly check for β==0 then replace it with zeros.

@tkf
Copy link
Member

tkf commented Aug 12, 2019

I'd vote for strong zero (:heart:) if it were "Merge for 1.x" instead of "Merge for 1.3". Wouldn't the options make sense if all "Merge for 1.3" are "Merge 1.x"?

@chriscoey
Copy link

Thanks @chethega. @tkf I am in the position of really needing the new mul! ASAP without caring too much about the NaN decision (as long as performance is not compromised).

@tkf
Copy link
Member

tkf commented Aug 12, 2019

Did you check out LazyArrays.jl? FYI, it has really nice fused matrix multiplication support. You can also use BLAS.gemm! etc. safely since they are public methods https://docs.julialang.org/en/latest/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.gemm!

@chriscoey
Copy link

I actually truly need the generic mul! since we use a wide variety of structured and sparse and plain old dense matrices, for representing different optimization problems most efficiently. I'm here for the genericity and speed.

@tkf
Copy link
Member

tkf commented Aug 12, 2019

I see. And I just remembered that we discussed things in LazyArrays.jl so of course you already knew it...

Regarding "ASAP", Julia's four months release cycle is, at least as a design, to avoid "merge rush" just before feature freeze. I know it's not fair for me to say this because I've tried the same thing before... But I think someone needs to mention this as a reminder. The bright side is Julia is super easy to build. You can start using it right after it is merged until next release.

Edit: release -> merge

@chriscoey
Copy link

Thanks. I find deadlines to be helpful motivators, and I would like to avoid letting this go stale again. So I would propose that we try to use the deadline as a goal.

@tkf
Copy link
Member

tkf commented Aug 12, 2019

It's great that you are actively injecting energy to this thread!

@dlfivefifty
Copy link
Contributor

I actually truly need the generic mul! since we use a wide variety of structured and sparse and plain old dense matrices, for representing different optimization problems most efficiently. I'm here for the genericity and speed.

A 5-argument mul! won't work well when you have many different types: you'll need combinatorially many overrides to avoid ambiguity. This is one of the motivations behind LazyArrays.jl MemoryLayout system. It's used for "structured and sparse" matrices precisely for this reason in BandedMatrices.jl and BlockBandedMatrices.jl. (Here even sub views of banded matrices dispatch to banded BLAS routines.)

@chriscoey
Copy link

Thanks, I will try LazyArrays again.

Since the 5-arg mul seems to be generally considered as a temporary stopgap (until some solution like LazyArrays can be used in 2.0), I think we can aim to get it merged without necessarily being the ideal or perfect solution for the long run.

@chethega when do you think we should count the tallies for the new nonbinding vote?

@chethega
Copy link
Contributor

@tkf Sure, you're right that strong/weak/undef zeros make sense for 1.x as well.

However, I think there are quite a few people who would rather have 1.3 mul! now than wait until 1.4 to get 5-arg mul!. If there was no deadline, then I'd wait some more and take some more time for thinking how to conduct a proper poll (at least 10 days for voting). Most importantly, we cannot have a meaningful vote without first presenting and benchmarking competing implementations on the speed and elegance of weak/strong zeros. I personally suspect that weak zeros could be made almost as fast as strong zeros, by first checking iszero(alpha), then scanning the matrix for !isfinite values, and only then using a slow path with extra allocation; but I prefer strong zero semantics anyway.

@chethega when do you think we should count the tallies for the new nonbinding vote?

Triage must make a decision (delay / strong / weak / backstop) this week for the 1.3 alpha. I think Thursday 15th or Wednesday 14th are sensible options for triage to make a count, and take it into consideration. I probably won't be able to join on Thursday, so someone else will have to count.

Realistically, it is OK to be conservative here, and miss the deadline, and continue the discussion and wait for 1.4.

On the other hand, we may already have reached consensus without noticing it: @andreasnoack made some powerful arguments that a zero coefficient should be a strong zero. It may well be that he managed to convince all the weak zero proponents. It may well be that there is a large majority that really wants 5-arg mul!, preferably last year, and does not really care about this little detail. If that was the case, then it would be a pity to further delay the feature, just because no one wants to shut up the discussion.

@dlfivefifty
Copy link
Contributor

Why not just throw an error for now:

β == 0.0 && any(isnan,C) && throw(ArgumentError("use β = false"))

@chethega
Copy link
Contributor

Why not just throw an error for now

I added that option to the poll. Great compromise idea!

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Aug 12, 2019

Just to set expectations: the feature freeze for 1.3 is in three days so there’s basically no way this can make it in time. We’re being pretty strict about feature freezes and branching since it’s the only part of the release cycle that we can really control the timing of.

@andreasnoack
Copy link
Member

The work is already done in JuliaLang/julia#29634 though. Just needs to be adjusted and rebased.

@chriscoey
Copy link

@tkf for JuliaLang/julia#29634 could you list the work that remains to be done (incl renaming and handling zeros according to the vote)? I know you're busy so perhaps we can find a way to split any remaining todos so the burden won't fall on you again.

@tkf
Copy link
Member

tkf commented Aug 13, 2019

The TODOs I can think of ATM are:

My PR implements BLAS semantics of the β = 0 handling. So other handling, like throwing an error, also has to be implemented.

@tkf
Copy link
Member

tkf commented Aug 13, 2019

My PR implements BLAS semantics of the β = 0 handling.

Sorry, my memory was stale; my implementation was not consistent and propagates NaN sometimes. So additional TODO is to make the behavior of β = 0.0 consistent.

@andreasnoack
Copy link
Member

The MulAddMul type would just be for internal use, right?

@tkf
Copy link
Member

tkf commented Aug 13, 2019

Yes, it's completely an internal detail. My worries were (1) there may be too many specializations (beta=0 etc. is encoded in the type parameter) and (2) it decreases the readability of the source code.

@andreasnoack
Copy link
Member

Those are valid concerns. We already produce a ton of specializations in the linear algebra code so it's good to think through if we really need to specialize here. My thinking has generally been that we should optimize for small matrices since it not free (as you said, it complicates the source code and could increase compile times) and people are better off using StaticArrays for small matrix multiplications. Hence, I lean towards just checking the values at runtime but we can always adjust this later if we change our mind so we shouldn't let this cause any delays.

@dlfivefifty
Copy link
Contributor

FYI soft zeros do have simple implementations:

if iszero(β) && β !== false && !iszero(α)
   lmul!(zero(T),y) # this handles soft zeros correctly
   BLAS.gemv!(α, A, x, one(T), y) # preserves soft zeros
elseif iszero(α) && iszero(β)
   BLAS.gemv!(one(T), A, x, one(T), y) # puts NaNs in the correct place
   lmul!(zero(T), y) # everything not NaN should be zero
elseif iszero(α) && !iszero(β)
   BLAS.gemv!(one(T), A, x, β, y) # puts NaNs in the correct place
   BLAS.gemv!(-one(T), A, x, one(T), y) # subtracts out non-NaN changes
end

@tkf
Copy link
Member

tkf commented Aug 13, 2019

@andreasnoack Sorry, I forgot that we actually needed the specialization to optimize the inner-most loop for some structured matrices like mul!(C, A::BiTriSym, B, α, β) JuliaLang/julia#29634 (comment). Some specializations can be removed but that's actually more work (so delays).

@chriscoey
Copy link

Hence, I lean towards just checking the values at runtime but we can always adjust this later if we change our mind so we shouldn't let this cause any delays.

Great!

@andreasnoack
Copy link
Member

Fixed by JuliaLang/julia#29634

@tkf
Copy link
Member

tkf commented Aug 14, 2019

@andreasnoack Thanks a lot for reviewing and merging this in time!

Now that it is merged for 1.3 it started makes me very nervous about the implementation 😄. I appreciate if people here can test their linear algebra code more thoroughly when 1.3-rc comes out!

@KristofferC
Copy link
Member

No need to worry, there will be plenty of time for 1.3 RCs + PkgEvals to shake out bugs.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests