Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic mul! of ComplexF32 arrays under Apple Silicon #1051

Closed
pablosanjose opened this issue Jan 25, 2024 · 25 comments · Fixed by JuliaLang/julia#53074
Closed

Non-deterministic mul! of ComplexF32 arrays under Apple Silicon #1051

pablosanjose opened this issue Jan 25, 2024 · 25 comments · Fixed by JuliaLang/julia#53074
Labels
heisenbug This bug occurs unpredictably regression 1.10 Regression in the 1.10 release system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips

Comments

@pablosanjose
Copy link
Contributor

mul! of ComplexF32 arrays is subtly broken under Apple Silicon in Julia 1.9 and 1.10

Here is a MWE

using LinearAlgebra

function test()
    C = ComplexF32
    mat = zeros(C, 1, 1)
    for _ in 1:100
        v = [C(1-0.2im) C(2+0.3im)]
        mul!(mat, v, v', C(1+im), 1)
    end
    return mat
end

allequal(test() for _ in 1:10000)

This should give always true, but it randomly gives false in the version/OSs above. The deviations of "bad" runs of test() (a small percentage of the 10000) is small, but well within the precision of ComplexF32.

This was tested on a Macbook Pro M1 (fails) and a Xeon server under Linux (does not fail). Can someone reproduce?

Notably, current master does not have this problem on macos. I'll try to bisect to see when it was fixed, in case we can backport the patch.

@ViralBShah ViralBShah added the heisenbug This bug occurs unpredictably label Jan 25, 2024
@ViralBShah
Copy link
Member

ViralBShah commented Jan 25, 2024

Can you try it with AppleAccelerate.jl and also 1.11-dev (which uses a new openblas)?

@pablosanjose
Copy link
Contributor Author

pablosanjose commented Jan 25, 2024

OpenBLAS on 1.11-dev, AppleAccelerate on 1.9.4 and AppleAccelerate on 1.10.0 all work fine.

So no need to bisect then, the fix should be JuliaLang/julia#52762, right? Could that be backported to 1.10?

EDIT: not so sure... I just tried and it still fails on JuliaLang/julia#52762. Trying JuliaLang/julia#52928 now.

@giordano
Copy link
Contributor

Does it work if you start julia with

OPENBLAS_CORETYPE=NEOVERSEN1 julia

?

@pablosanjose
Copy link
Contributor Author

pablosanjose commented Jan 25, 2024

@giordano, on what version? On 1.10 the OPENBLAS_CORETYPE=NEOVERSEN1 doesn't fix it, it still fails

On JuliaLang/julia#52928 there is no need to set OPENBLAS_CORETYPE, it works perfectly out of the box

@giordano giordano added the system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips label Jan 25, 2024
@giordano
Copy link
Contributor

on what version? On 1.10 the OPENBLAS_CORETYPE=NEOVERSEN1 doesn't fix it, it still fails

On 1.10.0, because it ships OpenBLAS v0.3.23 which still falls back on the generic armv8 kernels on Apple Silicon, but in v0.3.26 (what's used now in master, or 1.11-DEV) it automatically uses the neoversen1 kernels, which are much faster (so that setting OPENBLAS_CORETYPE=NEOVERSEN1 on julia v1.10 should usually be a bit faster than the default behaviour). In Julia v1.11 you don't need to set it because you already get that (as you can see with OPENBLAS_VERBOSE=2).

On JuliaLang/julia#52928 there is no need to set OPENBLAS_CORETYPE, it works perfectly out of the box

I'm a bit confused, the build in JuliaLang/julia#52928 shouldn't theoretically affect Apple Silicon, it doesn't have SVE, unless the problem was in the specific version of the compiler used.

@pablosanjose
Copy link
Contributor Author

pablosanjose commented Jan 25, 2024

On 1.10.0, because it ships OpenBLAS v0.3.23 which still falls back on the generic armv8 kernels on Apple Silicon, but in v0.3.26 (what's used now in master, or 1.11-DEV) it automatically uses the neoversen1 kernels, which are much faster (so that setting OPENBLAS_CORETYPE=NEOVERSEN1 on julia v1.10 should usually be a bit faster than the default behaviour). In Julia v1.11 you don't need to set it because you already get that (as you can see with OPENBLAS_VERBOSE=2).

Ok, so your hunch is that setting the OPENBLAS_CORETYPE env var should be enough in 1.10? I can confirm that it is not, it still fails.

I'm a bit confused, the build in JuliaLang/julia#52928 shouldn't theoretically affect Apple Silicon, it doesn't have SVE, unless the problem was in the specific version of the compiler used.

Me too. So I've tried JuliaLang/julia#52762 again, which failed for me before, but now it works (!). I might have forgotten to do a make cleanall the first time.

So at this point my understanding is that something in OpenBLAS v0.3.26 fixed the issue, but that it was still present with OpenBLAS v0.3.23, even with the OPENBLAS_CORETYPE=NEOVERSEN1 envvar.

@giordano
Copy link
Contributor

Ok, so your hunch is that setting the OPENBLAS_CORETYPE env var should be enough in 1.10?

Not really, I was wondering if you were hitting a bug in the generic armv8 kernels, and switching to the neoversen1 ones would work around it, but apparently it isn't the case. Using OPENBLAS_CORETYPE=NEOVERSEN1 is still useful on 1.10 to get better speed, regardless of whether it fixes your bug.

So I've tried JuliaLang/julia#52762 again, which failed for me before, but now it works (!). I might have forgotten to do a make cleanall the first time.

That's good, switching to neoversen1 kernels wasn't the only change between v0.3.23 and v0.3.26, the diff is pretty large, mine was a low-effort attempt to guess a possible solution, but perhaps there was some other bug somewhere else.

@ViralBShah
Copy link
Member

@giordano will the openblas bump that is making its way through Yggdrasil fix the perf issue you mention here - even on 1.10?

@pablosanjose
Copy link
Contributor Author

pablosanjose commented Jan 25, 2024

So I managed to bisect this (took a while!) and it turns out the fix was JuliaLang/julia#51660.

EDIT: FWIW the corresponding Julia version for the fix is 1.11.0-DEV.632

@ViralBShah
Copy link
Member

Perhaps it is this? OpenMathLib/OpenBLAS#4003

@gbaraldi
Copy link
Member

That sounds very suspicious

@giordano
Copy link
Contributor

@pablosanjose if you have a fortran compiler on your M1 (I don't) would you be able to bisect openblas between 0.3.23 and 0.3.24? You don't need to touch Julia, can use 1.10, you only need to use BLAS.lbt_forward to run blas functions using the just built library, building openblas takes relatively little.

@giordano
Copy link
Contributor

@pablosanjose A script like this (untested!) should help for bisecting:

#!/bin/bash

make clean || true
make DYNAMIC_ARCH=1 LIBPREFIX=libopenblas64_ INTERFACE64=1 SYMBOLSUFFIX=64_ -j || exit 125

julia --startup-file=no -e '
using LinearAlgebra
BLAS.lbt_forward(realpath("./libopenblas64_.dylib"); clear=true)
function test()
    C = ComplexF32
    mat = zeros(C, 1, 1)
    for _ in 1:100
        v = [C(1-0.2im) C(2+0.3im)]
        mul!(mat, v, adjoint(v), C(1+im), 1)
    end
    return mat
end
exit(Int(!allequal(test() for _ in 1:100_000)))
'

You can run it with

git bisect start
git bisect bad v0.3.23
git bisect good v0.3.24
git bisect run ./script.sh

Side note, besides not having a Fortran compiler at hand, I can't even reproduce the issue on my MacBook Pro (13-inch, M1, 2020), macOS 12.6:

julia> using LinearAlgebra

julia> function test()
           C = ComplexF32
           mat = zeros(C, 1, 1)
           for _ in 1:100
               v = [C(1-0.2im) C(2+0.3im)]
               mul!(mat, v, v', C(1+im), 1)
           end
           return mat
       end
test (generic function with 1 method)

julia> allequal(test() for _ in 1:1_000_000)
true

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

Same in Julia v1.9.4.

@ViralBShah
Copy link
Member

ViralBShah commented Jan 26, 2024

FWIW, OpenBLAS 0.3.24 release notes do say there was a bug in CGEMM for Apple aarch64. Perhaps OpenMathLib/OpenBLAS#4003 can be backported to 0.3.23 and we can test if this issue is still present. The issue is OpenMathLib/OpenBLAS#3995 and seems like is only on M2 and higher.

@pablosanjose
Copy link
Contributor Author

@giordano, thanks for the pointers, I'll try.
It's a bit unsettling that you cannot reproduce on your machine, but your CPU is actually different. Mine's an M1 Max (macos 14.2.1)
@ViralBShah , I've no idea, but OpenMathLib/OpenBLAS#4003 includes M1 in the title, so it might still be the fix, let's see

@pablosanjose
Copy link
Contributor Author

I'm having trouble compiling. It must be some stupid path or flag issue, but I'm stuck, so I'd appreciate some advice from anyone with some make experience. The invocation suggesed by @giordano

make clean
make DYNAMIC_ARCH=1 LIBPREFIX=libopenblas64_ INTERFACE64=1 SYMBOLSUFFIX=64_ -j

fails for me with these last lines

ranlib ../../../libopenblas64_p-r0.3.23.a
./gensymbol objconv arm64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1 > objconv.def
./gensymbol osx arm64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1 > osx.def
objconv @objconv.def ../libopenblas64_p-r0.3.23.a ../libopenblas64_p-r0.3.23.a.osx.renamed
make[1]: objconv: No such file or directory
make[1]: *** [../libopenblas64_p-r0.3.23.a.osx.renamed] Error 1
make: *** [shared] Error 2

@pablosanjose
Copy link
Contributor Author

Gah, it was staring me in the face, objconv was not installed. Sorry for the noise.

@giordano
Copy link
Contributor

Perhaps OpenMathLib/OpenBLAS#4003 can be backported to 0.3.23 and we can test if this issue is still present

That does look like a good candidate, but I believe having a successful bisect to directly identify the fix would be better than guessing, as well educated the guess is 🙂

@pablosanjose
Copy link
Contributor Author

To save some time I focused directly on the two subsequent commits 51dd1339e and efcf71255 (the actual OpenMathLib/OpenBLAS#4003), and I can confirm that in v1.10, the first one fails and the second one doesn't.

@giordano
Copy link
Contributor

Awesome, thanks for confirming that's it! Now we can backport that single patch.

@ViralBShah
Copy link
Member

@pablosanjose Will you be able to make a PR in Yggdrasil for OpenBLAS and OpenBLAS32 0.3.23?

@pablosanjose
Copy link
Contributor Author

I'd love to try! Great learning experience for me, never dipped my toes into the Tree of Life itself :-D.

I'll do my best, and I'll ask for help if required.

@ViralBShah ViralBShah added the regression 1.10 Regression in the 1.10 release label Jan 26, 2024
@giordano
Copy link
Contributor

In this case all you have to do is to place https://github.com/OpenMathLib/OpenBLAS/commit/efcf71255aaf0c636fc76e5235dde8bb9cf76469.patch in the O/OpenBLAS/[email protected]/bundled/patches and O/OpenBLAS/[email protected]/bundled/patches directories.

@pablosanjose
Copy link
Contributor Author

That simple? Ok, done!

pablosanjose referenced this issue in pablosanjose/julia Jan 27, 2024
pablosanjose referenced this issue in pablosanjose/julia Jan 27, 2024
pablosanjose referenced this issue in pablosanjose/julia Jan 27, 2024
pablosanjose referenced this issue in pablosanjose/julia Jan 27, 2024
KristofferC referenced this issue in JuliaLang/julia Jan 29, 2024
giordano referenced this issue in JuliaLang/julia Jan 30, 2024
Tests issue #53054. The same test is included in backport #53074.
@giordano giordano linked a pull request Jan 30, 2024 that will close this issue
@giordano
Copy link
Contributor

Fixed by JuliaLang/julia#53074.

KristofferC referenced this issue in JuliaLang/julia Jan 30, 2024
Drvi referenced this issue in RelationalAI/julia Jun 7, 2024
@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
heisenbug This bug occurs unpredictably regression 1.10 Regression in the 1.10 release system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants