-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests failing on Windows #15
Comments
Version 0.9 of the JLL also doesn't seem to work: #13 (comment) |
I can't recall now whether I tested blis_jll v0.9 directly on windows, but v0.8 has worked before. So this is awkward.... The failure on Julia 1.8 is known already - this was a "notorious" bug that required updating GCC 12 itself - Will need to look more closely at the issues in Julia 1.10 and nightly, which are a different type of error, involving the |
I looked at the header files in
indicating that it was configured as expected, using the GFortran complex return convention. I also looked at the disassembled function
which saves the first argument (
which copy This looks as if this function was using the Intel Fortran complex return convention. If that is correct then BLIS would have been miscompiled, either due to of an error in its configuration management or due to an error in our Yggdrasil recipe. |
Looking at this a bit, I think maybe the Windows x64 calling convention may actually be using the Intel-style convention for returning a complex number, so I think GCC is then actually implicitly converting the given code to do the Intel return format when compiling on Windows. Looking at the only "official" calling convention document that I can find: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170, the return values are handled like this:
So it may be that it is treating a complex number as a struct or vector of doubles, so it then uses the pointer return method. Another reference I found was Raymond Chen's blog here. There are two posts that might be relevant. In https://devblogs.microsoft.com/oldnewthing/20040114-00/?p=41053, he says
So since a complex would be greater than 64 bits, then I think it would be the secret first parameter method (aka. what is called Intel here). There is also https://devblogs.microsoft.com/oldnewthing/20101222-00/?p=11943, where he says
Which sounds to me like this will be the calling convention that is used normally. He then further says:
Unfortunately, I can't find any definitive source that talks about this in more detail, and my spelunking in the GCC source code didn't lead to anything understandable about how the Windows ABI looks like here. |
Follow-up, looking at the LLVM source code seems to suggest that Windows uses the indirect return for a complex type (e.g. the hidden argument). Inside the Windows ABI info it has (https://github.com/llvm/llvm-project/blob/46c05dfb6c98870f8416eeb9bf787d54ac806b12/clang/lib/CodeGen/Targets/X86.cpp#L3303)
And for the complex double, the Does this call work through OpenBLAS? I wonder what calling convention it follows on Windows. |
I'm using the following test script: using BLISBLAS, LinearAlgebra
println("Float64:")
a = ones(Float64, 4096) .+ 1im
@show BLAS.dotc(a, a)
println("Float32:")
a = ones(Float32, 4096) .+ 1im
@show BLAS.dotc(a, a)
println("Done!") On my machine, the I've spent almost the whole day on this, digging into the assembly, debugging LBT and am currently very confused. To trace out the sequence of events, first Julia calls So instead, we re-translate back to the fortran names, which shouldn't cause any issue, we should simply be able to wrap a few arguments in references and be on our way. When we call |
Thanks for your efforts all! The call is running through LBT and when calling Stacktrace from the CI on this issue:
On Windows:
This is on macOS (same environment as above):
So there seems to be a difference with the complex autodetection of |
So it appears to me that the BLIS folks also implemented |
The top question I have is why is autodetection different on windows to linux/mac systems. |
My best guess is that it's a confusion between the ABI matching that I'm trying to do in LBT, and the special corners of the Windows x64 ABI that Ian pointed out above.
I wrote the following test code: #include <stdio.h>
#include <complex.h>
__attribute__((noinline)) complex double foo(float x) {
return 1.0 + x*I;
}
int main() {
volatile float x = 1.0;
complex double z = foo(x);
printf("%f + %fi\n", creal(z), cimag(z));
return 0;
} I compiled and disassembled it:
It looks to me like we're both passing on the stack, and in registers, both to |
The fact that LBT works for |
Hmm, yes, we don't appear to be patching BLIS to change any of the CBLAS calls to have a 64 suffix. Our patch labelled "cblas" in Yggdrasil is actually patching the calls that CBLAS is making to normal blas to have the suffix. This would be annoying to do though because all the CBLAS functions are just straight definitions and aren't formed using any macros to build the names.
I get the feeling this is a quirk of the compiler, actually. When actually doing the indirect return method, BLIS has a void return type.
My reading of the Windows x86_64 ABI suggests that complex float won't be an indirect return, because it is 64 bits (the condition in the LLVM compiler for generating this indirect return has the strict greater-than check for 64 bit width). So I think that it should treat it as a normal return instead.
At least for the Linux ABI, it appears it will be returned in registers for both float and double. LLVM implements the classification for complex numbers here: https://github.com/llvm/llvm-project/blob/46c05dfb6c98870f8416eeb9bf787d54ac806b12/clang/lib/CodeGen/Targets/X86.cpp#L1942, and it says it uses SSE style behavior for both float and double element types. The SSE style behavior will use the vector registers to hold the data. (The formal specification of this is in https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf, Section 3.2.3). So it looks like the difference is probably that Windows mixes the two return styles (double uses implicit arguments, float uses registers), whereas Linux uses registers for both. Does LBT do the complex return classification on a per-function basis, or is it done globally for the library a single time? |
Windows x64 automatically forces return values onto the stack if they are larger than 64 bits wide [0]. This causes return values from e.g. `zdotc` to be pushed onto a secret first argument, but not the return values from e.g. `cdotc`. To address this, we add a new complex return style, "Float Normal, Double Argument", to specify that `complex float`-returning functions use the normal return style, whereas `complex double`-returning functions use the argument return style. This should fix JuliaLinearAlgebra/BLISBLAS.jl#15 [0] https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Windows x64 automatically forces return values onto the stack if they are larger than 64 bits wide [0]. This causes return values from e.g. `zdotc` to be pushed onto a secret first argument, but not the return values from e.g. `cdotc`. To address this, we add a new complex return style, "Float Normal, Double Argument", to specify that `complex float`-returning functions use the normal return style, whereas `complex double`-returning functions use the argument return style. This should fix JuliaLinearAlgebra/BLISBLAS.jl#15 [0] https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Windows x64 automatically forces return values onto the stack if they are larger than 64 bits wide [0]. This causes return values from e.g. `zdotc` to be pushed onto a secret first argument, but not the return values from e.g. `cdotc`. To address this, we add a new complex return style, "Float Normal, Double Argument", to specify that `complex float`-returning functions use the normal return style, whereas `complex double`-returning functions use the argument return style. This should fix JuliaLinearAlgebra/BLISBLAS.jl#15 [0] https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Windows x64 automatically forces return values onto the stack if they are larger than 64 bits wide [0]. This causes return values from e.g. `zdotc` to be pushed onto a secret first argument, but not the return values from e.g. `cdotc`. To address this, we add a new complex return style, "Float Normal, Double Argument", to specify that `complex float`-returning functions use the normal return style, whereas `complex double`-returning functions use the argument return style. This should fix JuliaLinearAlgebra/BLISBLAS.jl#15 [0] https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Windows x64 automatically forces return values onto the stack if they are larger than 64 bits wide [0]. This causes return values from e.g. `zdotc` to be pushed onto a secret first argument, but not the return values from e.g. `cdotc`. To address this, we add a new complex return style, "Float Normal, Double Argument", to specify that `complex float`-returning functions use the normal return style, whereas `complex double`-returning functions use the argument return style. This should fix JuliaLinearAlgebra/BLISBLAS.jl#15 [0] https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Alright, I've got a PR up to LBT that solves this by adding a new complex return style, one that will create adapters for only This should all appear in Julia v1.12, and is potentially backportable as there's few changes to Julia itself, although I did touch quite a bit of LBT, so we'll have to wait and see if those changes are all good. |
This release fixes issues with complex valued returns from functions such as `cdotc` on Windows x64. See this discussion [0] for initial diagnosis, and this PR [1] for the relevant fixes. [0] JuliaLinearAlgebra/BLISBLAS.jl#15 [1] JuliaLinearAlgebra/libblastrampoline#129
Once JuliaLang/julia#54791 is merged, julia nightly should "just work" here. |
This release fixes issues with complex valued returns from functions such as `cdotc` on Windows x64. See this discussion [0] for initial diagnosis, and this PR [1] for the relevant fixes. [0] JuliaLinearAlgebra/BLISBLAS.jl#15 [1] JuliaLinearAlgebra/libblastrampoline#129
This release fixes issues with complex valued returns from functions such as `cdotc` on Windows x64. See this discussion [0] for initial diagnosis, and this PR [1] for the relevant fixes. [0] JuliaLinearAlgebra/BLISBLAS.jl#15 [1] JuliaLinearAlgebra/libblastrampoline#129
Just to close the loop, I re-ran the tests on |
This release fixes issues with complex valued returns from functions such as `cdotc` on Windows x64. See this discussion [0] for initial diagnosis, and this PR [1] for the relevant fixes. [0] JuliaLinearAlgebra/BLISBLAS.jl#15 [1] JuliaLinearAlgebra/libblastrampoline#129 (cherry picked from commit 3054c00)
This release fixes issues with complex valued returns from functions such as `cdotc` on Windows x64. See this discussion [0] for initial diagnosis, and this PR [1] for the relevant fixes. [0] JuliaLinearAlgebra/BLISBLAS.jl#15 [1] JuliaLinearAlgebra/libblastrampoline#129 (cherry picked from commit 3054c00)
See https://github.com/JuliaLinearAlgebra/BLISBLAS.jl/actions/runs/9282233656
The text was updated successfully, but these errors were encountered: