Libblastrampoline soversion breaks ecosystem on Windows #47638

Closed
andreasvarga opened this issue Nov 19, 2022 · 26 comments
Labels: bug, regression, system:windows

@andreasvarga
Contributor

I am raising this issue following the invitation of ararslan to report any issue related to the recently released Julia 1.8.3. I must say that this is not the first time I am reporting this issue, and I am not even sure whether this is the right place to seek support for my problem.

My current project is the development of PeriodicSystems, a collection of computational tools for handling linear dynamical systems with periodic coefficient matrices. In this development I rely on efficient structure-preserving computational tools to compute periodic Schur decompositions, which are available in the Fortran library SLICOT (a pure Julia project has also been started here).

The problem is related to the use of Fortran libraries under Windows with Julia, starting with version 1.8. The issue is simple: the SLICOT_jll library for Windows created with BinaryBuilder.jl does not work with Julia 1.8 and Julia nightly, but it still works with Julia 1.7. The good news is that the libraries generated for Linux work with all Julia versions. Unfortunately, I don't have the necessary competence to debug this error. For the generation of the libraries, I benefited from the generous help of several people (RalphAS, mkitti, giordano), but the error still persists (see below) with the recently updated version of the library (2 days ago).

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x0 -- unknown function (ip: 0000000000000000)
in expression starting at D:\a\PeriodicSystems.jl\PeriodicSystems.jl\test\test_pschur.jl:12
unknown function (ip: 0000000000000000)
Allocations: 110831288 (Pool: 110814968; Big: 16320); GC: 48
ERROR: LoadError: Package PeriodicSystems errored during testing

Here is the link to the last tests.

I do all my development on a Windows machine. For now, I am constrained to use Julia 1.7 for my work because of this bug. I find myself in the unexpected situation of being forced to stop developing PeriodicSystems just because, probably, some compiler switches are set in a particular way over which I have no control.

I very much hope there will be a solution to my problem in the near future. Thank you in advance for any help.

Q: I wonder whether I am the only Julia user with such a problem?

@brenhinkeller added the system:windows and bug labels on Nov 19, 2022
@brenhinkeller
Contributor

Sounds like a bug that should be fixed, though I'm not entirely sure if the fix will be here or https://github.com/JuliaPackaging/BinaryBuilder.jl

If you think the bug is in Julia itself, one way to move forward might be to try to bisect which commit caused it between 1.7 and 1.8

@brenhinkeller added the regression label on Nov 20, 2022
@andreasvarga
Contributor Author

The problems started once Julia 1.8.0 became the current version and I switched to the newly generated library SLICOT_jll-v5.8 (see #4969 and my final comment). It is difficult for me to guess whether this incompatibility comes from Julia or from BinaryBuilder.

See also my remark here.

@brenhinkeller
Contributor

brenhinkeller commented Nov 20, 2022

Oh, so was the Julia version the only thing you changed, or did you also change the JLL or the version of the underlying software, SLICOT?

@andreasvarga
Contributor Author

Here is a short history of events.

  1. The Fortran library SLICOT has been updated to version 5.8 and accordingly SLICOT_jll was also updated to v5.8 (with the help of RalphAS). This happened under Julia 1.7 and it worked both locally as well as in CI tests.
  2. Then Julia 1.8 became the current version, and I was not able to register the package because SLICOT_jll was not compatible with Julia 1.8 (see #68456).
  3. With help from mkitti and giordano, versions of SLICOT_jll were generated for Julia 1.7, 1.8 and 1.9. This allowed the registration, but the CI tests on Windows with Julia 1.8 failed (the Linux tests were OK).
  4. I installed locally Julia 1.8, but any call to a wrapper via SLICOT_jll crashes Julia. This is the present situation.

@KristofferC
Member

Is there any way to minimize this? For example, is there a specific ccall that fails?

@andreasvarga
Contributor Author

andreasvarga commented Nov 20, 2022

The following MWE works with Julia 1.7 under Windows and probably works with Julia 1.7 and 1.8 on Linux. However, it fails with Julia 1.8.1 under Windows.

using SLICOT_jll
using LinearAlgebra
using LinearAlgebra: BlasInt
using Test

function chkargsok(ret::BlasInt)
    if ret < 0
        throw(ArgumentError("invalid argument #$(-ret) to SLICOT call"))
    end
end

function mb03bd!(job::AbstractChar, defl::AbstractChar,
    compq::AbstractChar, qind::AbstractVector{BlasInt}, k::Integer,
    n::Integer, h::Integer, ilo::Integer, ihi::Integer,
    s::AbstractVector{BlasInt}, a::Array{Float64,3},
    q::Array{Float64,3},
    alphar::AbstractVector{Float64}, alphai::AbstractVector{Float64},
    beta::AbstractVector{Float64}, scal::AbstractVector{BlasInt},
    liwork::Integer, ldwork::Integer)

    lda1 = max(1,stride(a,2))
    lda2 = max(1,stride(a,3)÷lda1)
    ldq1 = max(1,stride(q,2))
    ldq2 = max(1,stride(q,3)÷ldq1)
    info = Ref{BlasInt}()
    iwarn = Ref{BlasInt}()
    iwork = Vector{BlasInt}(undef, liwork)
    dwork = Vector{Float64}(undef, ldwork)

    ccall((:mb03bd_, libslicot), Cvoid, (Ref{UInt8}, Ref{UInt8},
            Ref{UInt8}, Ptr{BlasInt}, Ref{BlasInt}, Ref{BlasInt},
            Ref{BlasInt}, Ref{BlasInt}, Ref{BlasInt}, Ptr{BlasInt},
            Ptr{Float64}, Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64},
            Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64}, Ptr{Float64},
            Ptr{Float64}, Ptr{BlasInt}, Ptr{BlasInt}, Ref{BlasInt},
            Ptr{Float64}, Ref{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt},
            Clong, Clong, Clong), job, defl, compq, qind, k, n, h,
            ilo, ihi, s, a, lda1, lda2, q, ldq1, ldq2, alphar,
            alphai, beta, scal, iwork, liwork, dwork, ldwork, iwarn,
            info, 1, 1, 1)
    chkargsok(info[])

    return info[], iwarn[]
end


# MB03BD example
A1 = Matrix{Float64}(I,3,3); A2 = [   1.0   2.0   0.0; 4.0  -1.0   3.0; 0.0   3.0   1.0]; A3 = Matrix{Float64}(I,3,3); 
E1 =  [2.0   0.0   1.0; 0.0  -2.0  -1.0; 0.0   0.0   3.0]; E2 = Matrix{Float64}(I,3,3); 
E3 = [ 1.0   0.0   1.0; 0.0   4.0  -1.0; 0.0   0.0  -2.0];
ev = eigvals(inv(E1)*A2*inv(E3))
ihess = 2

# using the SLICOT wrapper
A = reshape([E1 A2 E3],3,3,3);
KMAX = 3
NMAX = 3
LDA1 = NMAX
LDA2 = NMAX
LDQ1 = NMAX
LDQ2 = NMAX
LDWORK = KMAX + max( 2*NMAX, 8*KMAX )
LIWORK = 2*KMAX + NMAX
QIND = Array{BlasInt,1}(undef, KMAX)
S = [-1,1,-1]; 
Q = Array{Float64,3}(undef, LDQ1, LDQ2, KMAX)
ALPHAR = Array{Float64,1}(undef, NMAX)
ALPHAI = Array{Float64,1}(undef, NMAX)
BETA = Array{Float64,1}(undef, NMAX)
SCAL = Array{BlasInt,1}(undef, NMAX)
IWORK = Array{BlasInt,1}(undef, LIWORK)
DWORK = Array{Float64,1}(undef, LDWORK)

mb03bd!('T','C','I',QIND,3,3,ihess,1,3,S,A,Q,ALPHAR, ALPHAI, BETA, SCAL, LIWORK, LDWORK)

poles = (ALPHAR+im*ALPHAI) ./ BETA .* (2. .^SCAL)

@test sort(real(poles)) ≈ sort(real(ev)) && 
      sort(imag(poles)) ≈ sort(imag(ev))

@KristofferC
Member

Can you edit the post and try to condense it as much as possible? Docstrings for example are probably not needed for it to reproduce?

@andreasvarga
Contributor Author

OK.

@KristofferC
Member

The next step is probably to reconstruct that call in C. That would determine if it is something to do with the way Julia does the ccall or if it is something purely with how the library is called.

@vchuravy
Member

So under wine SLICOT_jll v5.7.0+0 loads fine in 1.8.3, but v5.8.0+2 fails with:

julia> using SLICOT_jll
0230:err:module:import_dll Library libblastrampoline-5-0-2.dll (which is needed by L"C:\\users\\vchuravy\\.julia\\artifacts\\88256e6280b745625392beedf5b22c959707ba3a\\bin\\libslicot.dll") not found
ERROR: InitError: could not load library "C:\users\vchuravy\.julia\artifacts\88256e6280b745625392beedf5b22c959707ba3a\bin\libslicot.dll"
Module not found.

@brenhinkeller
Contributor

brenhinkeller commented Nov 20, 2022

Ah, so that would suggest the regression is perhaps not in Julia but in SLICOT_jll?

@vchuravy
Member

vchuravy commented Nov 20, 2022

Okay after a manual Libdl.dlopen("Z:\\home\\vchuravy\\builds\\julia-1.8.3-win64\\bin\\libblastrampoline-5-0-2.dll") I can load v5.8 and reproduce the issue.

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x0 -- unknown function (ip: 0000000000000000)
in expression starting at REPL[35]:1
unknown function (ip: 0000000000000000)
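
The manual preload described above can be sketched roughly as follows. This is only an illustration: `versioned_lbt_dlls` and `preload_lbt` are hypothetical helper names, and the default search directory `Sys.BINDIR` is an assumption about where the versioned DLL lives. As noted above, preloading only gets past the "Module not found" load error; it does not avoid the access violation.

```julia
using Libdl

# Hypothetical helper: list versioned libblastrampoline DLLs in `bindir`
# (e.g. "libblastrampoline-5-0-2.dll"), skipping the unversioned name
# and import libraries such as "libblastrampoline-5.dll.a".
function versioned_lbt_dlls(bindir::AbstractString)
    pattern = r"^libblastrampoline-[0-9-]+\.dll$"
    return sort([joinpath(bindir, f) for f in readdir(bindir) if occursin(pattern, f)])
end

# Hypothetical helper: open every versioned copy so that a later
# `using SLICOT_jll` can resolve its import of the versioned DLL name.
# Does nothing on platforms other than Windows.
function preload_lbt(bindir::AbstractString = Sys.BINDIR)
    Sys.iswindows() || return String[]
    dlls = versioned_lbt_dlls(bindir)
    foreach(Libdl.dlopen, dlls)
    return dlls
end
```

Calling `preload_lbt()` before `using SLICOT_jll` would mirror the manual `Libdl.dlopen` step.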

Looking closer: dlllist shows that we load libblastrampoline.dll, while objdump --private-headers libslicot.dll shows the following imports.

5.8

        DLL Name: libblastrampoline-5-0-2.dll
        vma:  Hint/Ord Member-Name Bound-To
        3dc614   6487  dasum_64_
        3dc620   6491  daxpy_64_
        3dc62c   6497  dbdsqr_64_

5.7


        DLL Name: libblastrampoline.dll
        vma:  Hint/Ord Member-Name Bound-To
        3c15a4   6487  dasum_64_
        3c15b0   6491  daxpy_64_
        3c15bc   6497  dbdsqr_64_

On Linux these are symlinked, but on Windows they are separate libraries. My hypothesis is that we initialize one of them correctly, but the second "copy" is pointing to the wrong library. So it might be BinaryBuilder's fault.
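
One way to check the hypothesis from within a Julia session is to list the loaded libraries whose file names mention libblastrampoline. A minimal sketch, where `loaded_lbt` is a hypothetical helper name:

```julia
using Libdl
using LinearAlgebra  # makes sure libblastrampoline is loaded in this session

# Hypothetical helper: paths of all loaded shared libraries whose
# file name mentions libblastrampoline.
loaded_lbt() = filter(p -> occursin("blastrampoline", basename(p)), Libdl.dllist())

foreach(println, loaded_lbt())
```

If more than one path is printed on Windows after loading a JLL, that would be consistent with two separate copies being in use.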

@RalphAS

RalphAS commented Nov 20, 2022

maybe JuliaPackaging/Yggdrasil#7 ?

@vchuravy changed the title from "Creation of valid binary wrappers for Fortran libraries under Windows for Julia 1.8 fails" to "Libblastrampoline soversion breaks ecosystem on Windows" on Nov 20, 2022
@vchuravy added this to the 1.9 milestone on Nov 20, 2022
@vchuravy
Member

@staticfloat can you take a look at this? (Happy to jump on a call).

As far as I can tell there are two issues at play here.

  1. Julia loads libblastrampoline.dll without a so-version.
  2. JLLs link against a (perhaps overspecific) so-version, 5-0-2.

We need functioning DLLs on Windows for 1.8, and for 1.9 we should be consistent in whether we use a so-version or not.

LBT v4

bin/libblastrampoline.dll
lib/libblastrampoline.dll.a

v5.2.0

bin/libblastrampoline.dll
bin/libblastrampoline-5.dll
bin/libblastrampoline-5.dll.a
bin/libblastrampoline-5-2-0.dll
lib/libblastrampoline-5.dll.a

v5.0.2

bin/libblastrampoline.dll
bin/libblastrampoline-5.dll
bin/libblastrampoline-5-0-2.dll.a
bin/libblastrampoline-5-0-2.dll
lib/libblastrampoline-5-0-2.dll.a

https://github.com/JuliaLang/julia/blob/bba41d41319aa898373784438bd38873eab1da41/stdlib/libblastrampoline_jll/src/libblastrampoline_jll.jl#LL22

x-ref: JuliaLinearAlgebra/libblastrampoline#89

@vtjnash
Member

vtjnash commented Nov 20, 2022

maybe JuliaPackaging/Yggdrasil#7

seems equivalent to that, but for a different library

@KristofferC
Member

@vchuravy, is there a plan for resolving this?

@staticfloat
Member

The plan for resolving this is to use the major-versioned SONAME everywhere. In order for that to work, we may need to rebuild a few JLLs such as SuiteSparse to link against the correct name.

@giordano
Contributor

Shouldn't we also use the correctly versioned sonames in

const libblastrampoline = if Sys.iswindows()
    "libblastrampoline.dll"
elseif Sys.isapple()
    "@rpath/libblastrampoline.dylib"
else
    "libblastrampoline.so"
end

? The unversioned name is the cause of JuliaLinearAlgebra/libblastrampoline#89 (comment).
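
For illustration, a major-versioned variant of that name selection might look like the sketch below. The helper names `current_os` and `lbt_soname` are hypothetical. The Windows pattern matches the `libblastrampoline-5.dll` file listed earlier in this thread; the macOS and Linux patterns are assumptions based on common platform conventions, not taken from the actual fix.

```julia
# Hypothetical helper: coarse OS tag for the running platform.
current_os() = Sys.iswindows() ? :windows : (Sys.isapple() ? :macos : :linux)

# Hypothetical helper: major-versioned libblastrampoline name per platform.
# The macOS and Linux naming patterns are assumptions.
function lbt_soname(major::Integer; os::Symbol = current_os())
    if os === :windows
        return "libblastrampoline-$(major).dll"
    elseif os === :macos
        return "@rpath/libblastrampoline.$(major).dylib"
    else
        return "libblastrampoline.so.$(major)"
    end
end
```

Keying the name on the major version only means that a patch release such as 5.0.2 to 5.2.0 would not change the file name that JLLs link against.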

@staticfloat
Member

Yes, we should

@giordano
Contributor

This should have been fixed by #47676 (but JLLs will need to be built against libblastrampoline 5.4.0)

@KristofferC
Member

I think this can be closed here.

@andreasvarga
Contributor Author

The following error occurred when running the tests with the nightly version on Windows (Linux is OK, see here) for DescriptorSystems, which uses the MatrixPencils package:

Test_dss
test_dss: Error During Test at D:\a\DescriptorSystems.jl\DescriptorSystems.jl\test\test_dss.jl:10
  Got exception outside of a @test
  could not load library "libblastrampoline"
  The specified module could not be found. 

  Stacktrace:
    [1] larf!(side::Char, v::Vector{Float64}, τ::Float64, C::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, work::Vector{Float64})
      @ MatrixPencils.LapackUtil2 C:\Users\runneradmin\.julia\packages\MatrixPencils\OfR5K\src\lapackutil2.jl:206
    [2] larf!
      @ C:\Users\runneradmin\.julia\packages\MatrixPencils\OfR5K\src\lapackutil2.jl:219 [inlined]

In the module LapackUtil2 (within MatrixPencils), I am using the following setting for ccalls:

const liblapack = VERSION < v"1.7" ? Base.liblapack_name : "libblastrampoline"
This setting works on Linux, but apparently does not work on Windows (in the nightly test).

@giordano
Contributor

but JLLs will need to be built against libblastrampoline 5.4.0

and no one has done that as far as I know.

@andreasvarga
Contributor Author

andreasvarga commented Feb 23, 2023

In the above case, both Linux and Windows load in the tests

[8e850b90] libblastrampoline_jll v5.4.0+0 @stdlib/libblastrampoline_jll

(without complaints). Could it be that the generic name "libblastrampoline" is not properly set for Windows?

@giordano
Contributor

It is not, that's the reason why we have Base.libblas_name: #48435.
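
A minimal sketch of that suggestion, assuming `Base.libblas_name` resolves to a name that can be loaded on every platform. The fallback string, the constant `libblas` and the function `lbt_dot` are illustrative only, and `ddot_` is used here just because it has a simple signature.

```julia
using LinearAlgebra
using LinearAlgebra: BlasInt
using LinearAlgebra.BLAS: @blasfunc

# Assumption: prefer the name Julia itself provides; fall back to the
# generic name only if the constant is not defined in this Julia version.
const libblas = isdefined(Base, :libblas_name) ? Base.libblas_name : "libblastrampoline"

# Illustrative wrapper: dot product through the trampoline library.
function lbt_dot(x::Vector{Float64}, y::Vector{Float64})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    return ccall((@blasfunc(ddot_), libblas), Float64,
                 (Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt}),
                 length(x), x, 1, y, 1)
end
```

A package could define its `liblapack` constant the same way instead of hard-coding "libblastrampoline".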

@andreasvarga
Contributor Author

The nightly test involves (probably) Julia 1.10. I wonder if the same failure is to be expected on Julia 1.9 (the next stable version) running under Windows.
