On Musl systems, can't ccall into a library with the soname #36458

Closed
giordano opened this issue Jun 27, 2020 · 15 comments · Fixed by #37123
Labels
compiler:musl Support for musl linked binaries on linux instead of glibc needs decision A decision on this change is needed

@giordano
Contributor

giordano commented Jun 27, 2020

How to reproduce: on a musl system (I used `using BinaryBuilderBase; BinaryBuilderBase.runshell(Linux(:x86_64, libc = :musl);verbose=true)`), start Julia and run:

julia> using Pkg, Libdl

julia> Pkg.add("Cuba_jll")
   Updating registry at `~/.julia/registries/General`
  Resolving package versions...
No Changes to `~/.julia/environments/v1.5/Project.toml`
No Changes to `~/.julia/environments/v1.5/Manifest.toml`

julia> using Cuba_jll

julia> filter(lib -> occursin("libcuba", lib), dllist())
1-element Array{String,1}:
 "/root/.julia/artifacts/cb5872bf2d53927c3a13ed537915a8d63de707a7/lib/libcuba.so"

julia> libcuba
"libcuba.so"

julia> ccall((:cubacores, libcuba), Ptr{Cvoid}, (Cint, Cint), 0, 1000)
ERROR: could not load library "libcuba.so"
Error loading shared library libcuba.so: No such file or directory
Stacktrace:
 [1] top-level scope at ./REPL[6]:1

julia> const lib = Cuba_jll.libcuba_path
"/root/.julia/artifacts/cb5872bf2d53927c3a13ed537915a8d63de707a7/lib/libcuba.so"

julia> ccall((:cubacores, lib), Ptr{Cvoid}, (Cint, Cint), 0, 1000)
Ptr{Nothing} @0x00007fb842c77920

The function can be ccall'ed if the library is given by its full path, but not if only the basename is passed, even though the library is already loaded, as its presence in `dllist()` shows. JLL packages use the basename of the library because, for relocatability, they can't store the full path in a `const`.

strace shows that the loader searches for libcuba.so in every default directory, but not in the artifact directory from which Julia already loaded it:

open("/usr/local/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-gnu/x86_64-linux-gnu/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-gnu/x86_64-linux-gnu/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/destdir/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/destdir/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/bin/../lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/bin/../lib/julia/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-gnu/x86_64-linux-gnu/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-gnu/x86_64-linux-gnu/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/x86_64-linux-musl/x86_64-linux-musl/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/destdir/lib64/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/destdir/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/bin/../lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/workspace/bin/../lib/julia/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libcuba.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

First reported in JuliaStats/Rmath.jl#70.

@ViralBShah ViralBShah added the compiler:musl Support for musl linked binaries on linux instead of glibc label Jun 27, 2020
@ViralBShah
Member

I installed an Alpine VM and tried out the musl binaries. Only pure Julia packages that have no dependency on a ccall'ed library work. I hope it is not too difficult to fix in time for the 1.5 release.

@staticfloat staticfloat added the needs decision A decision on this change is needed label Jun 28, 2020
@staticfloat
Member

I spent some time digging into this, and it appears that dlopen(soname) of an already-loaded library just doesn't work on musl:

julia> using Cuba_jll
       dlopen(Cuba_jll.libcuba)
ERROR: could not load library "libcuba.so"
Error loading shared library libcuba.so: No such file or directory
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /buildworker/worker/package_musl64/build/usr/share/julia/stdlib/v1.6/Libdl/src/Libdl.jl:114
 [2] dlopen at /buildworker/worker/package_musl64/build/usr/share/julia/stdlib/v1.6/Libdl/src/Libdl.jl:114 [inlined] (repeats 2 times)
 [3] top-level scope at REPL[8]:2

The reason JLL packages pass the SONAME is that ccall() requires a constant expression (e.g. we cannot use library paths computed at __init__() time), and we wanted to provide an API where users did not need to manually dlsym() everything they use from a JLL package.

Possible fixes:

  • Work around this gap in the musl API by performing our own SONAME caching, probably by eagerly filling libMap after a dlopen().
  • Alter ccall() to perform runtime dlsym() caching, a la @runtime_ccall from LLVM.jl. We can detect when the user passes a non-constant library or symbol name, then emit the same code the @runtime_ccall macro would generate.
  • Change JLL packages to require users to change their ccall((func_name, libfoo), ...) to something like ccall(libfoo(func_name), ...). This way, we can perform the function pointer caching ourselves. This has the minor added benefit of allowing us to lazily initialize libraries in JLL packages, as hinted at in this comment.

In my opinion, none of these can go into 1.5, and (3) may even be off the table for 1.6, as it would be a breaking change for all packages, everywhere. (Even with a compatibility fallback, packages that don't adopt the new style wouldn't work on musl.) (2) is my favorite solution, as the const requirement on ccall() is something we should get rid of.
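A minimal Julia sketch of fix (1), assuming a Julia-level cache rather than Julia's internal `libMap` (which is not shown here): every full-path `dlopen` records the library's basename, and later lookups by bare name consult that record before falling back to the system search. `SONAME_CACHE`, `dlopen_and_cache`, `resolve_soname`, and `dlopen_soname` are hypothetical names, and the file basename stands in for the real ELF SONAME, which can differ (e.g. `libfoo.so.1` vs. `libfoo.so.1.2.3`).

```julia
using Libdl

# Hypothetical cache: library basename (standing in for its SONAME) => the
# full path it was first loaded from. Illustrative only; not Julia's libMap.
const SONAME_CACHE = Dict{String,String}()

# Load a library by full path and remember where its basename came from.
# `loader` defaults to Libdl.dlopen and is injectable for testing.
function dlopen_and_cache(path::AbstractString; loader = Libdl.dlopen)
    handle = loader(path)
    SONAME_CACHE[basename(path)] = String(path)
    return handle
end

# Map a bare name to a previously loaded full path, or return the name
# unchanged so the system loader's normal search still applies.
resolve_soname(name::AbstractString) = get(SONAME_CACHE, name, String(name))

# dlopen by bare name, consulting the cache first.
dlopen_soname(name::AbstractString; loader = Libdl.dlopen) =
    loader(resolve_soname(name))
```

If `Cuba_jll` had loaded its library through something like `dlopen_and_cache`, a later `dlopen_soname("libcuba.so")` would reopen the recorded artifact path instead of walking the directories shown in the strace output.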

@staticfloat staticfloat added the triage This should be discussed on a triage call label Jun 29, 2020
@staticfloat staticfloat removed the triage This should be discussed on a triage call label Jul 16, 2020
@giordano giordano changed the title On Musl systems, can't ccall when the library is given with its basename On Musl systems, can't ccall into a library with the soname Jul 28, 2020
@staticfloat
Member

@vtjnash and I discussed this a little while back, and we decided the following would work:

Change ccall() to allow for non-const library names. In detail, a ccall() with a non-const library path will, upon first call, dlopen() the library path, then dlsym() the given symbol name, and cache the function pointer. The resultant function pointer will be stored and used in future ccall() invocations. I think we should just not handle invalidation and say "if you want to be able to hot-swap your ccall() library names, you need to manage the dlsym() calls yourself". An example where someone might want to do this is in swapping out BLAS libraries, but I don't think we should hold this syntax up for something that specific.
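A minimal Julia sketch of the caching behavior described above, assuming a hypothetical `LazySymbol` wrapper rather than any actual change to `ccall()` itself: the first call does `dlopen()` then `dlsym()`, the function pointer is cached, and, as proposed, it is never invalidated. The `resolver` keyword is a hypothetical hook added only so the sketch can be exercised without a real library.

```julia
using Libdl

# Hypothetical wrapper: a C symbol resolved lazily on first use.
mutable struct LazySymbol
    lib::String          # library path or name, may be computed at runtime
    sym::Symbol          # symbol to look up
    ptr::Ptr{Cvoid}      # cached function pointer, C_NULL until resolved
    resolver::Function   # (lib, sym) -> Ptr{Cvoid}
end

# Default resolver: dlopen the library, then dlsym the symbol.
default_resolver(lib::AbstractString, sym::Symbol) =
    Libdl.dlsym(Libdl.dlopen(lib), sym)

LazySymbol(lib::AbstractString, sym::Symbol; resolver::Function = default_resolver) =
    LazySymbol(String(lib), sym, C_NULL, resolver)

# Return the cached pointer, resolving it on the first call only.
# No invalidation: a later change of library is ignored, as proposed.
function getptr!(s::LazySymbol)
    if s.ptr == C_NULL
        s.ptr = s.resolver(s.lib, s.sym)
    end
    return s.ptr
end
```

A call site could then be `ccall(getptr!(cubacores_sym), Ptr{Cvoid}, (Cint, Cint), 0, 1000)`, with `cubacores_sym = LazySymbol(Cuba_jll.libcuba_path, :cubacores)` built at runtime (e.g. in `__init__()`), since `ccall` accepts a function pointer computed at runtime.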

@StefanKarpinski
Member

One concern when we were discussing this was that currently if someone tries to do something dynamic with the name it will fail with a clear error indicating that you can't do that. The concern with allowing non-const library names was that someone might do that and not realize that it won't invoke a different function name/library when the value changes. The safest thing we could do would be to evaluate the string every time and check that it hasn't changed, but then again that might be too slow. If it's just a global variable lookup, it might be ok?
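A minimal sketch of the "check that it hasn't changed" option, assuming the library name lives in a `Ref{String}` so that re-reading it on every call is a single load plus a string comparison. On a mismatch it raises an error instead of silently calling into the previously resolved library. `CheckedSymbol` and `getptr!` are hypothetical names, and whether the per-call comparison is cheap enough is exactly the open question above.

```julia
using Libdl

# Hypothetical checked wrapper: the library name is re-read on every call
# and compared with the name the cached pointer was resolved from.
mutable struct CheckedSymbol
    libref::Base.RefValue{String}  # current library name, may be reassigned
    sym::Symbol
    resolved_lib::String           # name used when `ptr` was resolved
    ptr::Ptr{Cvoid}                # C_NULL until the first call
    resolver::Function             # (lib, sym) -> Ptr{Cvoid}
end

CheckedSymbol(libref::Base.RefValue{String}, sym::Symbol;
              resolver::Function = (l, s) -> Libdl.dlsym(Libdl.dlopen(l), s)) =
    CheckedSymbol(libref, sym, "", C_NULL, resolver)

function getptr!(c::CheckedSymbol)
    current = c.libref[]  # re-evaluated on every call
    if c.ptr == C_NULL
        c.ptr = c.resolver(current, c.sym)
        c.resolved_lib = current
    elseif current != c.resolved_lib
        # Fail loudly instead of silently calling the old library.
        error("library for :$(c.sym) changed from \"$(c.resolved_lib)\" to \"$current\" after it was resolved")
    end
    return c.ptr
end
```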

@staticfloat
Member

While the underlying technology is now available for this to work on musl, we are going to need to alter BinaryBuilder to generate JLLs that make use of it.

@jpsamaroo
Member

I'd like to see this functionality become available for JLLs; what's needed for this to work, other than making the library path non-const? I tried testing on Alpine Linux (musl) with MiniFB by making MiniFB_jll.libminifb a Ref{String}, and then doing:

julia> ccall((:mfb_open, MiniFB.libminifb[]), Ptr{MiniFB.mfb_window}, (Cstring, UInt32, UInt32), "lol", 120, 120)
ERROR: could not load library "libminifb.so"
Error loading shared library libminifb.so: No such file or directory

I'm using Julia built from #37383, and have MiniFB_jll and MiniFB dev'd.

@giordano
Contributor Author

@jpsamaroo what does MiniFB.libminifb[] return? If it's the soname, I guess this is the problem with calling libraries by soname reported in #36458 (comment). What happens if you call into MiniFB.libminifb_path?

@jpsamaroo
Member

@giordano that worked! So I guess packages that use JLLs should now (at least on musl) pass the path to ccall instead?

@giordano
Contributor Author

I assume that's @staticfloat's plan for the next iteration of the JLL wrappers.

@ecsx1

ecsx1 commented Sep 14, 2020

Would it help to get a run of PkgEval (or similar) on musl and open a meta-issue to track the failures?
Base Julia works great on musl, but the package ecosystem has problems reported across repos and issues (JuliaIO/HDF5.jl#577 (comment), JuliaStats/Rmath.jl#70 (comment), #32636 (comment), etc.).
I don't know PkgEval's internals, so I can't say whether it's the best tool for the job, but at a quick glance it looks like it could help.

@KristofferC
Member

Someone with a musl machine could run https://github.com/JuliaComputing/NewPkgEval.jl and compare it to a standard linux run. It has a pretty simple interface.

@ecsx1

ecsx1 commented Sep 14, 2020

Great, that's really straightforward. I'll do it over the weekend if nobody beats me to it.

@ecsx1

ecsx1 commented Sep 20, 2020

I gave it a go, but some points puzzle me.
First, here are the steps I followed today after compiling the latest Julia from master.

cd NewPkgEval.jl/
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project

using NewPkgEval
julia_version = NewPkgEval.obtain_julia("master")
result = NewPkgEval.run([julia_version], ["HDF5"])

To my surprise, the result is a green OK, reported as success.
But in local tests HDF5 fails (JuliaIO/HDF5.jl#577 (comment)).
I'm guessing PkgEval attempts 'Pkg.add("HDF5")' plus some tests but doesn't attempt 'using HDF5'?

I was also surprised that running the same NewPkgEval.run for the package 'Plots' took 30 minutes.
I'm not sure my machine is beefy enough for the full PkgEval run of the entire Julia ecosystem that we need.

@giordano
Contributor Author

HDF5_jll doesn't currently support any musl platform

@ecsx1

ecsx1 commented Sep 24, 2020

Then shouldn't PkgEval report a failure? That's what I'd expect, but 'NewPkgEval.run([julia_version], ["HDF5"])' reports success.
In any case, that's just one example that caught my eye. We want to use PkgEval across the entire ecosystem to see what needs to be done for musl support, but here a case where we'd expect failure reports success. Do we need to tweak something in PkgEval? How would we evaluate musl?

8 participants