Packaging shared libraries with built py_binary #2562
Does the above work with …?
I doubt `bootstrap=script` will Just Work, either. The script bootstrap isn't going to do anything special with LD_LIBRARY_PATH or how the linker finds libraries. The only suggestion I have is:

1. Use `cc_binary` to build a shared library. Bazel should populate the resulting shared library with an appropriate rpath to find other shared libraries. The shared library should have DT_NEEDED entries for the other shared libs it needs. Copies of (non-system provided) shared libs should be in the runfiles.
2. Add this shared library to the data runfiles of the `py_binary`. All the shared libs it depends on should automatically get added to the runfiles.
3. In your `py_binary`, dlopen that shared library.
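A minimal BUILD sketch of that suggestion; the `@cairo` and `@gobject` repositories and all target names here are made up for illustration, assuming external `cc_library` targets exist for the native deps:

```starlark
# With linkshared, the output must be named like a library, hence "libdeps.so".
cc_binary(
    name = "libdeps.so",
    linkshared = True,
    deps = [
        "@cairo//:cairo",      # assumed external cc_library targets
        "@gobject//:gobject",
    ],
)

py_binary(
    name = "app",
    srcs = ["app.py"],
    # Shipping the .so as data pulls it, and the shared libs it needs,
    # into the runfiles tree next to the program.
    data = [":libdeps.so"],
)
```

At runtime, `app.py` would then dlopen the runfiles path of `libdeps.so` (e.g. via `ctypes.CDLL`), as sketched further down in the thread.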
this fixes …
Aspect's rules_py (https://github.com/aspect-build/rules_py/blob/main/docs/py_binary.md) does export the env as-is in the bootstrap script; I've verified this on Mac. Currently I have one of my services onboarded to it, and the other is still using the native py_binary targets. I'm going to move forward with rules_py right now.

The problem I feel here is that there is no documented process to deterministically link shared libraries in such a way that the linker always searches the application-provided paths first (the system libraries are overridden). Bazel gives a lot of control over how our targets are built, but it assumes that we are going to build every dependency on earth in the monorepo itself (including the Python interpreter, which would then finally allow me to link any library I want by using the RPATH correctly). I believe there might be room for a standardised way for users to link shared libraries, along with some basic linting automation, building on the mechanisms you mentioned above (DT_NEEDED, etc.).
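For reference, a hedged sketch of the rules_py usage described above; the `env` attribute is per their docs, but the path value is only an illustrative placeholder, not a real layout:

```starlark
load("@aspect_rules_py//py:defs.bzl", "py_binary")

py_binary(
    name = "report",
    srcs = ["report.py"],
    # rules_py's bootstrap exports env as-is, so this value actually
    # reaches dlopen; "third_party/lib" is a placeholder directory.
    env = {"DYLD_LIBRARY_PATH": "third_party/lib"},
)
```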
Have you considered shipping a docker container with your non-bazel built native binaries?

Out of interest, how do you ship a conda environment? Are you zipping them up? I thought conda was an environment manager (a bit like docker, but with less isolation); I didn't personally view it as a build tool like bazel.
Another thing that comes to mind is considering how this would be solved outside bazel. I don't think there is a PEP standard that addresses it which could be leveraged; I think it's left as an exercise for downstream distributors, but I could be wrong. My own understanding of the problem space is that options for distributing self-contained Python applications vary a lot, and there isn't an obvious single best solution. A very common scenario is a container image (via rules_oci) of a webapp that gets deployed. But yes, I acknowledge that's probably common, yet not the exclusive or best way to distribute a Python application for all scenarios or platforms. There are options like: …
It's a broad problem space that would be awesome to solve. But handling it in downstream rules that consume a py_binary is also an option worth considering. It might not make sense in the core rules, but it also could if there was a contribution that made sense.
So we already have docker images for our cloud deployments, and it works okay-ish (we have specified all the system dependencies in the base image and simply py_layer everything else, via rules-docker). I have two end goals right now: …
I also do not know how to package a Conda environment as something we can ship; I'm trying to research this right now. But there seem to be some standard ways a Python environment imports wheels.
I am hoping that since I control the whole build pipeline with bazel, I would reasonably be able to create a self-contained environment. I was mainly planning to use a combination of ….

For now, I'm hoping to create shim macros which would collect all the shared-lib dependencies and put them in … (a sketch of this idea follows below).

I haven't tried the zip solution right now; I'm assuming I need to crack the shared-library shipping and linking logic before trying the zip package route.

Hypothetically though, I'm hoping it is possible to simply take the full runfiles directory, ship it to any machine (of the same platform), call the binary from the correct location, and make it work. I do have control over the installation process; I can basically run arbitrary scripts during installation (my employer ships our product on on-prem systems, so we have full control here before it reaches the customer).
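A minimal sketch of such a shim macro, under heavy assumptions: `shared_libs` are plain file labels, copying them under a `lib/` directory in runfiles is all that's needed, and every name here is hypothetical:

```starlark
# tools/py_with_libs.bzl (hypothetical path)

def py_binary_with_libs(name, shared_libs = [], data = [], **kwargs):
    """Wraps py_binary so each shared lib is copied under lib/ in runfiles."""
    copies = []
    for lib in shared_libs:
        basename = lib.rsplit("/", 1)[-1].lstrip(":")
        out = "lib/" + basename  # e.g. lib/libcairo.dylib
        native.genrule(
            name = "{}_copy_{}".format(name, basename),
            srcs = [lib],
            outs = [out],
            cmd = "cp $< $@",
        )
        copies.append(out)
    native.py_binary(
        name = name,
        data = data + copies,
        **kwargs
    )
```

The loader still has to be told about that `lib/` directory (via env var or rpath), which is the part being discussed in the rest of this thread.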
I'm curious about this, too, because AFAIK linux doesn't provide facilities that make any solution particularly nice. I suspect the de-facto solution is to roll your own linker-search-path type behavior (pytorch essentially does this).

With LD_LIBRARY_PATH, subprocesses are contaminated. Setting it also requires doing some setup, e.g. in a shell script, before executing the process (the python interpreter).

With rpath, you don't want to e.g. modify the system python for a particular binary. You could build your own shared library with your own rpath, but: …
Like I mentioned before, you can have Bazel build a shared library, and you can have its rpath set to what you need. You'll still have the problem of (1), though. (2) will be solved if all dynamically loaded libraries your program needs are marked as DT_NEEDED of your shared library. This is because your shared library will eager-load e.g. libfoo.so, and then when later python code does a dlopen of libfoo.so, the already-loaded library is reused instead of searched for.

With the system configs in …

If you want to get really creative, then there are some hooks for intercepting the linker at runtime: https://man7.org/linux/man-pages/man7/rtld-audit.7.html. I haven't tried them, so not sure how well they work. In theory, one could e.g. LD_PRELOAD a library and then search a runfiles-relative location for things.
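A sketch of that eager-load trick in Python, assuming a Bazel-built aggregate library named `libdeps.so` sits next to the entry point in runfiles (both names are made up):

```python
import ctypes
import os

_here = os.path.dirname(os.path.abspath(__file__))

# Load the aggregate library first. Its DT_NEEDED entries (libcairo, ...)
# are resolved through its rpath, so they come from our vendored copies.
# RTLD_GLOBAL additionally exposes their symbols to later loads.
_deps = ctypes.CDLL(os.path.join(_here, "libdeps.so"), mode=ctypes.RTLD_GLOBAL)

# A later dlopen by soname (which is roughly what ctypes/cffi end up doing)
# should now reuse the already-loaded library instead of searching system paths.
import weasyprint  # noqa: E402  (imported after the eager load on purpose)
```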
Honestly, this seems like one of the better options. It should be possible to make rules_python do this. All it really requires is having ….

I'd also be OK with ….
So, from what I understand, it's best to use rpaths when we can, but the toolchain shipped with rules_python does not work for this, I think. Is there a way around this?

From an implementation perspective, I'm not very great at the whole Starlark work. But I imagine if I need to do this generically, I would need to provide some other kind of provider (SharedLibProvider) which can be added to both ….

I'm not that great at Starlark right now, but I might write small macro wrappers which will do the thing above (though I'll simply update the env for now, since the RPATH option is not available).

There is one thing I could not figure out right now, though. Let's say I have an external pip dependency; this would in the end generate a ….

Will the …
The two ways to do this are: …
In any case, the key part is that the interpreter binary has an rpath entry like `$ORIGIN/../lib` (i.e. relative to the binary's own location).

Note that (2) isn't special to rules_python. One can define their own toolchain and perform the same manipulation of the binary. This would be a good route to use for prototyping and experimentation. (This is, essentially, what you said you're doing with install_tool earlier; I'm just describing how to more neatly fit it into the build process.) For linux, the tool to use is patchelf.

Beyond that, I don't have a well-thought-out vision or design. A new provider shouldn't be necessary, though; PyInfo and CcInfo should be able to carry the necessary information, and both are collected and propagated.

Integration with pypi will be tricky, unreliable, and annoying -- there aren't any standards for how packages expose their C information. The best generic thing we can probably do is just glob up the C-looking things, stick them in a cc_library, and hope for the best.
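A sketch of that prototyping route as a genrule, assuming `patchelf` is available on the host machine and a hermetic interpreter exists at the label shown (both are assumptions, not real targets):

```starlark
genrule(
    name = "patched_python",
    srcs = ["@python_3_11_x86_64//:bin/python3"],  # hypothetical hermetic interpreter
    outs = ["python3"],
    # $$ORIGIN escapes to a literal $ORIGIN for the shell; the patched
    # interpreter then searches ../lib relative to wherever it lives.
    cmd = "cp $< $@ && chmod +w $@ && patchelf --set-rpath '$$ORIGIN/../lib' $@",
)
```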
Only if it strikes somebody's fancy -- this is a volunteer project, after all. I find it interesting, but my next main focus is to make bootstrap=script the default so that we don't have like 6 different ways for programs to bootstrap. This is the sort of thing that, given a PR that does the basics, I could help shepherd through, but it isn't something I'm likely to start from scratch on my own soon. There are a few others who do more C-integration type stuff and might have more interest, though I don't remember their names. They pop up in the #python Bazel Slack channel every now and then.
Yeah, this is why I mentioned the …
Being able to affect the environment the interpreter receives is an existing feature request, so it's useful regardless.
I don't think we need to tackle the whole pypi standardisation business. It should be enough to allow users to attach shared libraries to …:

```starlark
# pseudocode
def weasyprint():
    pylib = get_generated_py_library_for("weasyprint")  # kwargs of the generated target
    py_library(
        shared_libs = ["./relative-path-to-my-shared-lib"],
        **pylib
    )
```

The above should basically allow users to use the defined target instead of directly using …. The point here is to let the user understand and handle the complexity of linking the libraries correctly; with proper documentation on the exact steps, this should not be very difficult.

I don't mind doing some basic work on this in my free time, but I don't know where the contribution should go. Currently I use …
I'm really sorry if this has already been answered somewhere else; I have looked at a lot of issues in this repo but haven't been able to come to a concrete solution.
So the question is pretty simple: I want to finally build my Python binaries to be self-runnable (no system dependencies; I plan to keep track of all the complicated packages and maintain this constraint). My end goal is to provide this behavior both for `bazel run` during development and for the finally shipped built binary.

This includes vendoring the shared libraries the targets depend on. I'm describing the issue I've faced in detail for a specific library, but the whole gist is that I want deterministic ways of searching for shared libraries (depending on the platform, this can be env vars like `DYLD_LIBRARY_PATH`, or hardcoding the search path using `@executable_path` in the linked Python interpreter and dumping all libs in that path).

The scenario

I have a very concrete scenario right now: installing weasyprint. I'll boil down the situation here:

- weasyprint needs `libcairo.dylib` and `libgobject.dylib`. There might be more, but I'm trying to figure this out with two of them first.
- These are loaded with the `dlopen` system call, which searches `LD_LIBRARY_PATH` on linux, `DYLD_LIBRARY_PATH` on Mac, and some predefined system folders.
- Mac strips `DYLD_LIBRARY_PATH` if it's used to invoke a system interpreter (which happens because `py_binary` depends on system python).

Again, it's not just Mac that I'm concerned about: I want to provide users with all system packages (as much as I can) when they do a `bazel run`. I also have an end goal of creating a self-deployable packaged binary without any system dependencies for Windows (without docker support).
Stuff I tried

- Setting `DYLD_LIBRARY_PATH` in `py_binary.env`: it does not work, and the generated Python script bootstrapping the process also doesn't seem to support this.
- Setting `DYLD_LIBRARY_PATH` in `main.py`: but it seems it has to be set before launching for `dlopen` to use that path.

how conda does it
After some research, this is how Conda does it: https://docs.conda.io/projects/conda-build/en/stable/resources/use-shared-libraries.html#
This seems like a pretty good approach. In essence, they do this:

- Set `@rpath` in the interpreter installed using Conda to point to `@loader_path/../lib/`.
- Set each library's `@rpath` to `@loader_path/`, thus making sure everyone tries to load everything from the `lib` folder.

For Windows they use `PATH` (slightly un-ideal, but yeah); linux seems to follow something similar.

patching the built interpreter
My next strategy was to try to replicate conda, but the packaged interpreter does not have `@rpath` set, and patching it gives an error.
Now, is there any way to do this other than creating my own toolchain rules? How does Google do it?
Now I'm pretty stumped. All I can do right now is give users some directory containing all shared libs for some target and ask them to set it as a lib search path location, which is error-prone and manual, especially given how opaque the search semantics of `dlopen` and `ctypes.util.find_library` are.

Thanks for the help