Solve the libGL ABI problem #31189
Comments
AFAIK glibc is very good at keeping ABI-compatible, so we would better make it shared, IMO, as the risk of e.g. using |
I agree. I guess in principle the same PLT patching approach would work. Except for exported global variables, which libcapsule currently doesn't know how to redirect. |
Is there any progress about this issue? :( |
No new progress. This one:
is still the next plan (at least for me). |
The shared libc approach is in progress, patches are currently awaiting review on libc-alpha. |
Neat! I hope to have time to take a look at this again sometime this year. Especially since (IIRC) in my fork I added some potentially useful things like DT_RUNPATH support. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/xorgserver-upgrade-and-startx/6834/10 |
still an issue |
I marked this as stale due to inactivity. |
Well still an issue unfortunately.. |
I have no expertise in this but could our graphics drivers just be statically linked perhaps? (Note: |
The malloc problem from OP would still apply. (especially when you hint at combining musl and glibc) EDIT: maybe, if the drivers were built against quite an old glibc and linked everything except glibc statically. |
IIUC, this problem is the cost to pay to support closed source drivers (AMD, NVIDIA), right? Otherwise, mesa covers all the free software drivers available at least on GNU/Linux and thus should be safe to rely on for libGL.so. Is my understanding correct? |
Yes, I think so. The other part (I know) would be to have mesa drivers in way more closures, though that's a price we'd be willing to pay, I expect. |
I would not expect the libcapsule approach to work with libwayland-client. While libwayland-client is developed in a backward-compatible way so as not to break the ABI, I do not think it was ever designed for multiple different versions of itself to be interoperable. What I mean is, an application calls libwayland-client to create a Now, EGL implementation is different from the GL implementations and drivers but close enough that I thought of mentioning it. With X11 you have the workaround that the EGL or whatever implementation could just open more X11 connections itself and just pass the window id around, but that won't work with Wayland where you must use the same connection where all the protocol objects live, and your windows simply do not exist on additional connections. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/could-we-robustly-protect-against-errors-version-glibc-2-33/18343/4 |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/problems-with-using-packages-from-unstable/18999/10 |
I don't know much about the internals of glibc, so I apologize if my comment makes no sense… But I can imagine a generic solution that combines the best of both worlds (runnable on newer systems, and purity when running on older systems): if we wrap all packages, we could check in the wrapper the GLIBC version used by the drivers of the system. If the version is newer than the GLIBC used by the current system, then we Note that this comes at the price of wrapping all executables or the loaders (wrapping loaders would not work for statically linked binaries, see also #150841), but I don't see any other solution and I'm not skilled enough to understand and compare this approach to the solution provided above. My hope is that it may work with libwayland-client, but I know nothing about libwayland-client, so I'm curious to know what you think @ppaalanen |
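For illustration only, the version check such a wrapper would need could look roughly like the C sketch below. It assumes the driver's required glibc version is already known as a "MAJOR.MINOR" string; how a wrapper would find that out is not covered here. The function names are hypothetical; only gnu_get_libc_version() is a real glibc call.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Parse "MAJOR.MINOR" (e.g. "2.33"). Returns 0 on success, -1 on malformed input. */
static int parse_glibc_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if the driver needs a newer glibc than
 * `ours`, 0 if `ours` is new enough, -1 if either string cannot be parsed. */
int driver_needs_newer_glibc(const char *driver_req, const char *ours)
{
    int dmaj, dmin, omaj, omin;

    if (parse_glibc_version(driver_req, &dmaj, &dmin) != 0 ||
        parse_glibc_version(ours, &omaj, &omin) != 0)
        return -1;

    if (dmaj != omaj)
        return dmaj > omaj;
    return dmin > omin;
}

/* Same check against the glibc this process is actually running on. */
int driver_needs_newer_than_running(const char *driver_req)
{
    return driver_needs_newer_glibc(driver_req, gnu_get_libc_version());
}
```

A wrapper could call driver_needs_newer_than_running() with the driver's requirement and decide what to do based on the result. What it should do in the "newer" case is exactly the open question in this thread.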
I have no idea about libc. All I was saying is that you do not want to link two different versions of libwayland-client into the same process. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/do-gui-applications-not-work-on-non-nixos-using-nixpkgs-only/19070/13 |
Since NVIDIA appears to be opening up their driver (not quite fully yet, just the kernel module atm), this may actually become viable I guess. |
The issue here is user-space. The things I've heard so far haven't raised my hopes wrt. this ticket. EDIT: well OK, perhaps in the sense that it might help improving the nouveau drivers over the following years. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: |
Duplicate of #9415? There's a significant amount of history in both issues, but I feel like it should really be consolidated into one. |
They're quite similar and have the same root cause, but this one is more general, as this is an issue you can run into on NixOS as well. |
The problem
The design of libGL drivers is such that the userspace part of the driver consists of a libGL.so that gets loaded in each process using OpenGL. That is, each driver vendor (Mesa, NVIDIA, AMD) ships their own libGL.so that we select dynamically (and impurely) by having NixOS set:
and having the NixOS module set the symlinks pointing to the proper packages depending on the system configuration. Now while the OpenGL ABI itself is stable, a major pain point that the impurity causes for us is version conflicts in any libraries that both the driver and the application depend on.
Issue #16779 shows a manifestation of this problem: applications built on NixOS 16.03 would stop working on NixOS 16.09 because of a version conflict in libwayland.so, which is used by both the application and Mesa: the application causes version X of libwayland.so to be loaded into the process, but Mesa requires version Y of libwayland.so, so the application cannot start up and fails with:
Note that this problem is not inherently specific to NixOS -- the same problem is known to happen on other distros as well when the libstdc++ version provided by the Steam runtime conflicts with the libstdc++ that Mesa requires.
A (potential) solution
An attempt at solving this has been made in the libcapsule project (https://git.collabora.com/cgit/user/vivek/libcapsule.git/tree/README) by a Collabora employee. The approach taken there is to build a stub libGL.so that uses the little-known dlmopen() function to create a completely new symbol namespace for dynamic linking, load the real libGL.so of the graphics driver there, and then redirect all exported symbols from the stub libGL.so to the entry points in the real libGL.so living in the segregated dynamic linker namespace. This is implemented via a clever hack of patching the PLT of the stub libGL.so to point to the real libGL.so's entry points, so there is zero overhead for function calls to libGL!
The problems in practice
I attempted to package and use libcapsule during NixCon 2017, with not-so-great success (https://github.com/dezgeg/libcapsule, https://github.com/dezgeg/nixpkgs/tree/libcapsule). While the approach taken by libcapsule seems theoretically sound, one problem seems to be that the proxied libGL driver also needs to provide exports for libX11.so and some of the xcb libraries. I'm not totally sure why, but I'm guessing the X11 client driver keeps some per-process state on which GLX client-side library is associated with which X screen, so having two different libX11.so's, one in the main symbol namespace and one inside the capsuled symbol namespace, would break things.
Now, that causes a problem because libraries like libXi (probably accidentally) allocate memory with malloc() from outside the capsule but free it with XFree(), which crashes because XFree() calls the free() inside the capsule, and those two glibcs of course have their independent heaps. AFAICT, there's currently no way to have certain libraries loaded only once and shared by both the main dlmopen() namespace and the in-capsule dlmopen() namespace.
A potential way to avoid that problem might be to use libcapsule between libglvnd and the driver, which shouldn't require the hack of exporting symbols from libX11. Though what worries me a bit is whether having multiple glibcs loaded will work either, given that there are some per-process and per-thread kernel APIs where the two glibcs might step on each other's feet (set_robust_list() and sbrk() come to mind). But presumably the glibc people have given at least some thought to that, or the entire dlmopen() would become pretty much useless...
cc: @vcunat, @abbradar
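To make the "two glibcs, two heaps" point concrete, here is a minimal C sketch. It is not libcapsule's code, only an illustration assuming a dynamically linked program on glibc. It loads a second copy of libc.so.6 into a fresh namespace with dlmopen(LM_ID_NEWLM, ...) and checks that its malloc is a different function from the one in the main namespace. The function name second_libc_has_own_malloc is made up for this example.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if a libc loaded into a new linker namespace has its own malloc
 * (distinct from the main namespace's), 0 if both resolve to the same address,
 * and -1 if the check could not be performed (e.g. dlmopen is unavailable). */
int second_libc_has_own_malloc(void)
{
    /* LM_ID_NEWLM asks the dynamic linker for a brand-new namespace.
     * The handle is deliberately never closed in this sketch. */
    void *capsule = dlmopen(LM_ID_NEWLM, "libc.so.6", RTLD_NOW | RTLD_LOCAL);
    if (capsule == NULL) {
        fprintf(stderr, "dlmopen failed: %s\n", dlerror());
        return -1;
    }

    /* Ask which namespace the handle ended up in. */
    Lmid_t lmid = LM_ID_BASE;
    if (dlinfo(capsule, RTLD_DI_LMID, &lmid) != 0)
        return -1;

    void *main_malloc = dlsym(RTLD_DEFAULT, "malloc");
    void *capsule_malloc = dlsym(capsule, "malloc");
    if (main_malloc == NULL || capsule_malloc == NULL)
        return -1;

    /* Different namespace and different address: memory obtained from one
     * allocator must not be handed to the other allocator's free(). */
    return lmid != LM_ID_BASE && main_malloc != capsule_malloc;
}
```

On older glibc versions this may need linking with -ldl. A result of 1 is the situation described above: a pointer allocated by the main malloc() and released through a free() living inside the capsule ends up in the wrong heap.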