What are the soundness requirements for dlopen? #525
Comments
What exactly are we trying to protect against? Let me play devil's advocate here: Due to lack of stable ABI you will most probably be using C ABI anyway, and no name mangling. You might be using stabby or similar (which builds on top of the C ABI), but arguably they are off doing their own thing.
So, assuming an extern C API, what can we even protect against? The C ABI is fundamentally unsafe due to the lack of name mangling. If that is not the usage scenario, then what is? Both possible alternatives (stabby and abi_stable) already solve the safety concerns at a higher level. Is the list of things that such a layer needs to deal with what you want to end up with in this issue? It would probably help to come up with some use cases describing what could go wrong in order to figure this out. As it is, this issue seems broad and vague, or perhaps I'm misunderstanding it. |
I'm not sure what the usual usecases here are.^^ If people only ever Cc @bjorn3 |
Rustc dlopens codegen backends and uses the Rust ABI for this. The fact that Rust has an unstable ABI doesn't matter when you ensure that you use the same rustc version to compile the host and the plugin. For codegen backends, using something like stabby or abi_stable is impractical, as codegen backends are expected to use the exact same APIs that rustc uses internally. Conversion of values at the ABI boundary would result in unacceptable overhead. |
How does this deal with name mangling and dlsym though? Does it still use no_mangle or does it compute the expected mangled names and pass those to dlsym? |
For the functions in the plugin to be called by the host |
What bjorn3 said makes sense to me, when doing dlopen you need to use no_mangle. And
In what ways can this fail? Additional note: The Windows/Mac equivalents to dlopen may also have special considerations. I know that symbol resolution works differently for those (not a single global namespace) but I'm not an expert by any means, especially on those platforms. Not sure how any of this could affect the opsem angle, or whether people who don't care about portability will want to make use of the semantics of their platform of choice. I'm not entirely sure what the opsem angle on this even is: how does the AM even represent dlopen/LoadLibrary? |
The concern is if dylib C depends on crate E, but E happens to have the same StableCrateId as B. Then the symbols of the two crates will get mixed up and everything explodes, even though it doesn't look like the |
@RalfJung from a pragmatic point of view two questions come to mind:
|
I don't know that we can really express these soundness requirements in any tangible manner. It's like saying that you must not use |
See rust-lang/rust#10389 and rust-lang/rust#129030 for more of these discussions. In this issue, I am interested in exploring what could be done to fix this, not in discussing threat models. (This doesn't mean I think we must fix this, I just want to know what the options would be.) |
For symbol name collisions I believe you have to collide both the |
Also re: "what's the opsem angle", I think the title question is clear enough: What are the requirements for a programmer to be able to call I think a threat model only comes up when it comes to prioritizing the safety requirements in (2) for human consumption, but abstractly it should be possible to come up with an objective answer to the question. My knowledge of dynamic linking protocols is pretty low so I can't answer the question itself, though. Brainstorming some things based on what has been brought up:
|
Thank you, we now have a concrete issue that can lead to unsoundness, which is much easier to discuss than the general issues with dlopen (which obviously have more, such as general no_mangle collisions etc). Some thoughts: One thing that comes to mind is that dlopen has flags that affect name resolution of later dlopen as well. In particular Does eager binding (RTLD_NOW and the corresponding ld flags) help at all? I think the newly loaded library will still get messed up in case of a collision, but existing code will be unaffected. So not good enough. On glibc it looks like |
This comment was marked as off-topic.
I'm interested to see the complete list though, not just some example issues. We need to be exhaustive regarding sources of UB. |
That is actually fine at the top level, and standard practice for plugins: the plugin exports one or more entry points with well known names (no-mangled). These are then accessed using dlsym. The key point here is that you resolve these with dlsym in a handle to the specific dlopened library. You don't attempt to call them directly. Also already resolved symbols stay resolved (I don't know how dlopen interacts with lazy binding actually, that is an interesting question!). |
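The handle-scoped lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the bindings to dlopen and dlsym are written by hand to keep the sketch dependency-free, the value of RTLD_NOW is the one used on Linux/glibc, and `resolve_in` is a hypothetical helper name, not an existing API.

```rust
use std::ffi::{c_char, c_int, c_void, CStr};
use std::ptr;

// Hand-written bindings to the POSIX dynamic-loading functions (assumption:
// a Unix-like target where these are provided by the C library).
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

// Assumption: numeric value of RTLD_NOW on Linux/glibc.
const RTLD_NOW: c_int = 2;

/// Hypothetical helper: opens `library` (or the main program when `None`)
/// and looks up `symbol` through the returned handle, instead of calling
/// the symbol directly and relying on global resolution.
///
/// # Safety
/// Loading a library runs its initializers, and the returned address is
/// only meaningful if the caller knows the symbol's real type.
unsafe fn resolve_in(library: Option<&CStr>, symbol: &CStr) -> Option<*mut c_void> {
    let path = library.map_or(ptr::null(), |p| p.as_ptr());
    // SAFETY: `path` is either null or a valid NUL-terminated string.
    let handle = unsafe { dlopen(path, RTLD_NOW) };
    if handle.is_null() {
        return None;
    }
    // The handle is deliberately never closed, so the address stays valid.
    // SAFETY: `handle` came from a successful dlopen; `symbol` is NUL-terminated.
    let addr = unsafe { dlsym(handle, symbol.as_ptr()) };
    if addr.is_null() { None } else { Some(addr) }
}
```

A host would then cast the returned address to the agreed `extern "C"` function-pointer type before calling it. That cast is only as sound as the agreement between host and plugin on the signature.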
Is there a way to use dlopen such that all the names stay in a separate universe, as it were, and you only use dlsym to look up function pointers in the library by name instead of having name coincidences cause random things to be merged? That sounds like a much saner linking strategy to me, much easier to ensure soundness that way. |
Well Bevy has that scheme to improve compile times where the Bevy crate itself is compiled into a dynamic library, bypassing the slow (static) linker to link at runtime. I don't know if it uses Also: if Rust's name mangling is a soundness issue for |
I was wondering the same thing. However, in the common case the crates you depend on will be known at compile-time to the Rust compiler, they are just not linked-in. In that case the compiler will still check for I don't know if there are situations where regular dynamic linking can bypass the hash collision check. |
That seems to be what the dlmopen glibc extension is (but see caveats in my above comment on it). Also as I understand it, you would get separate copies of every loaded transitive dependency, which might interact poorly with assuming statics or functions have address stability (if you get two different instances of libstd.so for example) (see #522, I guess). Also on non-ELF (so, Windows PE and macOS Mach-O) I believe resolution doesn't use a global namespace to begin with (you don't ask for symbol X, you ask for symbol X from library Y). I don't know if those are "fool-proof" though, e.g. what happens if you have two LibA.dll from two different directories. |
Bevy doesn't use dlopen for loading libbevy_dynamic.so, nor can it. Bevy has support for loading dylib plugins though using bevy_dynamic_plugin, but it has been marked as deprecated in bevyengine/bevy#13080 as there were several crashes caused by it. |
And with 64k crates, it's a 50% chance. With a whole lot of things being dlopened (my own project, lccc, comes to mind here, where each frontend, backend, and optimizer are separate dsos), hitting 64k aggregate doesn't seem unrealistic. |
64k crates all with the same crate name is very unlikely to happen. |
On Windows, DLLs are more like executables that happen to be running in the same address space than shared libraries. You have to make sure you're loading the correct DLL in the first place, but there is no danger of symbol conflicts between DLLs. When loading a DLL by name, there is a specific search order (assuming an absolute path is not used). Manifests can be used to ensure correct versions of dependencies and transitive dependencies - https://learn.microsoft.com/en-us/windows/win32/sbscs/about-side-by-side-assemblies- |
To clarify: StableCrateId is a 64-bit hash. |
Given 64-bit hashes, one would get a collision probability of 10^-15 when having 190 versions of the same crate in the crate graph (according to the table here). |
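For reference, the figure above matches the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and b-bit hashes. A minimal sketch of that calculation, assuming the hashes behave like uniformly random values (the function name is made up for illustration):

```rust
/// Approximate probability that at least two of `n` uniformly random
/// `bits`-bit hashes collide, using the birthday approximation
/// p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
fn collision_probability(n: u64, bits: u32) -> f64 {
    let n = n as f64;
    // Number of unordered pairs that could collide.
    let pairs = n * (n - 1.0) / 2.0;
    // Number of distinct hash values.
    let space = 2f64.powi(bits as i32);
    // 1 - exp(-x), computed via exp_m1 so tiny probabilities keep precision.
    -(-pairs / space).exp_m1()
}
```

With n = 190 and 64-bit hashes this comes out just under 1e-15. It says nothing about deliberately constructed collisions, only accidental ones.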
Turns out there are actually two levels of hashes here, with a theoretical chance of collision in each level: cargo hashes some stuff into I don't think that changes anything about the probabilities, but it seems to make it harder to actually check for collisions, since we'd need the original data cargo hashes together to ensure it is all globally unique. |
Cargo could probably switch to using 256 bit cryptographic hashes to reduce the concern here. |
The "I believe" is giving me pause, would be good to have that checked. :) Also, v0 symbol mangling is still not stable... |
Yes. The demangled version of a crate reference in the v0 symbol mangling scheme is
It is stable, but not the default. You can enable it using |
We could probably harden v0 symbol names against collisions by adding a single (but wide) hash to each symbol that includes more information about the all crate-ids occurring in the name. So instead of
we could have something like
where Some other kind of means that does not rely on hashing at all might be preferable. I don't know if something like |
That would actually break the existing collision detection for the non-dlopen case. It also means if you have two versions of the same crate, you can no longer know which crate was used after demangling without parsing the crate metadata of all crates and trying every combination of StableCrateId to see if it matches the given combined hash. |
In what way?
Yes, that's true. But that information is already pretty opaque, right? |
Rustc is guaranteed to error when
In the case of two |
If you do that with a cryptographic hash, the chances of a collision are astronomically low so we may disregard that possibility. Like, there's not a single known example of a collision for SHA256 or other comparable hashes. |
Actually, the proposal would be something like:
There would only be one level of hashing (disregarding that Cargo feeds hashes into
This could be mitigated by using short prefixes of the |
I think that dylibs are just a pain in the ass in general for opsem - so are rlibs to an extent, because you can "guess" (or determine) the mangled name of a symbol and deliberately produce a function with that rustc could probably fix at least some here by using |
I agree we should use protected visibility for non- |
Using dlopen is a subtle art. On top of the usual requirements around symbol conflicts and ABI compatibility, Rust's handling of symbols adds certain extra assumptions that can lead to UB here: ideally, we'd make sure that symbols from "different" crates can never clash. During normal builds, this is ensured by checking that the StableCrateId is globally unique (and hashing everything into the StableCrateId that is considered as relevant for crate identity), but this check is bypassed by dlopen.
At the very least, this potential risk of collisions in dlopen seems worth documenting somewhere. On top of that, is there anything we could do to mitigate this problem? Making StableCrateId an actual cryptographic hash and 256 bits large is probably going to be prohibitively expensive, but maybe there is an alternative where only dlopen users have to pay for extra checks, and if you don't use dlopen it doesn't cost anything. One could imagine a rust_checked_dlopen or so that performs the crate ID uniqueness check at runtime, somehow. Is that realistic? Is it useful?