[WIP] Support for automatically shimming dynamic libraries #813
Wow, that's pretty impressive! I called @oli-obk crazy when he said this might even be possible. :D In terms of "how to do this", I don't have many things to say as I haven't thought about this very much. It's also not currently a priority for me, so I am basically fine with anything that works. ;) However, there's the usual caveat that this should have as little impact as possible on the remaining code. An interpreter is a subtle thing to write, getting good test coverage is hard, and we defend against that by striving for maximal code clarity. We don't always achieve that though.^^ But good abstractions are key, and I feel our place/operand system is currently working very well. I think this is a feature that I'd like to see off-by-default; by allowing the interpreted code to do FFI calls it can cause arbitrary UB, and that seems like something that the user should opt in to.
Why does this need a new
Yeah, this is hard. My feeling is it is better to consider this out-of-scope for now. And as you noted, for such allocation there is basically no hope of doing any checks -- I am not just talking about Stacked Borrows, even uninitialized memory cannot be tracked any more. That seems in direct contradiction with Miri's goals as a tool for detecting UB. If you need that kind of FFI, maybe a sanitizer or valgrind are better tools? |
That prevents us from observing any other writes made by C code. As a real-world example, consider
    let errno_ptr = unsafe { libc::__errno_location() };
    unsafe { libc::read(/* something */); }
    let read_errno = unsafe { *errno_ptr };
    unsafe { libc::write(/* something else */); }
    let write_errno = unsafe { *errno_ptr };
With your approach, |
I agree that this should be considered out of scope for now. However, I think it would make sense to consider adding it at some point in the future. Given that Miri is currently the only way of detecting UB in Rust code, I think it would be useful to eventually be able to run Miri on arbitrary Rust code (even if the overhead can be very high). Specifically, I think supporting 'input pointers' would still be compatible with detecting UB in the rest of a Rust program. While FFI usage necessarily 'infects' (in the sense of preventing us from doing UB checks) anything we take a pointer to, it need not propagate further through the program. If we have code like:
    #[repr(C)]
    struct MyFFIType { ... }
    struct MyWrapper {
        normal_field: u8,
        ffi_field: MyFFIType
    }
We won't be able to run proper UB checks on From what I've seen, most FFI code in libstd and other crates tends to already be written in this fashion - normal Tl;dr - I think it could be very useful to have Miri support 'input' pointers - even if we can never check everything, we can still check a lot of things. This would enable usages like running a Tokio echo server and client through Miri - while it would probably be incredibly slow, it would allow us to check an end-to-end test of the whole async io stack. EDIT: This would also allow us to incrementally improve what FFI code Miri can check. We can continue to add explicit shims for well-known C functions (e.g. libc::getrandom), while using the automatic shimming as a fallback. While we would still need automatic shims in the general case, we could gradually expand the fraction of |
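To make the wrapper idea above concrete, here is a minimal sketch in which only `ffi_field` ever has its address handed to foreign code, while `normal_field` stays ordinary Rust data. The field `value`, the function `fill_ffi_type`, and the helper `build_wrapper` are hypothetical names added for illustration, and the "foreign" function is written in Rust with the C ABI purely so the sketch builds without linking a C library.

```rust
use std::os::raw::c_int;

/// Stand-in for a C-defined type; the `value` field is hypothetical.
#[repr(C)]
struct MyFFIType {
    value: c_int,
}

struct MyWrapper {
    normal_field: u8,
    ffi_field: MyFFIType,
}

/// Stand-in for a foreign function (assumption: a real one would live in a C
/// library). It writes through the pointer it receives.
extern "C" fn fill_ffi_type(out: *mut MyFFIType) {
    // SAFETY: callers in this sketch always pass a valid, exclusive pointer.
    unsafe { (*out).value = 42 };
}

/// Builds a wrapper and exposes only `ffi_field` to the "foreign" side.
fn build_wrapper() -> MyWrapper {
    let mut wrapper = MyWrapper {
        normal_field: 7,
        ffi_field: MyFFIType { value: 0 },
    };
    // Only the address of `ffi_field` escapes; `normal_field` is never exposed.
    fill_ffi_type(&mut wrapper.ffi_field);
    wrapper
}
```

In a layout like this, an interpreter would presumably only have to give up its checks for the bytes of `ffi_field`, which is the kind of containment the comment argues for.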
Oh, so this is two-way then. But once it's two-way, how does that not have all the same difficulties as 'input pointers'? It's C writing into memory that Miri "controls". Once you have that mechanism, you can implement 'input pointers' by enabling Miri to turn a "normal" allocation into a Your proposal suggested to have And I think raw allocations could be implemented with just a few extra machine hooks in librustc_mir, so the complexity would be very manageable.
That's not true, there's also valgrind and various sanitizers.
The plan we (Niko Matsakis and me) had during my internship was that eventually there would be a valgrind mode that implements Stacked Borrows, and that should be able to handle FFI. That sounds more realistic to me than, say, being able to run the part of Firefox that is written in Rust in Miri to check it for UB...
Agreed! |
@Aaron1011 are you still working on this? My thinking is that some more design work is needed before implementation can start in earnest. In that case, I'd prefer to close the PR (to keep the PR list clean) and discuss the design on the issue. |
Closing for now to keep the PR list clean; @Aaron1011 feel free to reopen if you want to pick this up again. |
I've opened this PR to get some feedback on my approach for automatically shimming dynamic libraries (as per #11)
The current implementation uses libffi to generate a call to the underlying function, using the type information provided by the arguments. Using `ripgrep` as a test, this code successfully shimmed a call to `libc::readlink`.
The major unresolved question is how to deal with pointers. There are two cases we need to handle:
To handle 'output pointers', we could add a new variant to `rustc_mir::interpret::place::Place` (e.g. `Place::Raw`) to handle FFI pointers. We wouldn't be able to track any extra information for these kinds of pointers, since they would be a raw address pointing somewhere into the Miri process.
A `Place::Raw` would be created for any pointers returned by a foreign function (either directly or wrapped in another type). All reads/writes to a `Place::Raw` would be implemented by directly writing/reading from the underlying pointer, since we can't know when C code might write/read from it.
'Input pointers' will be much trickier. We need to be able to support C code writing directly into Miri-controlled memory. Obviously, this will need to bypass all of the normal Stacked Borrows code, since we can't control how the pointer is written to. We might want to create a separate struct in Miri to hold the backing allocations for all such pointers - this would reduce the chance of C code corrupting internal Miri state, and would make it easier to reason about the lifetimes of the raw pointers we create into such allocations.
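For the 'output pointer' case, the difference between a tracked place and a raw one might look roughly like the sketch below. `SketchPlace` and its methods are hypothetical and heavily simplified; this is not rustc's actual `Place` type or its API, only an illustration of "tracked memory carries metadata, raw memory is read and written directly".

```rust
/// Hypothetical, simplified model of a place; not rustc's real `Place`.
enum SketchPlace {
    /// Interpreter-owned memory with per-byte initialization tracking.
    Tracked { bytes: Vec<u8>, initialized: Vec<bool> },
    /// A bare address returned by foreign code; no metadata can be kept.
    Raw(*mut u8),
}

impl SketchPlace {
    /// Reads one byte. `None` means the tracked byte is out of bounds or
    /// uninitialized; a raw read can never report that.
    ///
    /// # Safety
    /// For `Raw`, `ptr + offset` must be valid for reads.
    unsafe fn read_u8(&self, offset: usize) -> Option<u8> {
        match self {
            SketchPlace::Tracked { bytes, initialized } => {
                if *initialized.get(offset)? {
                    bytes.get(offset).copied()
                } else {
                    None
                }
            }
            // Go straight to the underlying memory: foreign code may have
            // changed it since the last access.
            SketchPlace::Raw(ptr) => Some(unsafe { ptr.add(offset).read() }),
        }
    }

    /// Writes one byte. Panics if a tracked offset is out of bounds.
    ///
    /// # Safety
    /// For `Raw`, `ptr + offset` must be valid for writes.
    unsafe fn write_u8(&mut self, offset: usize, value: u8) {
        match self {
            SketchPlace::Tracked { bytes, initialized } => {
                bytes[offset] = value;
                initialized[offset] = true;
            }
            SketchPlace::Raw(ptr) => unsafe { ptr.add(offset).write(value) },
        }
    }
}
```

The `Raw` arm has nothing to consult, which is the trade-off described above: uninitialized reads and aliasing violations through such a place would go undetected.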
However, there are tricky scenarios that we need to handle. Consider code like this:
What can we say about this code, especially w.r.t. how it should interact with Stacked Borrows?
As soon as we call `extern_one`, we have to assume that `my_struct` can be modified by C code at any point, for the rest of its lifetime. That is, we need to assume that the raw pointer it receives might be stored somewhere, and later used by `extern_two`. This means that any reads/writes from normal Miri will need to go through the same memory used by C code, so that we properly observe/propagate modifications to it. This reminds me a little of `intptrcast` - when we expose a pointer to foreign code, we force it to take on a 'definite' representation for the rest of its existence.
Notes:
- The `libffi` crate has a high-level API for making function calls. However, it requires that the return type be specified at compile-time, which makes it useless for our purposes. As a result, I use a combination of the higher-level API and the raw bindings to the C `libffi` library to create and call the shims.
- `dlopen` any dynamic libraries that the target application links to. We should be able to extract this information from the arguments passed to `miri` by `cargo`.
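Returning to the rule from the 'input pointer' discussion, that an allocation handed to foreign code must be assumed foreign-modifiable for the rest of its lifetime: one way to picture the bookkeeping is a sticky "exposed" set. `SketchAllocId` and `ExposureTracker` are hypothetical names for this sketch and do not correspond to existing Miri types.

```rust
use std::collections::HashSet;

/// Hypothetical allocation identifier; Miri's real `AllocId` is not modelled.
type SketchAllocId = u64;

/// Remembers which allocations have had a pointer passed to foreign code.
#[derive(Default)]
struct ExposureTracker {
    exposed: HashSet<SketchAllocId>,
}

impl ExposureTracker {
    /// Records that a pointer into `id` was passed to a foreign function.
    fn expose(&mut self, id: SketchAllocId) {
        self.exposed.insert(id);
    }

    /// Exposure is sticky: there is deliberately no way to undo it, since
    /// the foreign side may have stored the pointer for later use.
    fn may_be_modified_by_foreign_code(&self, id: SketchAllocId) -> bool {
        self.exposed.contains(&id)
    }
}
```

Under this sketch, every interpreter read or write to an exposed allocation would have to go to the shared backing memory rather than to a private copy.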