-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alignment of the bytes
of Allocation
to match align
parameter
#100467
Conversation
Punch a hole to allow direct access to bytes - this API should be changed to provide a `Size` value corresponding to an `AllocId`. Additionally, force bytes to be appropriately aligned - the current trick is a hack, and over-aligns. We should use the `Align` parameter to appropriately align these buffers, as otherwise they'll never be able to be used by native code. The "adjust" callback used on deserialized allocations might not have that information - if it doesn't, just overalign to max alignment
…he check make sense (should be len of bytes) for real addrs
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri The Miri submodule was changed cc @rust-lang/miri Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri |
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @davidtwco (or someone else) soon. Please see the contribution instructions for more information. |
|
Please take the submodule update out of this MR. :) That will be done separately. |
r? @RalfJung |
let size = Size::from_bytes(slice.len()); | ||
let align_usize: usize = align.bytes().try_into().unwrap(); | ||
let layout = std::alloc::Layout::from_size_align(slice.len(), align_usize).unwrap(); | ||
let bytes = unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my biggest concern is that this introduces unsafe code deep in the core of the interpreter engine. This is a rather significant cost and risk that the interpreter has to carry, even the compile-time interpreter that definitely does not want or need this.
At the least, we should find a way to factor this such that all the unsafe code lives in the Miri repository. Even that is still not great though, it is exactly the kind of global cost I was worried about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see your point, but I don't think we can align the bytes of the allocation without unsafe
code.
If we want to factor this out into Miri, then one option could be to refactor Allocation
to expose bytes
so that the bytes
alignment can be done on the Miri side. Although, this would be a bigger change to Allocation
.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we certainly don't want to expose bytes
. But we can make Allocation
generic over how bytes are allocated.
You can add a new type parameter for that:
pub struct Allocation<Prov = AllocId, Extra = (), Bytes = Box<[u8]>> {
bytes: Bytes
and then define a suitable trait
trait AllocationBytes
and add a bound Bytes: AllocationBytes
whereveer needed.
The trait needs to then contain all the operations you need. The implementation in rustc can use Box<[u8]>
and be completely safe, and that way only Miri needs unsafe code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's smart -- I had tried earlier to make Allocation
generic over the allocator for bytes which didn't work. But this makes sense!
So this is a first step of a many-step plan -- would be good to have at least some idea of the overall plan, rather than going blindly into some direction without knowing whether it will work out in the end. |
Some changes occurred in src/tools/clippy cc @rust-lang/clippy Some changes occurred in src/tools/rustfmt cc @rust-lang/rustfmt |
pub trait AllocBytes: | ||
Clone + core::fmt::Debug + Eq + PartialEq + PartialOrd + Ord + core::hash::Hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make this also have Deref<Target = [u8]>
, you can remove everything but adjust_to_align
, from_bytes
and uninit
and should have to adjust much less code as it would pick up the slice methods that Box<[u8]>
also exposes via Deref
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point! I was hoping to keep AllocBytes
more general though, so I can use it to represent foreign memory in Miri, that isn't represented by a [u8]
. Given that, do you think it's ok to keep these extra functions in AllocBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of memory are you thinking of? You already support get_slice_from_range
, which returns a &[u8]
. Most of what I'm proposing is to get rid of that method and require the implementor of AllocBytes
to put the logic into the Deref::deref
method instead. Or are you planning to change the return type of the get_slice
methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, if we can make a custom get_slice
method with the same return type then that should work! In my implementation these methods still return &[u8]
-- my current representation of foreign memory is represented as a pair of machine address and length, and the get_slice
methods build the slice from std::slice::from_raw_parts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, so you could implement Deref
and DerefMut
for your foreign memory representation and have the deref
and deref_mut
methods use from_raw_parts
. Then you can remove the get_slice*
methods and others from the trait, as you get them for free once Deref
is a super trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see -- that's smart, thanks for the suggestion! (Will push these changes soon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the trait has a lot more methods than I was expecting so this sounds good.
/// Get the base address for the bytes in an `Allocation` specified by the | ||
/// `AllocID` passed in; error if no such allocation exists. | ||
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Get the base address for the bytes in an `Allocation` specified by the | |
/// `AllocID` passed in; error if no such allocation exists. | |
pub fn get_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> { | |
/// Get the base address for the bytes in an `Allocation` specified by the | |
/// `AllocID` passed in; error if no such allocation exists. The address will | |
/// be exposed (on the host system!). | |
/// | |
/// It is up to the caller to take sufficient care when using this address: | |
/// there could be provenance or uninit memory in there, and other memory | |
/// accesses could invalidate the exposed pointer. | |
pub fn expose_alloc_base_addr(&self, id: AllocId) -> InterpResult<'tcx, usize> { |
Also note that this only allows read-only access to the bytes. If C code writes to this memory, it's writing to a shared reference! So I think you probably want to make this mut
.
|
||
// Default `bytes` for `Allocation` is a `Box<[u8]>`. | ||
impl AllocBytes for Box<[u8]> { | ||
fn uninit<'tcx, F: Fn() -> InterpError<'tcx>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not uninit though, it returns a zeroed buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh that's true -- I just took this code from here where it was originally in the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire allocation is considered uninit based on its InitMask
, yeah. But looking just at the bytes, which is the level of abstraction you have here, this is zeroed. The bytes cannot be uninit.
/// Adjust the bytes to the specified alignment -- by default, this is a no-op. | ||
fn adjust_to_align(self, _align: Align) -> Self { | ||
self | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this about? Alignment should be set at allocation time.
@emarteca FYI: when a PR is ready for review, send a message containing |
Note: the build is failing b/c the Miri machine doesn't specify its @rustbot ready |
Miri is a subtree now so yes you are meant to fix Miri in the same PR. :)
|
☔ The latest upstream changes (presumably #104370) made this pull request unmergeable. Please resolve the merge conflicts. |
@rustbot author |
// REEEEEEE TODO ELLEN | ||
// make a new constructor that just takes a Bytes | ||
// then, make the allocation this way in ffi_support, and add it to the memory with allocate_raw_ptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look like it should land in this form ;)
/// The address of the bytes. | ||
fn expose_addr(&self) -> u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to make this return a raw pointer, and leave exposing and things like that to the caller.
And in fact... why is this even needed? One can use the deref
to get such a pointer, right?
+ core::fmt::Debug | ||
+ Eq | ||
+ PartialEq | ||
+ PartialOrd | ||
+ Ord | ||
+ core::hash::Hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the Ord
bounds, they should not be needed. No idea why Allocation
derives PartialOrd
, but #104933 removes that.
@emarteca any updates on this? |
Closing this due to inactivity. Feel free to create a new pr if you wish to continue with this |
Support allocations with non-Box<[u8]> bytes This is prep work for allowing miri to support passing pointers to C code, which will require `Allocation`s to be correctly aligned. Currently, it just makes `Allocation` generic and plumbs the necessary changes through the right places. The follow-up to this will be adding a type in the miri interpreter which correctly aligns the bytes, using that for the Miri engine, then allowing Miri to pass pointers into these allocations to C calls. Based off of rust-lang#100467, credit to `@emarteca` for the code
Support allocations with non-Box<[u8]> bytes This is prep work for allowing miri to support passing pointers to C code, which will require `Allocation`s to be correctly aligned. Currently, it just makes `Allocation` generic and plumbs the necessary changes through the right places. The follow-up to this will be adding a type in the miri interpreter which correctly aligns the bytes, using that for the Miri engine, then allowing Miri to pass pointers into these allocations to C calls. Based off of rust-lang#100467, credit to ``@emarteca`` for the code
Support allocations with non-Box<[u8]> bytes This is prep work for allowing miri to support passing pointers to C code, which will require `Allocation`s to be correctly aligned. Currently, it just makes `Allocation` generic and plumbs the necessary changes through the right places. The follow-up to this will be adding a type in the miri interpreter which correctly aligns the bytes, using that for the Miri engine, then allowing Miri to pass pointers into these allocations to C calls. Based off of rust-lang#100467, credit to ```@emarteca``` for the code
Aligning the
bytes
field ofAllocation
so that it is aligned matching thealign
parameter.Main changes are to:
Allocation
constructorsfrom_bytes
anduninit
adjust_from_tcx
method inAllocation
Instead of just converting the relevant
[u8]
to aBox<[u8]>
directly we now:std::alloc::Layout
of the alignment theAllocation
is specified to have[u8]
for this bufferThen we write the relevant bytes to it, check for uninit, etc -- same process as previously.
NOTE
With this current code, when the
bytes
field is deallocated, even though the size is right, in the layout the alignment might be under-required: for example we might have allocated it with alignment 4 and deallocate it with alignment 2. We know this is undefined behaviour violating the memory fitting requirements ofdealloc
-- we're going to work around this by adding a wrapper struct for theBox<[u8]>
that stores the alignment it was allocated with, and manually implements deallocation with the right alignment.Why are we doing this?
We want the address of the
bytes
field of anAllocation
to be accessible, and represent the actual address of the[u8]
. This is part of an effort to pass pointers in Miri to the C FFI.