Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

const-eval: loading or overwriting parts of a pointer is not supported #87184

Closed
RalfJung opened this issue Jul 16, 2021 · 11 comments
Closed

const-eval: loading or overwriting parts of a pointer is not supported #87184

RalfJung opened this issue Jul 16, 2021 · 11 comments
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Jul 16, 2021

The Miri core engine can only represent "full" pointers in memory, no parts of a pointer. This leads to strange behavior on code like

    let mut p = &42;
    unsafe {
        let ptr: *mut _ = &mut p;
        *(ptr as *mut u8) = 123; // overwrite the first byte of the pointer
    }
    let x = *p; //~ ERROR this operation requires initialized memory

If overwriting a part of a pointer happens during CTFE, we halt execution (since #87248). In Miri, instead we de-initialize the entire pointer, so a write will affect the bytes "next to it". (Halting execution is not an option here.)

If loading a part of a pointer (including as the source of a mem-to-mem copy) happens during CTFE or Miri, we halt execution.

Long-term, it would be great to implement support for having just a few bytes of a pointer in a Miri core engine Allocation. However, this might be hard to do without a perf regression.

Cc @rust-lang/wg-const-eval

@RalfJung RalfJung added A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 16, 2021
@oli-obk
Copy link
Contributor

oli-obk commented Jul 16, 2021

* We throw a "not supported" error, halting execution. This at least avoids silently doing the wrong thing. However, Miri might not get away with this if there's code out there relying on this to work (which seems not unlikely).

I prefer the status quo over this, even if it would make debugging easier. We could consider emitting a lint though.

* Throw such an error in CTFE, but in Miri just remove the provenance of the other bytes. With CTFE/Miri engine Pointer type overhaul #87123 this is actually not terribly wrong, but it is not really right either -- losing provenance like this does not match what LLVM does, where individual bytes of a pointer can maintain their provenance separately.

That seems like we're treating this as UB or at least unspecified behaviour and doing something random that we can change in the future. I don't like it much. It will also cause differences between the runtime and compile-time behaviour of const fns

Implement support for having just a few bytes of a pointer in a Miri core engine Allocation. However, this might be hard to do without a perf regression.

preferrable, but yea, a perf problem. If we want per-byte provenance, we either can't do ByRef->ByVal conversion based on types anymore, but need to also consider the value's representability. That could be ok, but makes things more complex in various places. In addition, as you noted, we'd need to adjust the Allocation to allow to represent such things. We could probably get away with adding a u8 field to each relocation entry in the Allocation that specifies the length of the relocation area. This would probably be unlikely to affect the performance of behaviour other than when such partial pointer things are performed.

That said... I think we should at least wait until someone has a use case for such things. It is quite some work to implement and the benefits are unclear without having a use case

@RalfJung
Copy link
Member Author

RalfJung commented Jul 16, 2021

That seems like we're treating this as UB or at least unspecified behaviour and doing something random that we can change in the future. I don't like it much. It will also cause differences between the runtime and compile-time behaviour of const fns

Well, we already treat it as UB to overwrite some bytes of a pointer and then inspect the remaining bytes. So we already have a difference between runtime and compile-time behavior of const fn. The goal of this proposal is to make it more obvious when the issue arises: throwing a "const-unsupported" error seems like a reasonable signal to indicate things that work differently between compile-time and runtime.

preferrable, but yea, a perf problem. If we want per-byte provenance, we either can't do ByRef->ByVal conversion based on types anymore,

One step after the other. :) For now I am just talking about what can be represented inside Allocation. This is relevant for writes. You seem to think of what happens when one loads a value that has partial provenance, and wants to represent this in a ScalarMaybeUninit. This is relevant for reads. We currently throw an "unsupported" error when this happens. (This can already happen now when one does a read that contains a pointer but does not exactly match the pointer.) This partial provenance read case is very similar to supporting reads of partially initialized data, so I think we should discuss that in #69488.

So, let's focus just on what happens on a write now in this issue, I'd say.

We could probably get away with adding a u8 field to each relocation entry in the Allocation that specifies the length of the relocation area.

This is not enough. If we overwrite the first byte of a pointer, we need to be able to represent "bytes 2-7 of this pointer are stored here".

I was thinking of implementing this by adding a second "relocation" table for single-byte relocations: so a relocation at position x with value (tag, offset): (Tag, u8) would mean that at position x we have the byte #offset of a pointer tagged tag. This matches the PtrFragment type in my memory interface notes.

8 adjacent bytes with single-byte relocations [(tag, 0), ..., (tag, 7)] would then be equivalent to having relocation tag in the current relocation table. Basically that table would be a more efficient representation of the common case, and the new table would hopefully usually be empty so that we don't have a big perf impact.

That said... I think we should at least wait until someone has a use case for such things. It is quite some work to implement and the benefits are unclear without having a use case

My usecase is: I want Miri to be a faithful implementation of the Rust semantics.
If nobody has a usecase for this I think we can just make it an error to do these partial overwrites, then at least we will know for sure if/when a usecase materializes in the future. Currently we'd probably not even know...

@oli-obk
Copy link
Contributor

oli-obk commented Jul 16, 2021

My usecase is: I want Miri to be a faithful implementation of the Rust semantics.

well... we can keep CTFE doing simple things and give Miri the ability to go crazy via more Machine shenanigans.

@RalfJung
Copy link
Member Author

Propagating the Machine so far down into Allocation is non-trivial. It certainly will mean a lot more work to implement than just doing it for both CTFE and Miri.^^

@oli-obk
Copy link
Contributor

oli-obk commented Jul 16, 2021

don't we already do that with AllocExtra?

@RalfJung
Copy link
Member Author

I removed that with my previous PR. ;) (And I am glad I did.)

@RalfJung
Copy link
Member Author

RalfJung commented Jul 16, 2021

We could just always have that new fine-grained relocation table, and have a boolean flag in Provenance to determine if it is used. So CTFE would have zero perf cost (just a tiny bit of space cost).

But honestly I think there's no good reason CTFE shouldn't implement this the right way (if we have the code anyway for Miri) -- as long as that doesn't cause slowdown for code not needing this feature.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 16, 2021

But honestly I think there's no good reason CTFE shouldn't implement this the right way (if we have the code anyway for Miri) -- as long as that doesn't cause slowdown for code not needing this feature.

yea that's true. I'm not going to stop you from pursuing this, it just seems like quite a bit of work and lots of performance tuning

@RalfJung RalfJung changed the title const-eval: overwriting parts of a pointer makes the rest uninitialized const-eval: overwriting parts of a pointer is not supported Jul 31, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 2, 2021
…i-obk

CTFE: throw unsupported error when partially overwriting a pointer

Currently, during CTFE, when a write to memory would overwrite parts of a pointer, we make the remaining parts of that pointer "uninitialized". This is probably not what users expect, so if this ever happens they will be quite confused about why some of the data just vanishes for seemingly no good reason.
So I propose we change this to abort CTFE when that happens, to at last avoid silently doing the wrong thing.
Cc rust-lang#87184

Our CTFE test suite still seems to pass. However, we should probably crater this, and I want to do some tests with Miri as well.
@RalfJung RalfJung changed the title const-eval: overwriting parts of a pointer is not supported const-eval: loading or overwriting parts of a pointer is not supported Feb 25, 2022
@RalfJung
Copy link
Member Author

RalfJung commented Apr 5, 2022

FWIW, even with #94527 landed this issue means #94371 is not fully resolved, but the remaining problem causes an error rather than silent data corruption:

#![feature(const_swap)]
#![feature(const_mut_refs)]

#[repr(C, packed)]
struct Demo(u32, &'static i32, u32, i64, i64);

const C: (Demo, Demo) = {
    let mut x = Demo(0, &1, 2, -1, -1);
    let mut y = Demo(3, &4, 5, -1, -1);
    std::mem::swap(&mut x, &mut y);
    (x, y)
};

fn main() {
    let (d1, d2) = C;
}

says

error: any use of this value will cause an error
    --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1096:9
     |
1096 |           copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
     |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     |           |
     |           unable to turn pointer into raw bytes
     |           inside `std::ptr::read::<MaybeUninit<u8>>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1096:9
     |           inside `mem::swap_simple::<MaybeUninit<u8>>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/mem/mod.rs:762:17
     |           inside `ptr::swap_nonoverlapping_simple::<MaybeUninit<u8>>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:913:9
     |           inside `swap_nonoverlapping::<Demo>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:861:33
     |           inside `std::mem::swap::<Demo>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/mem/mod.rs:726:29
     |           inside `C` at swap.rs:10:5
     |
    ::: swap.rs:7:1
     |
7    | / const C: (Demo, Demo) = {
8    | |     let mut x = Demo(0, &1, 2, -1, -1);
9    | |     let mut y = Demo(3, &4, 5, -1, -1);
10   | |     std::mem::swap(&mut x, &mut y);
11   | |     (x, y)
12   | | };
     | |__-
     |
     = note: `#[deny(const_err)]` on by default
     = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
     = note: for more information, see issue #71800 <https://github.com/rust-lang/rust/issues/71800>

@RalfJung
Copy link
Member Author

This affects swap_nonoverlapping, which no longer supports copying a pointer by using type u8 and a large enough size to cover the entire pointer. That used to work basically by chance since the old implementation did some optimizations to swap data in larger chunks; #94212 removed that optimization and thus broke this case in Miri.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 3, 2022

My plans for how I want to implement this in Miri changed, but that new plan I think cannot work in CTFE.

For Miri, I mentioned using the PtrFragment representation of my memory model. But that memory model changed, and now my notion of an AM Byte looks like this: an initialized byte is a u8 with some optional provenance. This greatly simplifies the abstract machine, and also its implementation in Miri. In a pointer, that u8 will be the actual byte on the real machine.

However, for CTFE, we don't know the actual byte on the real machine, so the u8 will be some byte of the relative offset. So, we cannot take a pointer apart into its raw bytes and its provenance the same way we can for the runtime handling of pointers.

We also have the problem that I don't think LLVM supports 'partial relocations', so if the final value of a constant contains some partial pointers we have to throw an error.

All put together, I no longer have plans to fix this for CTFE -- no support for partial pointers is just an inherent limitation of compile-time evaluation. We could mitigate it with a lot of work but it'll always remain half-baked. I will hence close this issue, and open new issues for partial pointer support in Miri and for the swap_nonoverlapping problem.

@RalfJung RalfJung closed this as not planned Won't fix, can't repro, duplicate, stale Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) A-miri Area: The miri tool T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

2 participants