Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in consteval: error[E0080]: could not evaluate static initializer (unable to turn pointer into raw bytes) #99923

Closed
Manishearth opened this issue Jul 29, 2022 · 56 comments
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@Manishearth
Copy link
Member

Manishearth commented Jul 29, 2022

Unfortunately I'm not able to come up with a minimal working example for this, but running cargo check -p icu_datetime --examples on https://github.com/unicode-org/icu4x (commit unicode-org/icu4x@021a92e should be fine)

ends up throwing a ton of errors that look like this:

    Checking icu_testdata v0.6.0 (/home/manishearth/dev/icu4x/provider/testdata)
error[E0080]: could not evaluate static initializer
   --> /home/manishearth/dev/icu4x/utils/zerovec/src/zerovec/mod.rs:260:14
    |
260 |           let (data, mut metadata): (usize, usize) = core::mem::transmute(bytes);
    |                ^^^^
    |                |
    |                unable to turn pointer into raw bytes
    |                inside `ZeroVec::<(EraStartDate, tinystr::ascii::TinyAsciiStr<16>)>::from_bytes_unchecked` at /home/manishearth/dev/icu4x/utils/zerovec/src/zerovec/mod.rs:260:14
    |
   ::: /home/manishearth/dev/icu4x/provider/testdata/data/baked/calendar/japanese_v1_u_ca.rs:11:9
    |
11  | /         ::zerovec::ZeroVec::from_bytes_unchecked(&[
12  | |             76u8, 7u8, 0u8, 0u8, 9u8, 8u8, 109u8, 101u8, 105u8, 106u8, 105u8, 0u8, 0u8, 0u8, 0u8,
13  | |             0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 120u8, 7u8, 0u8, 0u8, 7u8, 30u8, 116u8, 97u8, 105u8,
14  | |             115u8, 104u8, 111u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 134u8, 7u8, 0u8,
...   |
19  | |             0u8, 0u8,
20  | |         ])
    | |__________- inside `japanese_v1_u_ca::UND_U_CA_JAPANESE` at /home/manishearth/dev/icu4x/provider/testdata/data/baked/calendar/japanese_v1_u_ca.rs:11:9

(full errors; they're long since the "baked" stuff is a bunch of internationalization data that has been "serialized" to Rust code using https://docs.rs/databake)

We have a lot of consteval using unsafe code, which is what's tripping up here. The code that's breaking is code that's currently manually constructing a DST of &[T::ULE] from &[u8], and in a const environment that requires some pointer manipulation:

    pub const unsafe fn from_bytes_unchecked(bytes: &'a [u8]) -> Self {
        // &[u8] and &[T::ULE] are the same slice with different length metadata.
        let (data, mut metadata): (usize, usize) = core::mem::transmute(bytes);
        metadata /= core::mem::size_of::<T::ULE>();
        Self::Borrowed(core::mem::transmute((data, metadata)))
    }

Bisecting the regression points to #97684 (cc @oli-obk, @RalfJung), which seems to be about integer provenance. The code here isn't ideal; there are slightly better ways to do this but nothing is const, so this is what we've got.

Either way, this is a regression.

Bisection:

searched nightlies: from nightly-2022-06-01 to nightly-2022-06-30
regressed nightly: nightly-2022-06-07
searched commit range: fee3a45...50b0025
regressed commit: 9d20fd1

bisected with cargo-bisect-rustc v0.6.3

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start=2022-06-01 --end=2022-06-30 --script=./test.sh 
@Manishearth Manishearth added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Jul 29, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 29, 2022
@Manishearth Manishearth added the A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) label Jul 29, 2022
@eddyb
Copy link
Member

eddyb commented Jul 29, 2022

You should've never needed to transmute the pointer side to usize, this works fine:

pub const unsafe fn from_bytes_unchecked<T>(bytes: &[u8]) -> &[T] {
    let (data, mut metadata): (*const u8, usize) = core::mem::transmute(bytes);
    metadata /= core::mem::size_of::<T>();
    core::mem::transmute((data, metadata))
}

@Manishearth
Copy link
Member Author

Yeah, I managed to make it work with this instead:

        // &[u8] and &[T::ULE] are the same slice with different length metadata.
        let data = bytes.as_ptr();
        let mut metadata = bytes.len();
        metadata /= core::mem::size_of::<T::ULE>();
        Self::Borrowed(core::mem::transmute((data, metadata)))

@RalfJung
Copy link
Member

RalfJung commented Jul 30, 2022 via email

@Manishearth
Copy link
Member Author

I understand, but I will point out that currently what is and isn't UB is in flux (and almost completely undocumented), and also a lot of the APIs to do this safely are not yet available (const slice::from_raw_parts just stabilized1).

I don't think "this is UB" is sufficient to ignore breaking changes when there is no documentation on what Rust considers UB, and Rust is still pinning down what it considers UB. unsafe code writers are currently in a tricky position; where everything is in flux; and the Right Way to do things has not necessarily been figured out yet, so we basically are forced to try something and hope it sticks. I don't think it's particularly fair to put unsafe writers in this position2.

This is not to say that we shouldn't make changes like this, but I think it would be good for things like this to go through a period of warning with better diagnostics, where possible. I had to bisect the compiler to understand what went wrong here, which is fine in my case since I'm used to it, but I don't really think it's acceptable for us to decide that this is an okay user experience for others.

(To be clear, my reaction here is to the statement that this isn't a regression, not to the breakage happening in the first place. Breakages happen, it's hard to predict it all, but I would prefer breakages like this to be handled more gentler once discovered)

Footnotes

  1. Yes, I know that the broken bit of code was the piece of code before that with the transmute; however the lack of slice::from_raw_parts is what ended up in me writing the code this way, since it led to a clearer symmetry in the code.

  2. Especially with the pointer provenance experiment stuff seems to be changing rapidly and that does not give us enough time to catch up to the exciting new forms of UB that are getting introduced. As far as I can tell, this particular case is an instance of that; I was somewhat aware that usize casts were going to become UB but haven't considered it a problem yet because the "safe" APIs for doing such things haven't been made available in Rust yet, and I figured Rust wouldn't cause problems on UB it has introduced that cannot be worked around yet. I would very much like to continue operating under that assumption, anything else basically means we cannot write unsafe code, period, until the UCG has figured stuff out.

@workingjubilee
Copy link
Member

At minimum we need to start emitting a lot better diagnostics about how const + unsafe is extra-"fun" territory. If something happens inside a const unsafe context that is busted in that context and we detect it, but it would work in a non-const unsafe context, we should start signaling appropriately. If it would be UB in both contexts, all the more reason to emit a very clear error message! As is, This is a... cryptic... notice which lacks much context. People need extra help with the transmuting rules in const unsafe because it does not necessarily allow all the transmutes that non-const unsafe does.

And if people start hearing about how const unsafe is a Bad Time, then they at least can know when to sigh in despair about how they have encountered the Bad Time. Much like occasionally someone decides to write a function with five different lifetimes and then questions their fn life_choices<'up, 'to, 'the, 'current, 'moment'>(...) when they see the resulting error message featuring Esteban in crab cosplay trying to explain how busted their code is. Then they either soldier on with fixing their code or take a different tack. Ideally we should be there with a useful suggestion at that point.

We do have more latitude to make something stop compiling if it's UB, of course. Rust programmers will probably be unsurprised if they hear "well, that unsafe block was actually never defined behavior, whoops" and get shuffled around to writing different code. But we need to have some directionality in mind when we actively break some extant code.

@eddyb
Copy link
Member

eddyb commented Jul 30, 2022

Keep in mind the provenance changes pretty much only affect runtime execution.
CTFE has its own much stricter rules that AFAIK are unaffected by provenance variants, as it's by necessity/design at least as restrictive as the most restrictive provenance model for runtime, with symbolic pointers into allocations that behave more like disjoint address spaces.

The irony here is that where before miri was (wrongly?) doing nothing (resulting in an usize variable holding a non-integer value), it now has to do something in order to make the ptr2int transmute not UB, and that is enough to unlock its ability to detect UB it was previously not seeing (because it used to just let a value pass through unmodified).


Now, why did CTFE miri not complain about ptr2int transmutes before? Is it only UB now?
There's an easy way to be sure - get it to run its validity checks godbolt:

union Transmute<T: Copy, U: Copy> {from:T,to:U}

pub static PTR_TO_USIZE: usize = unsafe { Transmute{from:&0}.to };

(using an union to allow testing on versions old enough to not have const fn transmute)

Since 1.33, and still the case now, in 1.62 (wow, half the lifetime of Rust stable), it errors with:

error[E0080]: it is undefined behavior to use this value
 --> <source>:3:1
  |
3 | pub static PTR_TO_USIZE: usize = unsafe { Transmute{from:&0}.to };
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered a pointer, but expected initialized plain (non-pointer) bytes

(the error has gotten more detailed since, but this is what it used to look like)


Okay, but why does the "more local" one not error? Is there somehow "less UB" deeper in CTFE? Nah:

#[inline(always)]
fn enforce_validity(_ecx: &InterpCx<$mir, $tcx, Self>) -> bool {
false // for now, we don't enforce validity
}

It's just... turned off. It has been turned off ever since the functionality was added in:

It's only enabled by miri (the tool):

But for rustc we don't even have a -Z flag to turn it on so I can't even check what it would've done for old versions.

Presumably it would be very expensive for large arrays, but it seems like an increasing hazard (as more people are building all sorts of abstractions using const fn) to just let it sit disabled.
Perf-wise we might just need some fast paths for allocations that contain zero relocations and I guess arrays of types with few validity invariants? It would be interesting to see how far we could get without making it easy for unsoundness to creep in.

@RalfJung did we ever do perf/crater runs with enforce_validity returning true for CTFE miri?
Even if we can't land it right away, we should probably be tracking its viability?

@RalfJung
Copy link
Member

RalfJung commented Jul 30, 2022

I do agree with the concern that our UB rules can be hard to follow. However, note that the UB involved here is much more clearly documented than most other kinds of UB that we have. Quoting from the transmute docs:

Transmuting pointers to integers in a const context is undefined behavior. Any attempt to use the resulting value for integer operations will abort const-evaluation.

So, if this is not enough, then I don't know what is. We also have an RFC saying that the compiler is not required to detect all UB. (I was arguing for mandating some amount UB detection, but the consensus was we shouldn't do that.) Put together, I am not quite sure what we could have done differently here. It is very easy for interpreter changes to "accidentally" improve or regress UB detection, and we need the ability to continue evolving that interpreter without going through a month-or-year-long deprecation process each time.

Do you have concrete suggestions for something that should have been done differently when landing #97684? Or is your concern that we need more docs elsewhere (where, and which kinds of docs?), and perhaps some way for crate authors to opt-in to stricter UB checks for CTFE? The irony is that for runtime UB we have Miri, but there is no way to have compile-time UB checks that are as strict as what Miri is doing...

Transmutation is a hornet's nest, and it is super hard to even talk about it precisely without first giving an entire lecture on Rust operational semantics (the details of which do not have consensus yet). This is true for runtime transmute and even more true for CTFE transmute.


Keep in mind the provenance changes pretty much only affect runtime execution.
CTFE has its own much stricter rules that AFAIK are unaffected by provenance variants, as it's by necessity/design at least as restrictive as the most restrictive provenance model for runtime, with symbolic pointers into allocations that behave more like disjoint address spaces.

Yeah, this is only tangentially related to the strict provenance discussion. (The relationship is that as an outcome of that discussion I was able to make the handling of pointers in Miri a lot more coherent, and one side-effect of that refactor is improved UB detection even for CTFE.) The rules for what you are allowed to do with pointers during CTFE have never changed, we are now just able to detect rule violations a bit better.

did we ever do perf/crater runs with enforce_validity returning true for CTFE miri?

There recently was an experiment to selectively enable it for some types/situations, and even that was too expensive: #95377.

@Manishearth
Copy link
Member Author

Manishearth commented Jul 30, 2022

I do agree with the concern that our UB rules can be hard to follow. However, note that the UB involved here is much more clearly documented than most other kinds of UB that we have. Quoting from the transmute docs:

So, if this is not enough, then I don't know what is.

Yeah, I'm aware (since opening this issue and digging further) that this is documented. Thing is, it's on a function that has existed for ages; users aren't going to recheck that documentation each time they use it once they're familiar with it.

If Rust had a complete story for UB it would be totally worth it to go and read through the UCG docs, the reference, the nomicon, and the stdlib docs to learn what is and isn't UB for once and for all. While stuff is in flux and largely undocumented it is not practical to do such a literature survey each time, for the slim chance you actually find an answer for your specific kind of UB.

I don't know if it's possible for it to ever be "enough" until the work of the UCG group is finished. I absolutely would appreciate more docs like this, redundantly placed on the stdlib and UCG docs, but the problem here is a more general one. And it's a completely understandable problem — of course UCG can't wave a magic wand and be done in a week — I'd just like more understanding extended towards lib authors dealing with it.

I think of this Douglas Adams quote when thinking about how we document UB today:

“But the plans were on display…”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well, the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice, didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.”

In a situation where most UB is not documented, users do not expect UB to be documented, and there are four places to go look for UB documentation (hell, the transmute docs link to two of them), a documentation change on a function that has existed for ages and hasn't really had changes to UB-ness is not actually that prominent IMO.

We also have an RFC saying that the compiler is not required to detect all UB.

I'm aware. My point is based on what we ought to do, not what we can do.

Do you have concrete suggestions for something that should have been done differently when landing #97684? Or is your concern that we need more docs elsewhere (where, and which kinds of docs?), and perhaps some way for crate authors to opt-in to stricter UB checks for CTFE? The irony is that for runtime UB we have Miri, but there is no way to have compile-time UB checks that are as strict as what Miri is doing...

As I said earlier my pushback is more about the reaction here rather than #97684 itself, since mistakes happen, but here you go:

If it had been known at the time of #97684 that it would break code, I would like to have seen some discussion into whether or not a future incompat warning could have been introduced, with better diagnostics. I get that this is not always possible, but it should have been discussed.

Furthermore as @workingjubilee said, given that const is spicy when it comes to UB we should probably be adding better diagnostics in general, but especially around any kind of error.

More docs are always good. However, while UCG is still figuring stuff out, I personally have almost zero expectation of seeing more docs on everything y'all figure out. I'm happy to let y'all figure stuff out first and doc it later, my pushback is against UB being exploited or being errory till we're in that final stage.

It would also be nice if there were generally more communication on "UB in const is spicy". Stuff helping unsafe code writers learn that UB in const is different and how to reason about UB in const would be great (perhaps a blog post?).

Which I don't necessarily expect, I know y'all have a lot on your plate I just think it would be nice to have, and I would kinda consider that having been adequately communicated as a prerequisite to const code breaking.

But basically, I would like to see some thought given to the user experience of unsafe writers around stuff like this. I'm more plugged in than most to this stuff work; I've been paying some attention to the UCG discussions and I've had chats with various friends who are involved or knowledgeable. If this bit me, this kind of thing will quite likely happen again with more unsuspecting users, and I'd rather they had ample warning and a good user experience for things like this.


To give an idea of why I'm pushing back here; the status quo of writing unsafe Rust is to try and avoid UB as much as possible based on what you know is definitely UB, what you have seen documented, running miri a bunch, and for some people paying attention to what UCG is discussing. The flipside of this is that all of us are basically assuming that we have a fair amount of UB somewhere since it's just not possible to know for sure, and that we will have ample warning to fix that.

Let me repeat that: we are all assuming we have UB, and that assumption is probably correct. Which means when you say "UB fixes are not regressions" with no caveats, in our case, just means "you don't get stability at all". It doesn't mean "be careful and you can avoid breakages". It doesn't mean "be careful there may be breakages around weird niche stuff". It just means "no stability for you".

Of course we cannot expect long term stability on this, rustc should not be prevented from eventually being stricter, none of us expect that it will. However the "ample warning" is something we can expect. It's the only way to make a situation like this work at all.

In general we have been operating under the assumption that UB , especially kinds of UB not found in C, will start breaking once UCG has actually reached a point where there's documentation that covers most UB that we can basically read a couple times, internalize, and apply. We have not been expecting these things to break until then; we have been expecting that one day we will sit down with the Complete Docs and go through our unsafe code and fix it alll, and later the old code will start breaking. And until then, we pay some attention and if Exciting New kinds of UB are announced, we go fix those; we can expect some time pass before that UB starts breaking.

Again, as far as I can tell, this is the only practical way to write unsafe code right now. So a position that ample warning is not necessary because it's a soundness change and thus allowed to be breaking is very scary, since I have a hard time reading that as anything but "if you write unsafe code, all bets are off on the stability of your software and it might break tomorrow". That's a bit of a rug pull for a language where the fact that you can write unsafe code is a major reason it works for many kinds of software at all.

While the assumption of ample warning is the only practical way to make this situation work, a lot of this assumption also comes from the fact that for Exciting New kinds of Rust UB, there's basically no chance of LLVM suddenly exploiting it, and it's extra work for Rust folks to write optimizations that depend on that UB, so it won't happen until after there's a full story.

const is an interesting wrinkle to that, one which I hadn't thought about before but @workingjubilee and @eddyb got me to connect the dots: UB in const isn't an LLVM thing, it's a miri thing, and while const typically has no need to perform optimizations, the strong soundness requirements on something that plugs into the type system, and just general design concerns of miri, mean that it is far more likely that Exciting New kinds of Rust UB will trip up const even though the regular compiler will take years to exploit them.

I do think it's worth spending some time communicating the way UB in const is different (and worth making the diagnostics for it better).

Pairing that with an understanding that UB-breakages are more likely in const (since it's a compile time error) than in actual compiled code could work. I would prefer ample warning for const code as well; and given that it's compile time the "warning" can actually come in the form of a future incompat warning, but "if you're writing unsafe in const, expect breakages" is actually workable (unlike "if you're writing unsafe, expect breakages", which is extremely not workable). I'd expect it to be clearly communicated before the breakages start, but it's a workable position to have.

Ultimately, a compile time error in const with no warning is far less scary than subtle "miscompiles" suddenly happening due to UB, so I'm kinda glad we're having this discussion here instead of later 😄

@Manishearth
Copy link
Member Author

It is very easy for interpreter changes to "accidentally" improve or regress UB detection, and we need the ability to continue evolving that interpreter without going through a month-or-year-long deprecation process each time.

Oh, also, like I said before, I totally understand stuff like this slipping; there are many times when it's too subtle to realize and this case might be one of them. But, in general, future incompat cycles aren't that long and it would be nice to at least consider them on a case by case basis.

@saethlin
Copy link
Member

There recently was an experiment to selectively enable it for some types/situations, and even that was too expensive: #95377.

The largest regression in the entire rustc-perf benchmark suite was 1.94% on ctfe-stress, 0.8% on any real crates. The only way to invent a large regression was with a completely contrived program. Perhaps it is worth asking, @Manishearth would you have accepted a 0.8% compile time regression to know about this sooner, or to have more thorough UB detection in const in general?

@Manishearth
Copy link
Member Author

@saethlin Probably, yeah, I'd even be happy to run a build under some rustflag to opt in if I knew about it (but stuff here changes so often it's hard for me to know about these things)

However in this case it's unclear to me why there would have to be a perf regression for this. We currently already detect this UB, what I would have preferred is if we threw a future incompat warning for a stable release cycle and then moved on. It's understandable if that's not compatible with the architecture of miri, but so far I haven't seen any discussion of this.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 30, 2022

what I would have preferred is if we threw a future incompat warning for a stable release cycle and then moved on.

What @saethlin did was try out how expensive adding any checking would be without changing the miri logic. What Ralf did was a cleanup that accidentally opted into some of that checking by just not having a code path for the UB case anymore.

So yes, for this specific case we could have written a future incompat lint had we known about it. Had we known about it we'd have cratered it and discussed it, too. The change just seemed innocent to me as it didn't actually make things simpler, but just avoided some relaxed logic that I though only miri ever used. Turns out CTFE also went through that path in the presence of UB.

I guess it all boils down to the fact that we don't even have full branch coverage via out test suite.

I guess the only answer I have here is that i'll be more careful in future reviews. Not sure what else to do with the current capacity we have for CTFE work.

@Manishearth
Copy link
Member Author

I guess the only answer I have here is that i'll be more careful in future reviews. Not sure what else to do with the current capacity we have for CTFE work.

@oli-obk to try and be abundantly clear, I do not think that this was a problem at review time. Mistakes happen, it's fine, and I've stated that multiple times before above. Even with more stringent reviews it's hard to catch all of this. It's going to happen. I've worked on rustc before, I understand how hard it is to catch all the things.

My reaction here is entirely against the response to this bug that "it's not a regression because it's UB".

I'm okay with stuff breaking by accident. The accident is not the problem. Stating that such things are not accidents in the first place is what I'm bristling against.

If your position is that if y'all had known then y'all would have considered a future incompat lint I'm quite happy with that. That was not the impression I got from Ralf's original comments, however, hence my reaction.

@eddyb eddyb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-compiler-nominated Nominated for discussion during a compiler team meeting. regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 30, 2022
@RalfJung
Copy link
Member

RalfJung commented Jul 30, 2022

Yeah, I'm aware (since opening this issue and digging further) that this is documented. Thing is, it's on a function that has existed for ages; users aren't going to recheck that documentation each time they use it once they're familiar with it.

To be fair, the comment has been there ever since the function was callable on CTFE in stable.

The problem is, nobody expects there to be a difference between CTFE UB and run-time UB, and so nobody thinks of re-checking the docs when calling a function in CTFE for the first time. CTFE and run-time are two pretty fundamentally different ways of running code, bu we make them mostly the same, and that seems to work well enough that most people are not even aware of the distinction. And for safe code that's just fine, but in unsafe the seams do show (and I don't think we can entirely avoid that, we can just work harder on making this less jarring).

But basically, I would like to see some thought given to the user experience of unsafe writers around stuff like this. I'm more plugged in than most to this stuff work; I've been paying some attention to the UCG discussions and I've had chats with various friends who are involved or knowledgeable. If this bit me, this kind of thing will quite likely happen again with more unsuspecting users, and I'd rather they had ample warning and a good user experience for things like this.

Yeah, I don't disagree, and I appreciate the feedback!

I just wonder what the concrete next steps are that we could do here, given our resource constraints. I can't currently think of a feasible way that would make better CTFE UB detection into future-incompat lints. So on the technical level the best I can think of is to have a flag for extra CTFE UB detection. But that won't be nearly enough. As you said we need docs to ensure const unsafe code authors know that const unsafe has stricter rules than regular unsafe, and I don't know where to put those docs.

In general we have been operating under the assumption that UB , especially kinds of UB not found in C, will start breaking once UCG has actually reached a point where there's documentation that covers most UB that we can basically read a couple times, internalize, and apply. We have not been expecting these things to break until then; we have been expecting that one day we will sit down with the Complete Docs and go through our unsafe code and fix it alll, and later the old code will start breaking. And until then, we pay some attention and if Exciting New kinds of UB are announced, we go fix those; we can expect some time pass before that UB starts breaking.

Thanks for sharing this; I was not aware that this is a common sentiment.

I think it reflects well how we have been treating runtime UB, but not CTFE UB, as this issue shows.

"if you're writing unsafe in const, expect breakages"

Yeah that's kind of where we are at, I think.

Ultimately, a compile time error in const with no warning is far less scary than subtle "miscompiles" suddenly happening due to UB

I'm happy we agree on that. :) This is why we are less careful with CTFE UB raising more errors than with LLVM exploiting more runtime UB. But I think I failed to appreciate just how much of a black box CTFE UB is for many people.

However in this case it's unclear to me why there would have to be a perf regression for this. We currently already detect this UB, what I would have preferred is if we threw a future incompat warning for a stable release cycle and then moved on. It's understandable if that's not compatible with the architecture of miri, but so far I haven't seen any discussion of this.

(Also replying to @oli-obk's reply to this) I don't think a future incompat lint would have been feasible. Currently a CTFE query can either succeed or fail, there's no infrastructure for "succeed with a result but show a warning". Worse, the codepath that would produce the successful result doesn't even exist any more, and getting rid of that codepath was the entire point of the PR that caused this problem. (And I have at least one more similar PR planned. I guess we'll crater that one.^^)

My understanding is that RFC 3016 was passed with the explicit knowledge that when CTFE UB detection becomes better, that will cause some code with CTFE UB to fail to compile. Had I known there is an expectation that each time we do this, we put up a future incompat lint, I would have pushed back a lot more -- this will seriously impact our ability to evolve the CTFE & Miri shared interpreter. So I am fully onboard with working on better docs, opting-in to more checks, all of that, but if the expectation is a future incompat warning then I am concerned.

@eddyb
Copy link
Member

eddyb commented Jul 30, 2022

EDIT: please ignore, see #99923 (comment) for why I was wrong

(click to open me being wrong)

Nominating for @rust-lang/compiler discussion for the below proposal
(hoping @RalfJung and @oli-obk agree to it)

1. Undoing #97684's "transmute<*T, usize> is now lossy instead of UB", in CTFE

[EDIT: just noticed this is regressed on beta! - the beta that will be stable next week]
(cc @rust-lang/release is it too late to backport a fix?)


Minimal repro (on Scalar - original was either Scalar or ScalarPair, not sure):

pub static FOO: u8 = unsafe {
    let r: &char = &'x';
    let r2i: usize = std::mem::transmute(r);
    let r2i2r: &u8 = std::mem::transmute(r2i);
    *r2i2r
};

Repro builds on stable w/o warnings, but fails on beta on the second transmute, i.e. the first one is now not UB, but instead extracts the offset(?).

On beta, this is the code responsible for determining whether memory reads are lossless:

let read_provenance = |s: abi::Primitive, size| {
// Should be just `s.is_ptr()`, but we support a Miri flag that accepts more
// questionable ptr-int transmutes.
let number_may_have_provenance = !M::enforce_number_no_provenance(self);
s.is_ptr() || (number_may_have_provenance && size == self.pointer_size())
};

We can make it always true in CTFE for beta, effectively reverting #97684 for CTFE, but not e.g. cargo miri.

The code looks a bit different currently, but the fix should identical (always true in CTFE):

.read_scalar(alloc_range(Size::ZERO, size), /*read_provenance*/ s.is_ptr())?;

/*read_provenance*/ a.is_ptr(),

.read_scalar(alloc_range(b_offset, b_size), /*read_provenance*/ b.is_ptr())?;

To get CTFE diagnostics here when lossiness would occur (either an error or a future-incompat warning), I believe we would have to pick a different type than bool here, with a third case of "diagnose lossiness".

I haven't seen the discussion so far about ptr::addr in CTFE but I would expect we would never expose the intra-CTFE-allocation offset of a CTFE pointer (if possible within reason).

2. Future-incompat warnings for CTFE validation

Note that the "lossiness diagnostics" discussion above is superseded by simply always doing lossless reads in CTFE, and then always validating the read value, which should catch misuses with greater accuracy.

We should start doing CTFE validation in as many cases as possible (and ideally e.g. crater the remaining ones and provide a -Z flag that aggressively checks as much as possible), but we transform the errors produced by the additional validation, into future-incompat warnings, in CTFE.

I think we can move pretty fast with the warnings, but it may depend on crater results and/or user feedback.

@workingjubilee
Copy link
Member

re: "do we need future incompat lints for every such change?"

Yes, I think that's the expectation in the absence of thorough expectation management, which is why my earliest remarks were "we need to be doing a lot more expectation management". If it was thoroughly understood by Rust programmers that UB in const is... explosive... and any unsafe blocks they open in it had better be sound or they will suddenly combust at any random future point, they might have different expectations and we might be able to move a bit faster on this.

@Mark-Simulacrum
Copy link
Member

With regards to backports, the release is August 11th I believe, which is a little under 2 weeks out. Backports over the next week (before August 5th) are fine, past that please communicate with t-release (pinging me on Zulip in #t-release is best) to make sure the release includes the relevant changes.

@RalfJung
Copy link
Member

RalfJung commented Jul 30, 2022

Repro builds on stable w/o warnings, but fails on beta on the second transmute, i.e. the first one is now not UB, but instead extracts the offset(?).

Nah, the first transmute is UB. We just only detect this when its result is used in the 2nd transmute. Here is similar code that errors already on stable:

pub static FOO: usize = unsafe {
    let r: &char = &'x';
    let r2i: usize = std::mem::transmute(r);
    r2i + 0
};

And no it doesn't extract the offset, it just returns the pointer value at integer type. The 2nd transmute errors because what would happen here at runtime is that it'd strip provenance, but that is not possible in CTFE (because we only have an offset, not an absolute address), so we have to error.

The change in behavior is due to the fact that loads at type usize now always strip provenance immediately. usize + 0 already did that before, now even a load with no further operation will do it, hence the new errors.

I haven't seen the discussion so far about ptr::addr in CTFE but I would expect we would never expose the intra-CTFE-allocation offset of a CTFE pointer (if possible within reason).

Indeed it would be a critical bug if that offset was ever observable by CTFE code, including unsafe CTFE code.

Note that the "lossiness diagnostics" discussion above is superseded by simply always doing lossless reads in CTFE, and then always validating the read value, which should catch misuses with greater accuracy.

Scalar/ScalarMaybeUninit is the wrong type for lossless reads. The entire point is for the read to be lossy as indicated by the type of the read. This was always the case wrt being initialized (ScalarMaybeUninit cannot represent partially initialized values), and wrt. mixing multiple provenances in a single pointer-sized region of memory; now it is the case for integers with provenance as well. I feel very strongly that this is the design we want for the interpreter engine.

Allowing values with provenance at integer type in the interpreter means we have to consider what happens to them everywhere, which is a test matrix explosion and source of subtle bugs that we should avoid. And what you say about accuracy is backwards; it is the check-on-creating-a-Scalar that leads to greater accuracy, as this issue shows (we now have better UB detection accuracy than we used to).

re: "do we need future incompat lints for every such change?"

Yes, I think that's the expectation in the absence of thorough expectation management

Comments like this make me feel like the discussion for RFC 3016 was lead without stating all the assumptions people were making. I would not have agreed to the RFC in that form with the expectation of future incompat lints for every case of better CTFE UB detection. I don't see how that can be workable, with the way the interpreter works, for cases of UB like this. This is not UB because some check says "false", it is UB because the interpreter simply doesn't have a (feasible and tested and conforming to interpreter invariants) codepath to continue executing the program.

In fact the RFC explicitly states

This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another.

So parts of this discussion feel to me like an attempt to alter the RFC decision.

So, @eddyb no I don't think I agree to that plan. I don't see how it can work. We can add some kind of temporary work-around and hope that having integers with provenance will still work for other parts of the interpreter, but we need to get rid of that codepath ASAP so that this does not become a burden for the rest of the interpreter. And I don't see how we can have a future incompat warning here.

If it was thoroughly understood by Rust programmers that UB in const is... explosive... and any unsafe blocks they open in it had better be sound or they will suddenly combust at any random future point, they might have different expectations and we might be able to move a bit faster on this.

Yeah I guess that is where things went wrong.

@eddyb eddyb removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-compiler-nominated Nominated for discussion during a compiler team meeting. labels Jul 30, 2022
@eddyb
Copy link
Member

eddyb commented Jul 30, 2022

Unnominating... (and edited the original comment).

Alright, that was really pointless on my part, I thought I understood what effect the PR change had, but I missed the interaction between read_target_uint and CTFE relocations, that makes way more sense now (including explaining why the confusing error is happening in the way that it does). (oh and I did show an + 0 example to @Manishearth at some point but I didn't have the mental model for int2ptr transmute's input also being in a similar "read as integer" scenario)

Well, I don't personally have any stake in this (and would prefer if CTFE UB was maximally diagnosed sooner than later), I just got simultaneously excited by "hey this is a simple change" and jump-scared by "oh no this will hit stable any day now", sorry for wasting your time, should've PM'd on Zulip instead.

@RalfJung
Copy link
Member

RalfJung commented Jul 30, 2022

I am working on a patch that should defuse the problem at least for beta, hopefully giving us the time to discuss this a bit more. Your suggestion for mitigation made sense @eddyb, I just don't think this is a long-term solution. That's why enforce_number_no_provenance does not even exist any more on master (it is always true now).

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2022

"=stricter" would be roughly everything but aliasing (things relying on SB that miri flags as experimental); "=pedantic" would include aliasing, AIUI.

Hm, right now I don't think we'd want to put Stacked Borrows into rustc. But once we have a better idea for the final aliasing model, that might be worth doing. However note that doing this will have a performance cost even when the flag is disabled -- right now we have a lot of careful engineering ensure that the representation of an Allocation in CTFE is the same as what rustc uses outside the interpreter, so that we can use cheap references when transitioning data into / out of the interpreter. Miri's Allocation, OTOH, has all the extra Stacked Borrows state, and even if Stacked Borrows is disabled, the fact that this is a different type means a whole bunch of extra copies. For Miri it's at most one copy per const-allocation, but in CTFE this could be a copy per const-alloc per constant that uses the alloc, which seems bad. This is a compile-time choice, I don't know how to make this a run-time choice (while sticking to safe code).

So, definitely not part of the short-term plan. :)

@workingjubilee do you have concrete cases, besides aliasing violations, that you think should error in paranoid/pedantic mode but not in strict mode? A lot of the ambiguity is around the exact contract of library functions, which Miri is not able to check (Miri detects language UB, not library UB). There are some lingering questions around validity invariants but Miri already takes almost the strictest possible interpretation here (except for recursive validation of references, but that seems unlikely to be our final choice, and if it is we can just make the flag stricter long before any change that would affect CTFE without the flag).

There are some things around padding that we cannot detect yet (padding being unstable on copies); once I implement that for Miri, the CTFE flag will also get those checks.

@workingjubilee
Copy link
Member

I agree that it might be more useful from the library dev (hello, I am a contributor to the library) perspective. Would it be possible to have that flag allow a const_debug_assert! or something to trigger?

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2022

Hm, that doesn't seem entirely obvious but we should definitely explore that possibility. It has the usual problem that debug assertions in the standard library have -- the stdlib that people actually download is built without debug assertions...

How close is -Zbuild-std to providing a solution for that?

@Manishearth
Copy link
Member Author

@RalfJung the proposed plan sounds pretty great!

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2022

#100229 implements the first step of that plan.

@Mark-Simulacrum Mark-Simulacrum modified the milestones: 1.63.0, 1.64.0 Aug 9, 2022
@Mark-Simulacrum
Copy link
Member

Re-tagging to 1.64; I believe we reverted the behavior change that prompted this in the 1.63 release (#99965).

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 11, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 12, 2022
…lcnr

add -Zextra-const-ub-checks to enable more UB checking in const-eval

Cc rust-lang#99923
r? `@oli-obk`
@estebank
Copy link
Contributor

estebank commented Aug 15, 2022

From the crater run, regression on transmute_copy was found, which I believe is the same kind of issue as this.

Also, the crater run didn't bring up any other cases of this in particular, and icu4x does not have dependents, so I would feel comfortable not backing out of this change (but announcing it very loudly in the blog just in case).

@RalfJung
Copy link
Member

@estebank I don't see how they are even related. The transmute_copy regression is caused by #98839, this here is caused by #97684. Seems entirely unrelated to me?

@estebank
Copy link
Contributor

It seemed to me to be the same category of issue, where we break backwards compatibility in const unsafe code that currently compiles. Whatever decision is made about how to deal with such cases applies for that case. Am I wrong in my reading?

@RalfJung
Copy link
Member

#100545 is more about runtime than const UB I think? Certainly it also applies at runtime, which this one here does not.

And anyway the question in this issue currently isn't whether we'd back out #97684, we already pretty much decided against that. The question is what else we can do to help people. And we have a plan for that.

@RalfJung
Copy link
Member

#101101 implements the next step in the plan I outlined -- giving more context for "read pointer as bytes" errors.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 30, 2022
…oli-obk

interpret: make read-pointer-as-bytes a CTFE-only error with extra information

Next step in the reaction to rust-lang#99923. Also teaches Miri to implicitly strip provenance in more situations when transmuting pointers to integers, which fixes rust-lang/miri#2456.

Pointer-to-int transmutation during CTFE now produces a message like this:
```
   = help: this code performed an operation that depends on the underlying bytes representing a pointer
   = help: the absolute address of a pointer is not known at compile-time, so such operations are not supported
```

r? `@oli-obk`
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 30, 2022
…oli-obk

interpret: make read-pointer-as-bytes a CTFE-only error with extra information

Next step in the reaction to rust-lang#99923. Also teaches Miri to implicitly strip provenance in more situations when transmuting pointers to integers, which fixes rust-lang/miri#2456.

Pointer-to-int transmutation during CTFE now produces a message like this:
```
   = help: this code performed an operation that depends on the underlying bytes representing a pointer
   = help: the absolute address of a pointer is not known at compile-time, so such operations are not supported
```

r? ``@oli-obk``
bjorn3 pushed a commit to rust-lang/rustc_codegen_cranelift that referenced this issue Sep 4, 2022
interpret: make read-pointer-as-bytes a CTFE-only error with extra information

Next step in the reaction to rust-lang/rust#99923. Also teaches Miri to implicitly strip provenance in more situations when transmuting pointers to integers, which fixes rust-lang/miri#2456.

Pointer-to-int transmutation during CTFE now produces a message like this:
```
   = help: this code performed an operation that depends on the underlying bytes representing a pointer
   = help: the absolute address of a pointer is not known at compile-time, so such operations are not supported
```

r? ``@oli-obk``
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 9, 2022
…=pnkfelix

beta-backport of provenance-related CTFE changes

This is all part of dealing with rust-lang#99923.

The first commit backports the effects of rust-lang#101101. `@pnkfelix` asked for this and it turned out to be easy, so I think this is uncontroversial.

The second commit effectively repeats rust-lang#99965, which un-does the effects of rust-lang#97684 and therefore means rust-lang#99923 does not apply to the beta branch. I honestly don't think we should do this; the sentiment in rust-lang#99923 was that we should go ahead with the change but improve diagnostics. But `@pnkfelix` seemed to request such a change so I figured I would offer the option.

I'll be on vacation soon, so if you all decide to take the first commit only, then someone please just force-push to this branch and remove the 2nd commit.
@RalfJung
Copy link
Member

RalfJung commented Sep 21, 2022

All steps in the plan laid out above have been taken, hence I'll close this issue.

antoyo pushed a commit to rust-lang/rustc_codegen_gcc that referenced this issue Jun 3, 2023
interpret: make read-pointer-as-bytes a CTFE-only error with extra information

Next step in the reaction to rust-lang/rust#99923. Also teaches Miri to implicitly strip provenance in more situations when transmuting pointers to integers, which fixes rust-lang/miri#2456.

Pointer-to-int transmutation during CTFE now produces a message like this:
```
   = help: this code performed an operation that depends on the underlying bytes representing a pointer
   = help: the absolute address of a pointer is not known at compile-time, so such operations are not supported
```

r? ``@oli-obk``
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests