-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defuse the bomb that is mem::uninitialized
#87032
Conversation
r? @kennytm (rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I think it would be a good idea to keep this panic - that is, we would panic if |
@Aaron1011 I'm not particularly opposed to leaving that panic check in, but is it necessary?
With this PR there won't be any undefined behavior in their code to find, at least not due to uninitialized memory (because there won't actually be any uninitialized memory), so there won't be any consequences that can occur. And as for helping users fix their code, I think the only thing to be done at this point is to point them towards MaybeUninit, which is what the deprecation notice has been doing for going on two years. |
It looks like
In a large dependency graph, this could easily go unnoticed (especially if it's an upstream crate triggering the deprecation warning). Of course, this is a moot point for this PR, as we already get the proper panic. |
I suppose that might be one advantage in favor of the "just remove this API entirely" camp: if this were a future-compatibility warning, it would penetrate the usual silencing of upstream warnings. |
This looks to me like it's strictly an improvement over the current situation. Removing the function would probably be best in the long run, but that raises serious questions about rust's stability policy. At least this is a change we can make right now. |
#[rustc_diagnostic_item = "mem_uninitialized"] | ||
pub unsafe fn uninitialized<T>() -> T { | ||
// SAFETY: the caller must guarantee that an unitialized value is valid for `T`. | ||
unsafe { | ||
intrinsics::assert_uninit_valid::<T>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave this call in place? That will allow us to keep the "attempted to leave type T
uninitialized, which is invalid" panics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression from your previous comments that you were content with only panicking on can't-be-zero types. If your concern is just the diagnostic regression, I would much rather solve this some other way than reintroducing this check, since one of the points of this PR is to deliberately make it possible to use this function with types like bool
, so as to "un-break" code that was broken when these checks were added in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I may have misunderstood what was happening.
These panics were intentionally added in #66059, as a way of catching misuses of uninitialized memory. Even though std::mem::uninitialized
now 'happens' to zero out these types, users should not be relying on this behavior (as your doc comment states). That is, writing std::mem::uninitialized::<bool>()
is still a bug - any code doing so is broken, and should not be 'un-broken' by removing the legitimate panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aaron1011 I thought the whole point was that mem::uninitialized
is broken for any type. Why are types with niches being singled out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also @thomcc's comments about how the panics have broken several of his old crates; I would love for this to fix them. https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/disarming.20mem.3A.3Auninitialized/near/244037069
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is in fact not at all broken for types that are allowed to be uninitialized, such as MaybeUninit.
Are there any types that are allowed to be uninitialized other than MaybeUninit (and arrays and tuples of MaybeUninit)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any types that are allowed to be uninitialized other than MaybeUninit (and arrays and tuples of MaybeUninit)?
Yes:
- The type
()
(or really any inhabited zero-sized type... if you only consider the validity invariant, not the safety invariant) - Any user-defined
union
type... well to be fair this is not ratified yet, that's just what I would propose. But I think there is consensus that anyunion
with a field of size zero (essentially, a user defining their ownMaybeUninit
) is okay to be left uninitialized.
Now, I am not saying that this is all that useful, I just want to get our facts straight so we have a solid basis to make decisions on. :) The only legitimate reason I can think of to call mem::uninitialized
is to create an array of MaybeUninit
, and we should probably just offer a better API for that (possibly using const generics, like other stdlib APIs are already doing AFAIK).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting here too that even with this PR removing the panic and changing the implementation to zeroed
, rustc still independently produces a scary warning when attempting to use uninitialized
with any type with invalid values:
warning: the type `bool` does not permit being left uninitialized
--> un.rs:3:22
|
3 | let x = unsafe { std::mem::uninitialized::<bool>() };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| this code causes undefined behavior when executed
| help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
|
= note: `#[warn(invalid_value)]` on by default
= note: booleans must be either `true` or `false`
So even though the panic no longer exists (to the benefit of people using libraries that do this), people writing this code themselves (who will therefore be seeing warnings) will still be made aware that this isn't a good idea (even if, for the moment, it may not actually be invoking UB from the compiler's perspective).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That lint however has a lot of false negatives. For example, it will not fire when the code that calls mem::uninitialized
is generic.
I've also had to update a UI test, as the panic message from attempting to use |
cc @RalfJung |
This is not correct.
What is a "type with uninhabited values"? Values cannot be (un)inhabited; it is types that are (un)inabited: a type is inhabited if and only if there is a value of that type. For example, |
Cc #66151 |
I don't see how it could be UB, as
Whatever the jargon is for "a type where not all possible bit representations are valid", where an example of an invalid value is e.g. 3 for |
It's library UB. The implementation happens to implement the UB as a panic, but that is certainly not guaranteed. Moreover, if you use
Ah, that has nothing to do with "inhabited" then. We don't really have any short jargon term for such types. (Some people say "type with a niche", but "niche" usually refers to niches the rustc layout algorithm actually uses.) Also, "all possible bit representations" is doing a lot of work here: are "all possible bit representations" valid for |
Yes, I am seeking to draw a distinction here between "the result of this operation is unspecified" and "the result of the operation is uninitialized". It is safe for an API to provide an unspecified value, and AFAIK not safe to provide an uninitialized value. I'll change the original PR description to weaken its claims. However, I still think this is worth considering. |
I don't know what you mean by "safe" here, but the usual definition of "soundness of a safe API" is certainly not satisfied by any of the suggestions you are making here. A library might define a type like pub fn check(x: &NotBothZero) {
if x.0 == 0 && x.1 == 0 { unsafe { unreachable_unchecked(); } }
} It is impossible to make any of the functions we are considering here "safe" or "sound". The best we can hope for is to make them not cause immediate (language) UB, but that is a much weaker property. I agree this might be a good idea, but I also feel like there are a lot of misunderstandings in this thread. (This is leaving aside the issue that "unspecified result" is IMO a meaningless term until you define more precisely from which possible set of options the result is picked. If "uninitialized" is in that set, it is surely no less harmful than the alternative. But we need not go there, it is unnecessary to even talk about "unspecified result" in this context.) |
So taking a step back: first of all I fully agree that this PR is "correct" in the sense that as per the documentation of Secondly, I agree that it might be a good idea to reduce the risk of old code breaking due to incorrect use of However, as already mentioned in the subthread, I think it might be a bit early to reduce the panic: we're being rather conservative with ramping that up, so e.g. code using Also I think if we are doing this, it might be worth to document. People might wonder why |
My apologies, I'm being rather informal and am using "safe" not to mean "this becomes a safe function" (as I am trying to repeatedly emphasize, I'm not seeking to remove the I also think, with this PR, the semantics of Finally, I think it's likely (but I could be convinced otherwise) that any code still using
Sure, I'm not opposed to other methods of ameliorating the brokenness of this function, but I'll reiterate my opinion that any code using
Sure, I can add more language to this effect. |
I agree with the second part. For the first part, note that we are not changing the contract provided by this function, so all these previously incorrect uses are still incorrect. We just avoid letting this blow up in people's faces. This is likely what you mean, but I want to emphasize that "using this unsafe function in a safe manner" still means following the documentation, not following whatever the current implementation happens to do. We are not changing the contract between (I'm sorry if I come across as pedantic here. The terminology around safely using unsafe code is subtle but important.)
I disagree. This is not the correct assumption for The way the invariants work out is exactly the same for both
That seems plausible, and it is why I like the thrust of this PR. I just think it comes too early since we haven't yet ramped up the panics around |
Instead of all zeros should |
This must surely be spurious, can a reviewer retry? |
@bors retry not our ref cargo |
I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way). I would love in the future for us to "defuse" mem::uninitialized, but I don't really feel like In practice, I think the "defused" version of mem::uninitialized would need to consider which bitpatterns are valid in the output type, and if initialize to one of those (this is likely to be a lot of work, and requires knowledge of the exact type being output by the call). Short of something like that I don't really see this actually making anything less dangerous... Note: I'm someone who has had their old code broken by this, and I'd love to see the situation with mem::uninitialized improved. I've called the story of happened to mem::uninitialized a violation of Rust's stability guarantees, in a way that hasn't happened to me in many other languages (both conceptually — there was the tacit promise that this API existing meant it could be used, and in practice — I literally can't run some old code of mine because of a bad mem::uninitialized in a transitive dependency that triggers early enough in the program's execution to be effectively during startup). |
@thomcc Out of curiosity, can you tell us the name of the library that caused your code to break when these panics were added? |
It was inside an old version of glium, I believe. |
This is a claim that I do not see enough evidence for. I think the opposite is true: legacy users assumed that there is no such thing as a validity invariant that needs to be upheld at all times, and instead the invariant would only be relevant when one There is evidence for my claim: there is/was a lot of legacy code (that I saw while going through crater regressions for these panics), some of it written by veteran Rust experts in the Rust 1.0 days, that uses To be fair, we don't panic for code that uses |
Yes, I know, this is not one of Rust's moments of glory. We screwed up in providing the original API that is not compatible with how the language works elsewhere, and then it took way too long to get If you think the logic for the panic should be changed such that old code can at least try to run again (with UB, but there's a high chance that current rustc will be fine -- though future rustc will exploit this more and more), then please speak up at #66151. I honestly expected more backlash than we actually got so far (which is one reason why I cranked up the panic so slowly). |
This has not been I'll close and reopen the PR to hopefully retrigger PR CI. |
I base my assertion in that even the earliest rough drafts of the Rustonomicon mention that merely "producing" (not inspecting) invalid values is undefined behavior. Here's a version from June 2015: https://github.com/rust-lang/nomicon/blob/96efb3776ddef687bb56e8ccb0af1978df677b40/intro.md#what-does-unsafe-mean . You surely know better than I do whether or not people managed to uphold this in the wild, especially for types like |
Interesting, I did not know the Nomicon said this from the start. This does not match what I saw in code out there. But what I saw is heavily biased by what the panic is checking for, so it is not suited to make any representative statement. |
I can certainly imagine that people used to C might not have treated the This PR is, of course, still putting a lot of faith in the |
/// | ||
/// The reason for deprecation is that the function basically cannot be used | ||
/// **This function is deprecated with extreme prejudice.** | ||
/// It is replaced by [`MaybeUninit<T>`], which must be used instead of this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked at how this renders in rustdoc? The standard is one summary sentence, trying to keep it succinct so it renders on, in this case, the std::mem
page. The previous one was already a teeny bit long.
@bstrie Ping from triage, what's next steps here? |
@crlf0710 That's a good question. I think for now it makes sense to withdraw this PR while we wait for #66151 to explore tightening |
I've filed an issue at #87675 for the general idea of "initialized" |
mem::uninitialized: mitigate many incorrect uses of this function Alternative to rust-lang#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations. This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such. This is somewhat similar to rust-lang#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However - That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time. - The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant. `@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here. Cc rust-lang#66151 Fixes rust-lang#87675
mem::uninitialized: mitigate many incorrect uses of this function Alternative to rust-lang/rust#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations. This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such. This is somewhat similar to rust-lang/rust#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However - That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time. - The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant. `@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here. Cc rust-lang/rust#66151 Fixes rust-lang/rust#87675
EDIT: I think for now it makes sense to withdraw this PR while we wait for #66151 to explore tightening intrinsics::assert_zero_valid. I could also imagine an even better version of this PR that uses some value other than 0 (amusingly, 1 might be the most permissive/safe value?). Furthermore, while I still think mem::uninitialized deserves to be as discouraged as we can possibly make it, I am slightly less dim on its existence now that I understand the model of uninitialized memory better and that it actually is possible to use this function safely in some contrived circumstances (let x: () = mem::uninitialized();), which means that it isn't completely ridiculous to expect a hypothetical Rust specification to specify its behavior.
Original below:
The deficiencies with
mem::uninitialized
have long been well-known. It has been long since replaced bymem::MaybeUninit
, and was officially deprecated in November 2019.The problem with
mem::unitialized
is that, as originally conceived, it is fundamentally incompatible with Rust's stance on memory safety. The API as of Rust 1.0 presented a promise that this function could be safe; however, that promise could not be kept. There is no safe way for something with this API to safely provide uninitialized memory.We can see this contradiction reflected in the current implementation: the return value is simply
MaybeUninit::uninit().assume_init()
. And yet if we look at theMaybeUninit
docs, we see the following listed as a misuse of the API:It isn't safe to use uninitialized memory like this, and yet the API of
mem::uninitialized
makes it unavoidable.When a function is marked
unsafe
in Rust, that does not mean that that function is fundamentally memory-unsafe; it means that the caller must manually uphold certain invariants in order to ensure memory safety. This is why allunsafe
functions in libstd are supposed to have a "Safety" section in their documentation to inform the user how the function may be safely used. And yet formem::uninitialized
there is no "Safety" section, because, as implemented, the caller-upheld invariant is essentially this: the return value of this function must never be used.So at best this function is "just" useless. And when it is not just useless, it is immediate undefined behavior.
Rust's stability guarantee permits breakage in the event of unsoundness. In fact, a year prior to the deprecation of
mem::uninitialized
, a check was added that guarantees that the function will panic at runtime when used with many types that are known to have uninhabited values. This caused some breakage in the wild, and yet it was not sufficient to totally solve the issue (self-evidently, as otherwiseMaybeUninit
would not have later been stabilized and this function would not have been deprecated).However, although it is arguably within Rust's right to outright remove this function from the stdlib for the sake of soundness, it is possible to be more conservative. Since we must choose between being unsafe and being safe-but-useless, this PR chooses the latter.
This PR replaces the internals of
mem::uninitialized
with a call tomem::zeroed
. Semantically this is legal: if your code was written to expect any imaginable value, then it must work with any specific value (relevant XKCD). Philosophically this is also defensible: although this new implementation not strictly "uninitialized", the alternative is undefined behavior; since undefined behavior allows any behavior at all, up to and including nasal demons, an interpretation of undefined behavior wherein your uninitialized memory is replaced with initialized memory is legal.This PR has three chief advantages:
mem::uninitialized
that has not been updated since the introduction ofMaybeUninit
is no longer all-but guaranteed to exhibit undefined behavior.mem::zeroed
has its own, more narrowly-scoped check that rejects a subset of the types of the current check.)This PR does not remove the
unsafe
marker on this function, nor does it reverse its deprecation, nor does it suggest that either of those ever be done. We're just building the sarcophagus on Chernobyl here.This was discussed in #libs on Zulip, and there were no outright objections (although some desired to just see this removed from std entirely).