-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC for an operator to take a raw reference #2582
Changes from 5 commits
fd4b4cd
e250bed
af78d46
1a62d82
879fa5c
0b496ee
0b9df7a
61c2e82
5f7e9ea
96ddb7c
35b7810
9a853d5
6cdac4d
4b60fb1
315bb31
cb2dd0f
1f1e690
8eea0bf
24aad9e
9889619
6a803f1
92042f6
e95df5c
9942dad
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,189 @@ | ||
- Feature Name: raw_reference_operator | ||
- Start Date: 2018-11-01 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Introduce a new primitive operator on the MIR level: `&[mut|const] raw <place>` | ||
to create a raw pointer to the given place (this is not surface syntax, it is | ||
just how MIR might be printed). Desugar the surface syntax `&[mut] <place> as | ||
*[mut|const] _` to use this operator, instead of two MIR statements (first take | ||
normal reference, then cast). | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Currently, if one wants to create a raw pointer pointing to something, one has | ||
no choice but to create a reference and immediately cast it to a raw pointer. | ||
The problem with this is that there are some invariants that we want to attach | ||
to references, that have to *always hold*. (This is not finally decided yet, | ||
but true in practice because of annotations we emit to LLVM. It is also the | ||
next topic of discussion in the | ||
[Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/).) | ||
In particular, references must be aligned and dereferencable, even when they are | ||
created and never used. | ||
|
||
One consequence of these rules is that it becomes essentially impossible to | ||
create a raw pointer pointing to an unaligned struct field: `&packed.field as | ||
*const _` creates an immediate unaligned reference, triggering undefined | ||
behavior because it is not aligned. Similarly, `&(*raw).field as *const _` is | ||
not just computing an offset of the raw pointer `raw`, it also asserts that the | ||
intermediate shared reference is aligned and dereferencable. In both cases, | ||
that is likely not what the author of the code intended. | ||
|
||
To fix this, we propose to introduce a new primitive operation on the MIR level | ||
that, in a single MIR statement, creates a raw pointer to a given place. No | ||
intermediate reference exists, so no invariants have to be adhered to. We also | ||
add a lint for cases that seem like the programmer wanted a raw reference, not a | ||
safe one, but did not use the right syntax. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
When working with unaligned or potentially dangling pointers, it is crucial that | ||
you always use raw pointers and not references: References come with guarantees | ||
that the compiler assumes are always upheld, and these guarantees include proper | ||
alignment and not being dangling. Importantly, these guarantees must be | ||
maintained even when the reference is created and never used! The following is | ||
UB (assuming `packed` is a variable of packed type): | ||
|
||
```rust | ||
let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior | ||
``` | ||
|
||
There is no situation in which the above code is correct, and hence it is a hard | ||
error to write this. This applies to most ways of creating a reference, i.e., | ||
all of the following are UB if `X` is not properly aligned and dereferencable: | ||
|
||
```rust | ||
fn foo() -> &T { | ||
&X | ||
} | ||
|
||
fn bar(x: &T) {} | ||
bar(&X); // this is UB at the call site, not in `bar` | ||
|
||
let &x = &X; // this is actually dereferencing the pointer, certainly UB | ||
let _ = &X; // throwing away the value immediately changes nothing | ||
&X; // different syntax for the same thing | ||
|
||
let x = &X as &T as *const T; // this is casting to raw "too late" | ||
``` | ||
|
||
The only way to create a pointer to an unaligned or dangling location without | ||
triggering undefined behavior is to *immediately* turn it into a raw pointer. | ||
All of the following are valid: | ||
|
||
```rust | ||
let packed_cast = unsafe { &packed.field as *const _ }; | ||
let packed_coercion: *const _ = unsafe { &packed.field }; | ||
let null_cast: *const _ = unsafe { &*ptr::null() } as *const _; | ||
let null_coercion: *const _ = unsafe { &*ptr::null() }; | ||
``` | ||
|
||
The intention is to cover all cases where a reference, just created, is | ||
immediately explicitly used as a value of raw pointer type. | ||
|
||
These two operations (taking a reference, casting/coercing to a raw pointer) are | ||
actually considered a single operation happening in one step, and hence the | ||
invariants incurred by references do not come into play. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
When translating HIR to MIR, we recognize `&[mut] <place> as *[mut|const] ?T` | ||
(where `?T` can be any type, also a partial one like `_`) as well as coercions | ||
from `&[mut] <place>` to a raw pointer type as a special pattern and turn them | ||
into a single MIR `Rvalue` that takes the address and produces it as a raw | ||
pointer -- a "take raw reference" operation. This might be a variant of the | ||
existing `Ref` operation (say, a boolean flag for whether this is raw), or a new | ||
`Rvalue` variant. The borrow checker should do the usual checks on `<place>`, | ||
but can just ignore the result of this operation and the newly created | ||
"reference" can have any lifetime. (Currently this will be some form of | ||
unbounded inference variable because the only use is a cast-to-raw, the new "raw | ||
reference" operation can have the same behavior.) When translating MIR to LLVM, | ||
nothing special has to happen as references and raw pointers have the same LLVM | ||
type anyway; the new operation behaves like `Ref`. | ||
|
||
When interpreting MIR in the miri engine, the engine will recognize that the | ||
value produced by this `Rvalue` has raw pointer type, and hence must not satisfy | ||
any special invariants. | ||
|
||
When doing unsafety checking, we make references to packed fields that do *not* | ||
use this new "raw reference" operation a *hard error even in unsafe blocks* | ||
(after a transition period). There is no situation in which this code is okay; | ||
it creates a reference that violates basic invariants. Taking a raw reference | ||
to a packed field, on the other hand, is a safe operation as the raw pointer | ||
comes with no special promises. "Unsafety checking" is thus not even a good | ||
term for this, maybe it should be a special pass dedicated to packed fields | ||
traversing MIR, or this can happen when lowering HIR to MIR. This check has | ||
nothing to do with whether we are in an unsafe block or not. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This compiles without errors on stable Rust today: #[repr(packed)]
pub struct A {
x: u8,
y: u64
}
pub fn foo(mut _x: A) {
let y = &mut _x.y;
*y = 3;
} but the text does not explicitly mention whether this will be an error too. It hints that this will be the case when it says (emphasis mine): "hard error even in unsafe blocks", but it should probably spell this out. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No it doesn't - this code is UB even on stable.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes it does? That's a warning, which is not the same thing as an error. This RFC hints that this warning will be turned into an error, but does not say so explicitly, so I'm left wondering whether this is the case or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we tell LLVM that references are aligned, so this is already UB and AFAIK has always been. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a future compat warning, as the text quite clearly says. The only thing blocking us from making it an error is resolving this RFC :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This RFC wouldn't make it a compiler error, but it is still UB and will continue to be. In the future, we intend to make it a hard error. It has always been the case that taking a reference to a field that is not properly aligned is UB, so we cannot soundly allow this in safe code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't know why you and @arielb1 keep bringing up that this is UB. Who is saying otherwise ?
This RFC proposes to make There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It does so by defining this to be a special operation that never creates a reference to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Re-reading my own RFC text, the changes around what is and is not permitted in safe and unsafe code for taking references to packed fields are actually part of the normative text of the RFC. Should I move them out? I think it makes sense to consider this entire somewhat messy situation together, but I am not sure if the RFC should actually be a decision on all of them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is better to leave this out and to better focus on this change only. You can mention that there is an issue open about turning a particular warning into an error, that's currently blocked on having a way to construct a pointer to a field without going through a reference, and that this RFC would solve that blocker. But I wouldn't make the text about that. |
||
|
||
Moreover, to prevent programmers from accidentally creating a safe reference | ||
when they did not want to, we add a lint that identifies situations where the | ||
programmer likely wants a raw reference, and suggest an explicit cast in that | ||
case. One possible heuristic here would be: If a safe reference (shared or | ||
mutable) is only ever used to create raw pointers, then likely it could be a raw | ||
pointer to begin with. The details of this are best worked out in the | ||
implementation phase of this RFC. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It might be surprising that the following two pieces of code are not equivalent: | ||
```rust | ||
// Variant 1 | ||
let x = unsafe { &packed.field }; // Undefined behavior! | ||
let x = x as *const _; | ||
// Variant 2 | ||
let x = unsafe { &packed.field as *const _ }; | ||
``` | ||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
This is a compromise: I see no reasonable way to translate the first variant | ||
shown in the "Drawbacks" section to a raw reference operation, and the second | ||
variant is so common that we likely do not want to rule it out. Hence the | ||
proposal to make them not equivalent. | ||
|
||
One alternative to introducing a new primitive operation might be to somehow | ||
exempt "references immediately cast to a raw pointer" from the invariant. | ||
However, we believe that the semantics of a MIR program, including whether it | ||
has undefined behavior, should be deducible by executing it one step at a time. | ||
Given that, it is unclear how a semantics that "lazily" checks references should | ||
work, and how it could be compatible with the annotations we emit for LLVM. | ||
|
||
Instead of compiling `&[mut] <place> as *[mut|const] ?T` to a raw reference | ||
operation, we could introduce new surface syntax and keep the existing HIR->MIR | ||
lowering the way it is. However, that would make lots of carefully written | ||
existing code dealing with packed structs have undefined behavior. (There is | ||
likely also lots of code that forgets to cast to a raw pointer, but I see no way | ||
to make that legal -- and the proposal would make such uses a hard error in the | ||
long term, so we should catch many of these bugs.) Also, no good proposal for a | ||
surface syntax has been made yet -- and if one comes up later, this proposal is | ||
forwards-compatible with also having explicit syntax for taking a raw reference | ||
(and deprecating the safe-ref-then-cast way of writing this operation). | ||
|
||
We could be using the new operator in more cases, e.g. instead of having a smart | ||
lint that tells people to insert casts, we could use that same analysis to infer | ||
when to use a raw reference. This would make more code use raw references, thus | ||
making more code defined. However, if someone *relies* on this behavior there | ||
is a danger of accidentally adding a non-raw-ptr use to a reference, which would | ||
then rather subtly make the program have UB. That's why we proposed this as a | ||
lint instead. | ||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
I am not aware of another language with both comparatively strong invariants for | ||
its reference types, and raw pointers. The need for taking a raw reference only | ||
arise because of Rust having both of these features. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
We could have different rules for when to take a raw reference (as opposed to a | ||
safe one). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC:
should also work. Might be worth it to add a couple of examples that do not use
unsafe
to take a reference to a packed struct field, the RFC does mention below that this is safe.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC mentions this (and a hard error even in unsafe blocks when not using
as
) as future possibilities building on top of this RFC. Are you suggesting they should be explicitly part of this RFC? I was trying to take baby steps here, not sure what is preferred.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RalfJung When the warning was added (Rust 1.24), the warning recommended that the correct way to fix this safe Rust code that compiled, run, and often worked correctly, was to just "use an
unsafe { ... }
block here instead".In Rust 1.29 the warning was changed to "This is always UB" (independently of whether this happens in an
unsafe
block or not).In fact, the RFC correctly argues below that if
&packed.field as *const _
constructs a raw pointer without creating a reference, then the code is always safe.What value does requiring
unsafe
add here?To me it seems that requiring
unsafe
for this is just a historical artifact of a decision that we are trying to revert.Otherwise, it feels like this new MIR construct would only work properly in unsafe code. That makes "the ability to construct a raw pointer without going through a reference" a new unsafe superpower, and leaves open the question whether
&x as *const_
in safe code works differently.I don't know, I think
&packed.field as *const _
should work independently of whether its used in safe or unsafe code. That removes the need of usingunsafe
to create a raw pointer to a packed struct field, but that did not solve any problems anyways.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @gnzlbg that it would make sense for this to be safe. This would basically "fall out", I suppose, if we have a distinct MIR operation for it, since the unsafety checker is presently running after MIR construction.