Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for an operator to take a raw reference #2582

Merged
merged 24 commits into from
Sep 15, 2019
Merged
Changes from 5 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
fd4b4cd
RFC for an operator to take a raw reference
RalfJung Nov 1, 2018
e250bed
there is some design space in terms of when and how we emit the new o…
RalfJung Nov 4, 2018
af78d46
more examples by ubsan
RalfJung Nov 4, 2018
1a62d82
coercions to raw ptrs also trigger the new operation
RalfJung Nov 5, 2018
879fa5c
make a lint part of this RFC, clarify when we use a raw ref
RalfJung Nov 20, 2018
0b496ee
expand description: this new operation entirely replaces ref-to-raw c…
RalfJung Dec 9, 2018
0b9df7a
dereferencability
RalfJung Dec 22, 2018
61c2e82
more on dereferencability
RalfJung Dec 22, 2018
5f7e9ea
RFC 2582: Fix typo (must not -> need not)
haslersn Jan 6, 2019
96ddb7c
Merge pull request #1 from haslersn/raw-reference-operator
RalfJung Jan 7, 2019
35b7810
update summary and be more explicit
RalfJung Jan 22, 2019
9a853d5
make example more concrete
RalfJung Jan 23, 2019
6cdac4d
rename to raw_reference_mir_operator
RalfJung Jan 24, 2019
4b60fb1
interaction with auto-deref
RalfJung Jan 26, 2019
315bb31
propose we might have dedicated syntax
RalfJung Feb 21, 2019
cb2dd0f
mention more drawbacks, unresolved questions and future possibilities
RalfJung Feb 21, 2019
1f1e690
half-rewrite RFC to center around a proposed syntax for raw references
RalfJung Mar 21, 2019
8eea0bf
clarify
RalfJung Mar 21, 2019
24aad9e
mention copying around
RalfJung Mar 21, 2019
9889619
adjust lint level open question
RalfJung Mar 30, 2019
6a803f1
move SUGAR to its own subsection under future possibilities
RalfJung Jul 26, 2019
92042f6
editing, and moving some other things to future possibilities
RalfJung Jul 26, 2019
e95df5c
no more ref-to-ptr casts in MIR
RalfJung Jul 26, 2019
9942dad
RFC 2582
Centril Sep 15, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
189 changes: 189 additions & 0 deletions text/0000-raw-reference-operator.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
- Feature Name: raw_reference_operator
- Start Date: 2018-11-01
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Introduce a new primitive operator on the MIR level: `&[mut|const] raw <place>`
to create a raw pointer to the given place (this is not surface syntax, it is
just how MIR might be printed). Desugar the surface syntax `&[mut] <place> as
*[mut|const] _` to use this operator, instead of two MIR statements (first take
normal reference, then cast).

# Motivation
[motivation]: #motivation

Currently, if one wants to create a raw pointer pointing to something, one has
no choice but to create a reference and immediately cast it to a raw pointer.
The problem with this is that there are some invariants that we want to attach
to references, that have to *always hold*. (This is not finally decided yet,
but true in practice because of annotations we emit to LLVM. It is also the
next topic of discussion in the
[Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/).)
In particular, references must be aligned and dereferencable, even when they are
created and never used.

One consequence of these rules is that it becomes essentially impossible to
create a raw pointer pointing to an unaligned struct field: `&packed.field as
*const _` creates an immediate unaligned reference, triggering undefined
behavior because it is not aligned. Similarly, `&(*raw).field as *const _` is
not just computing an offset of the raw pointer `raw`, it also asserts that the
intermediate shared reference is aligned and dereferencable. In both cases,
that is likely not what the author of the code intended.

To fix this, we propose to introduce a new primitive operation on the MIR level
that, in a single MIR statement, creates a raw pointer to a given place. No
intermediate reference exists, so no invariants have to be adhered to. We also
add a lint for cases that seem like the programmer wanted a raw reference, not a
safe one, but did not use the right syntax.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When working with unaligned or potentially dangling pointers, it is crucial that
you always use raw pointers and not references: References come with guarantees
that the compiler assumes are always upheld, and these guarantees include proper
alignment and not being dangling. Importantly, these guarantees must be
maintained even when the reference is created and never used! The following is
UB (assuming `packed` is a variable of packed type):

```rust
let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior
```

There is no situation in which the above code is correct, and hence it is a hard
error to write this. This applies to most ways of creating a reference, i.e.,
all of the following are UB if `X` is not properly aligned and dereferencable:

```rust
fn foo() -> &T {
&X
}

fn bar(x: &T) {}
bar(&X); // this is UB at the call site, not in `bar`

let &x = &X; // this is actually dereferencing the pointer, certainly UB
let _ = &X; // throwing away the value immediately changes nothing
&X; // different syntax for the same thing

let x = &X as &T as *const T; // this is casting to raw "too late"
```

The only way to create a pointer to an unaligned or dangling location without
triggering undefined behavior is to *immediately* turn it into a raw pointer.
All of the following are valid:

```rust
let packed_cast = unsafe { &packed.field as *const _ };
let packed_coercion: *const _ = unsafe { &packed.field };
let null_cast: *const _ = unsafe { &*ptr::null() } as *const _;
let null_coercion: *const _ = unsafe { &*ptr::null() };
```
Copy link
Contributor

@gnzlbg gnzlbg Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC:

let packed_cast = &packed.field as *const _;

should also work. Might be worth it to add a couple of examples that do not use unsafe to take a reference to a packed struct field, the RFC does mention below that this is safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC mentions this (and a hard error even in unsafe blocks when not using as) as future possibilities building on top of this RFC. Are you suggesting they should be explicitly part of this RFC? I was trying to take baby steps here, not sure what is preferred.

Copy link
Contributor

@gnzlbg gnzlbg Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RalfJung When the warning was added (Rust 1.24), the warning recommended that the correct way to fix this safe Rust code that compiled, run, and often worked correctly, was to just "use an unsafe { ... } block here instead".

In Rust 1.29 the warning was changed to "This is always UB" (independently of whether this happens in an unsafe block or not).

In fact, the RFC correctly argues below that if &packed.field as *const _ constructs a raw pointer without creating a reference, then the code is always safe.

What value does requiring unsafe add here?

To me it seems that requiring unsafe for this is just a historical artifact of a decision that we are trying to revert.

Otherwise, it feels like this new MIR construct would only work properly in unsafe code. That makes "the ability to construct a raw pointer without going through a reference" a new unsafe superpower, and leaves open the question whether &x as *const_ in safe code works differently.

I don't know, I think &packed.field as *const _ should work independently of whether its used in safe or unsafe code. That removes the need of using unsafe to create a raw pointer to a packed struct field, but that did not solve any problems anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @gnzlbg that it would make sense for this to be safe. This would basically "fall out", I suppose, if we have a distinct MIR operation for it, since the unsafety checker is presently running after MIR construction.


The intention is to cover all cases where a reference, just created, is
immediately explicitly used as a value of raw pointer type.

These two operations (taking a reference, casting/coercing to a raw pointer) are
actually considered a single operation happening in one step, and hence the
invariants incurred by references do not come into play.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

When translating HIR to MIR, we recognize `&[mut] <place> as *[mut|const] ?T`
(where `?T` can be any type, also a partial one like `_`) as well as coercions
from `&[mut] <place>` to a raw pointer type as a special pattern and turn them
into a single MIR `Rvalue` that takes the address and produces it as a raw
pointer -- a "take raw reference" operation. This might be a variant of the
existing `Ref` operation (say, a boolean flag for whether this is raw), or a new
`Rvalue` variant. The borrow checker should do the usual checks on `<place>`,
but can just ignore the result of this operation and the newly created
"reference" can have any lifetime. (Currently this will be some form of
unbounded inference variable because the only use is a cast-to-raw, the new "raw
reference" operation can have the same behavior.) When translating MIR to LLVM,
nothing special has to happen as references and raw pointers have the same LLVM
type anyway; the new operation behaves like `Ref`.

When interpreting MIR in the miri engine, the engine will recognize that the
value produced by this `Rvalue` has raw pointer type, and hence must not satisfy
any special invariants.

When doing unsafety checking, we make references to packed fields that do *not*
use this new "raw reference" operation a *hard error even in unsafe blocks*
(after a transition period). There is no situation in which this code is okay;
it creates a reference that violates basic invariants. Taking a raw reference
to a packed field, on the other hand, is a safe operation as the raw pointer
comes with no special promises. "Unsafety checking" is thus not even a good
term for this, maybe it should be a special pass dedicated to packed fields
traversing MIR, or this can happen when lowering HIR to MIR. This check has
nothing to do with whether we are in an unsafe block or not.
Copy link
Contributor

@gnzlbg gnzlbg Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This compiles without errors on stable Rust today:

#[repr(packed)]
pub struct A {
    x: u8,
    y: u64
}

pub fn foo(mut _x: A) {
    let y = &mut _x.y;
    *y = 3;
}

but the text does not explicitly mention whether this will be an error too. It hints that this will be the case when it says (emphasis mine): "hard error even in unsafe blocks", but it should probably spell this out.

Copy link
Contributor

@arielb1 arielb1 Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't - this code is UB even on stable.

warning: borrow of packed field is unsafe and requires unsafe function or block (error E0133)
 --> src/lib.rs:8:13
  |
8 |     let y = &mut _x.y;
  |             ^^^^^^^^^
  |
  = note: #[warn(safe_packed_borrows)] on by default
  = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
  = note: for more information, see issue #46043 <https://github.com/rust-lang/rust/issues/46043>
  = note: fields of packed structs might be misaligned: dereferencing a misaligned pointer or even just creating a misaligned reference is undefined behavior

Copy link
Contributor

@gnzlbg gnzlbg Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't

Yes it does? That's a warning, which is not the same thing as an error. This RFC hints that this warning will be turned into an error, but does not say so explicitly, so I'm left wondering whether this is the case or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we tell LLVM that references are aligned, so this is already UB and AFAIK has always been.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a future compat warning, as the text quite clearly says. The only thing blocking us from making it an error is resolving this RFC :)

Copy link
Member

@cramertj cramertj Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

his RFC does not turn that warning into an error

This RFC wouldn't make it a compiler error, but it is still UB and will continue to be. In the future, we intend to make it a hard error. It has always been the case that taking a reference to a field that is not properly aligned is UB, so we cannot soundly allow this in safe code.

Copy link
Contributor

@gnzlbg gnzlbg Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC wouldn't make it a compiler error, but it is still UB and will continue to be.

I don't know why you and @arielb1 keep bringing up that this is UB. Who is saying otherwise ?

so we cannot soundly allow this in safe code.

This RFC proposes to make &mut t as *mut T always safe, independently of whether the object t: T is properly aligned or not. Do you disagree with this proposal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC proposes to make &mut t as *mut T always safe,

It does so by defining this to be a special operation that never creates a reference to t. Creating references to an unaligned value will still be UB. Other ways of writing this code that don't exactly match &mut <expr> as *mut _ would still be errors (currently future-compatibility lints).

Copy link
Member Author

@RalfJung RalfJung Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading my own RFC text, the changes around what is and is not permitted in safe and unsafe code for taking references to packed fields are actually part of the normative text of the RFC. Should I move them out? I think it makes sense to consider this entire somewhat messy situation together, but I am not sure if the RFC should actually be a decision on all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to leave this out and to better focus on this change only.

You can mention that there is an issue open about turning a particular warning into an error, that's currently blocked on having a way to construct a pointer to a field without going through a reference, and that this RFC would solve that blocker. But I wouldn't make the text about that.


Moreover, to prevent programmers from accidentally creating a safe reference
when they did not want to, we add a lint that identifies situations where the
programmer likely wants a raw reference, and suggest an explicit cast in that
case. One possible heuristic here would be: If a safe reference (shared or
mutable) is only ever used to create raw pointers, then likely it could be a raw
pointer to begin with. The details of this are best worked out in the
implementation phase of this RFC.

# Drawbacks
[drawbacks]: #drawbacks

It might be surprising that the following two pieces of code are not equivalent:
```rust
// Variant 1
let x = unsafe { &packed.field }; // Undefined behavior!
let x = x as *const _;
// Variant 2
let x = unsafe { &packed.field as *const _ };
```

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This is a compromise: I see no reasonable way to translate the first variant
shown in the "Drawbacks" section to a raw reference operation, and the second
variant is so common that we likely do not want to rule it out. Hence the
proposal to make them not equivalent.

One alternative to introducing a new primitive operation might be to somehow
exempt "references immediately cast to a raw pointer" from the invariant.
However, we believe that the semantics of a MIR program, including whether it
has undefined behavior, should be deducible by executing it one step at a time.
Given that, it is unclear how a semantics that "lazily" checks references should
work, and how it could be compatible with the annotations we emit for LLVM.

Instead of compiling `&[mut] <place> as *[mut|const] ?T` to a raw reference
operation, we could introduce new surface syntax and keep the existing HIR->MIR
lowering the way it is. However, that would make lots of carefully written
existing code dealing with packed structs have undefined behavior. (There is
likely also lots of code that forgets to cast to a raw pointer, but I see no way
to make that legal -- and the proposal would make such uses a hard error in the
long term, so we should catch many of these bugs.) Also, no good proposal for a
surface syntax has been made yet -- and if one comes up later, this proposal is
forwards-compatible with also having explicit syntax for taking a raw reference
(and deprecating the safe-ref-then-cast way of writing this operation).

We could be using the new operator in more cases, e.g. instead of having a smart
lint that tells people to insert casts, we could use that same analysis to infer
when to use a raw reference. This would make more code use raw references, thus
making more code defined. However, if someone *relies* on this behavior there
is a danger of accidentally adding a non-raw-ptr use to a reference, which would
then rather subtly make the program have UB. That's why we proposed this as a
lint instead.

# Prior art
[prior-art]: #prior-art

I am not aware of another language with both comparatively strong invariants for
its reference types, and raw pointers. The need for taking a raw reference only
arise because of Rust having both of these features.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

We could have different rules for when to take a raw reference (as opposed to a
safe one).