Standard library API for immovable types #2349

Merged
merged 14 commits into from
Mar 18, 2018
381 changes: 381 additions & 0 deletions text/0000-pin_and_move.md
@@ -0,0 +1,381 @@
- Feature Name: pin_and_move
- Start Date: 2018-02-19
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Introduce new APIs to libcore / libstd to serve as safe abstractions for data
which cannot be safely moved around.

# Motivation
[motivation]: #motivation

A longstanding problem for Rust has been dealing with types that should not be
moved. A common motivation for this is when a struct contains a pointer into
its own representation - moving that struct would invalidate that pointer. This
use case has become especially important recently with work on generators.
Because generators essentially reify a stackframe into an object that can be
manipulated in code, it is likely for idiomatic usage of a generator to result
in such a self-referential type, if it is allowed.
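The hazard described above can be observed with a small sketch. The `SelfRef` struct and the `pointer_survives_move` function are hypothetical names invented for this illustration; the code only compares addresses and never dereferences the stale pointer.

```rust
use std::ptr;

/// A struct that stores a raw pointer to one of its own fields.
pub struct SelfRef {
    value: u32,
    ptr_to_value: *const u32,
}

/// Builds a `SelfRef` on the stack, moves it to the heap, and reports whether
/// the stored pointer still refers to the `value` field afterwards.
pub fn pointer_survives_move() -> bool {
    let mut s = SelfRef { value: 42, ptr_to_value: ptr::null() };
    s.ptr_to_value = &s.value; // now points into `s` itself

    // Moving `s` copies its bytes to a new location; the stored pointer is
    // copied verbatim and still holds the old address.
    let boxed = Box::new(s);

    // Only addresses are compared; the stale pointer is never dereferenced.
    ptr::eq(boxed.ptr_to_value, &boxed.value)
}
```

After the move the stored pointer no longer matches the field's address, which is exactly the situation that an API for immovable data has to rule out.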

This proposal adds an API to std which would allow you to guarantee that a
particular value will never move again, enabling safe APIs that rely on
self-references to exist.

# Explanation
[explanation]: #explanation

## The `Move` auto trait

This new auto trait is added to the `core::marker` and `std::marker` modules:

```rust
pub unsafe auto trait Move { }
```

A type implements `Move` if in its stack representation, it does not contain
internal references to other positions within its stack representation. Nearly
every type in Rust is `Move`.

Positive impls of `Move` are added for types which contain pointers to generic
types, but do not contain those types in their stack representation, e.g.:

```rust
unsafe impl<'a, T: ?Sized> Move for &'a T { }
unsafe impl<'a, T: ?Sized> Move for &'a mut T { }
unsafe impl<'a, T: ?Sized> Move for Box<T> { }
unsafe impl<'a, T> Move for Vec<T> { }
// etc
```

This trait is a lang item, but only to generate negative impls for certain
generators. Unlike previous `?Move` proposals, and unlike some traits like
`Sized` and `Copy`, this trait does not impose any particular semantics on
types that do or don't implement it.

That can't literally be true, if it had no implications it would be useless. Indeed under unresolved questions the definition "it's safe to convert between &mut T and Pin<T>" is given. Is this paragraph trying to say the trait isn't "language magic" that impacts fundamental rules the compiler checks?

Contributor Author

Yes, I meant it doesn't interact with the compiler in any special way. It's just used by Pin to enforce invariants.


## The stability marker traits

A hierarchy of marker traits for smart pointer types is added to `core::marker`
and `std::marker`. These exist to provide a shared language for talking about
the guarantees that different smart pointers provide. This enables both the
kinds of self-referential support we talk about later in this RFC and other
APIs like rental and owning-ref.

### `Own` and `Share`

```rust
unsafe trait Own: Deref { }
unsafe trait Share: Deref + Clone { }
```

These two traits are for smart pointers which implement some form of ownership
construct.

- **Own** implies that this type has unique ownership over the data which it
dereferences to. That is, unless the data is moved out of the smart pointer,
when this pointer is destroyed, so too will that data. Examples of `Own`
types are `Box<T>`, `Vec<T>` and `String`.
- **Share** implies that this type has shared ownership over the data which it
dereferences to. It implies `Clone`, and every type it is `Clone`d it must
Member

@tmandry, Feb 24, 2018

This tripped me up.. s/type/time

continue to refer to the same data; it cannot perform deep clones of that
data. Examples of `Share` types are `Rc<T>` and `Arc<T>`.

These traits are mutually exclusive - it would be a logic error to implement
both of them for a single type. We retain the liberty to assume that no type
ever implements both: we could upgrade this from a logic error to undefined
behavior, or we could make changes that would break any code that implements
both traits for the same type.
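As a rough sketch of how these markers might be declared and used, the block below defines local copies of `Own` and `Share` (they are not part of std) and implements them for the example types named above. The helper `clone_aliases_target` is a hypothetical function added only to illustrate the "no deep clone" requirement on `Share`.

```rust
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;

// Local copies of the proposed marker traits; they are not part of std.
pub unsafe trait Own: Deref {}
pub unsafe trait Share: Deref + Clone {}

// Unique ownership: dropping the pointer drops the data.
unsafe impl<T: ?Sized> Own for Box<T> {}
unsafe impl<T> Own for Vec<T> {}
unsafe impl Own for String {}

// Shared ownership: every clone refers to the same data.
unsafe impl<T: ?Sized> Share for Rc<T> {}
unsafe impl<T: ?Sized> Share for Arc<T> {}

/// Checks the `Share` contract for one pointer: a clone must dereference to
/// the same address as the original.
pub fn clone_aliases_target<P: Share>(ptr: &P) -> bool {
    let copy = ptr.clone();
    let original: *const P::Target = &**ptr;
    let cloned: *const P::Target = &*copy;
    original == cloned
}
```

A type such as `Box<T>` could not soundly implement `Share` even though it is `Clone` for `T: Clone`, because its `clone` allocates a fresh copy of the data.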

### `StableDeref` and `StableDerefMut`

```rust
unsafe trait StableDeref: Deref { }
unsafe trait StableDerefMut: StableDeref + DerefMut { }
```

These two traits are for any pointers which guarantee that the value they
dereference to is at a stable address. That is, moving the pointer does not
move the value being addressed.

- **StableDeref** implies that the referenced data will not move if you move
this type or dereference it *immutably*. Types that implement this include
`Box<T>`, both reference types, `Rc<T>`, `Arc<T>`, `Vec<T>`, and `String`.
Pretty much everything in std that implements `Deref` implements
`StableDeref`.
- **StableDerefMut** implies the same guarantees as `StableDeref`, but also
guarantees that dereferencing a *mutable* reference will not cause the
referenced data to change addresses. Because of this, it also implies
`DerefMut`. Examples of types that implement this include `&mut T`, `Box<T>`,
and `Vec<T>`.

Note that `StableDerefMut` does not imply that taking a mutable reference to
the smart pointer will not cause the referenced data to move. For example,
calling `push` on a `Vec` can relocate the slice it dereferences to. It is
only obtaining a mutable reference to the target data which is guaranteed not
to relocate it.
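The distinction can be checked with ordinary std types. The sketch below uses hypothetical helper names; each function returns the address of the target data before and after an operation. Whether `push` actually relocates the buffer depends on the allocator, so that case is only observed, not asserted.

```rust
/// Address of the heap data behind a `Box` before and after the box is moved.
pub fn box_addresses() -> (usize, usize) {
    let boxed = Box::new(17u64);
    let before = &*boxed as *const u64 as usize;
    let moved = boxed; // moves the pointer, not the pointee
    let after = &*moved as *const u64 as usize;
    (before, after)
}

/// Buffer address of a `Vec` before and after mutably dereferencing it.
pub fn vec_deref_mut_addresses() -> (usize, usize) {
    let mut v = vec![1u64, 2, 3];
    let before = v.as_ptr() as usize;
    let slice: &mut [u64] = &mut v; // DerefMut to the slice
    slice[0] = 10;
    let after = v.as_ptr() as usize;
    (before, after)
}

/// Buffer address of a `Vec` before and after pushing past its capacity.
/// The two values may differ, since growing can reallocate the buffer.
pub fn vec_push_addresses() -> (usize, usize) {
    let mut v: Vec<u64> = Vec::with_capacity(1);
    v.push(1);
    let before = v.as_ptr() as usize;
    for i in 0..1024 {
        v.push(i);
    }
    let after = v.as_ptr() as usize;
    (before, after)
}
```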

Note also that this RFC does not propose implementing `StableDerefMut` for
`String`. This is to be forward compatible with the static string optimization,
an optimization which allows values of `&'static str` to be converted to
Contributor

Strike an optimization?

`String` without incurring a heap allocation. A component of this optimization
would cause `String` to allocate when dereferencing to an `&mut str` if the
backing data would otherwise be in rodata.

### Notes on existing ecosystem traits

These traits supplant certain traits in the ecosystem which already provide
similar guarantees. In particular, the [stable_deref_trait][stable-deref]
crate provides a similar but different hierarchy. The differences are:

- That crate draws no distinction between `StableDeref` and `StableDerefMut`.
This does not leave forward compatibility for the static string optimization
mentioned previously.
- That crate has no equivalent to the `Own` trait, which is necessary for some
APIs using internal references.
- That crate has a `CloneStableDeref` trait, which is equivalent to the bound
`Share + StableDeref` in our system.

If the hierarchy proposed in this RFC becomes stable, all users are encouraged
to migrate from that crate to the standard library traits.

## The `Pin` type

The `Pin` type is a wrapper around a mutable reference. If the type it
references is `!Move`, the `Pin` type guarantees that the referenced data will
never be moved again. It has a relatively small API. It is added to both
Contributor

Change to?

Pin has a relatively small API that is added to both std::mem and core::mem.

`std::mem` and `core::mem`.

```rust
struct Pin<'a, T: ?Sized + 'a> {
    data: &'a mut T,
}

impl<'a, T: ?Sized + Move> Pin<'a, T> {
    pub fn new(data: &'a mut T) -> Pin<'a, T>;
}

impl<'a, T: ?Sized> Pin<'a, T> {
    pub unsafe fn new_unchecked(data: &'a mut T) -> Pin<'a, T>;

    pub unsafe fn get_mut(this: Pin<'a, T>) -> &'a mut T;

    pub fn borrow<'b>(this: &'b mut Pin<'a, T>) -> Pin<'b, T>;
}

impl<'a, T: ?Sized> Deref for Pin<'a, T> {
    type Target = T;
}

impl<'a, T: ?Sized + Move> DerefMut for Pin<'a, T> { }
```

For types which implement `Move`, `Pin` is essentially the same as an `&mut T`.
But for types which do not, the conversion between `&mut T` and `Pin` is
unsafe (however, `Pin` can be easily immutably dereferenced, even for `!Move`
types).

The contract on the unsafe part of `Pin`'s API is that a `Pin` cannot be
constructed if the data it references would ever move again, and that it cannot
be converted into a mutable reference if the data might ever be moved out of
that reference. In other words, if you have a `Pin` containing data which does
not implement `Move`, you have a guarantee that that data will never move.
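To make the split between the safe and unsafe halves of this API concrete, here is a minimal sketch with function bodies filled in. It is an illustration under assumptions, not the proposed implementation: `Move` is modeled as an ordinary `unsafe trait` that types opt into by hand (stable Rust has no user-defined auto traits), and `NotMove` and `demo_pin` are hypothetical names.

```rust
use std::ops::{Deref, DerefMut};

/// Stand-in for the proposed `Move` auto trait (assumption: opt-in by hand).
pub unsafe trait Move {}
unsafe impl Move for i32 {}

/// A type that deliberately does not implement `Move`.
pub struct NotMove(pub u8);

pub struct Pin<'a, T: ?Sized + 'a> {
    data: &'a mut T,
}

impl<'a, T: ?Sized + Move> Pin<'a, T> {
    /// Safe: a `Move` type has no invariant for `Pin` to protect.
    pub fn new(data: &'a mut T) -> Pin<'a, T> {
        Pin { data }
    }
}

impl<'a, T: ?Sized> Pin<'a, T> {
    /// Unsafe: the caller promises the referent never moves again.
    pub unsafe fn new_unchecked(data: &'a mut T) -> Pin<'a, T> {
        Pin { data }
    }

    /// Unsafe: the caller promises not to move out of the returned reference.
    pub unsafe fn get_mut(this: Pin<'a, T>) -> &'a mut T {
        this.data
    }

    /// Reborrows the pin for a shorter lifetime.
    pub fn borrow<'b>(this: &'b mut Pin<'a, T>) -> Pin<'b, T> {
        Pin { data: &mut *this.data }
    }
}

impl<'a, T: ?Sized> Deref for Pin<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.data
    }
}

impl<'a, T: ?Sized + Move> DerefMut for Pin<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.data
    }
}

pub fn demo_pin() -> (i32, u8) {
    let mut n = 1;
    let mut pin = Pin::new(&mut n);
    *pin += 1; // `DerefMut` is available because `i32: Move`
    let reborrowed = Pin::borrow(&mut pin);

    let mut opaque = NotMove(9);
    // No safe constructor exists for a type without `Move`.
    let pinned = unsafe { Pin::new_unchecked(&mut opaque) };
    // Only `Deref` is available here; `pinned.0 = 1` would not compile.
    (*reborrowed, pinned.0)
}
```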

The next two subsections describe safe APIs for constructing a `Pin` of data
which cannot be moved - one in the heap, and one in the stack.

### Pinning to the heap: The `Anchor` type

The `Anchor` wrapper takes a type that implements `StableDeref` and `Own`, and
Member

I think you mean StableDerefMut, based on the code below

prevents users from moving data out of that unless it implements `Move`. It is
Contributor

This paragraph is a bit hard to read. Suggestions:

  • s/moving data out of that unless it/moving data out of the anchor unless data/
  • s/It is/Anchor is/

added to `std::mem` and `core::mem`.

```rust
struct Anchor<T> {
    ptr: T,
}

impl<T: StableDerefMut + Own> Anchor<T> {
    pub fn new(ptr: T) -> Anchor<T>;

    pub unsafe fn get_mut(this: &mut Anchor<T>) -> &mut T;

    pub unsafe fn into_inner_unchecked(this: Anchor<T>) -> T;

    pub fn pin<'a>(this: &'a mut Anchor<T>) -> Pin<'a, T::Target>;
Member

Did you consider making this a method, rather than an associated function taking this? We generally avoid adding methods to Deref types, but in this case, it seems like pin would be called more than any other function on an Anchor type (and if you do call pin, you can then call any &self method, as Pin is also Deref).

Member

Also, pin seems like a misleading name for this function, since it isn't actually pinning the data anywhere. Anchor::new pins the data to the heap-- pin just gives you a reference to the object as a Pin. Perhaps as_pin would be better?

Contributor Author

I have no commitment on either of these questions!

}

impl<T: StableDerefMut + Own> Anchor<T> where T::Target: Move {
    pub fn into_inner(this: Anchor<T>) -> T;
}

impl<T: StableDerefMut + Own> Deref for Anchor<T> {
    type Target = T;
}

impl<T: StableDerefMut + Own> DerefMut for Anchor<T> where T::Target: Move { }
```
Member

@cramertj, Feb 25, 2018

Should there be an unsafe impl<T: StableDerefMut + Own> Move for Anchor<T> {}? (or did I miss it?)

Contributor Author

I don't think so. It's conceivable that a smart pointer could itself be !Move.

Because the smart pointers themselves implement Move even if their referent doesn't, something like Anchor<Box<T: !Move>> should implement Move.

Member

Ah, so is Anchor going to be #[fundamental] to allow for Move impls for Anchor<MyBox<T>> for custom MyBoxes?

Contributor Author

You should just implement Move for MyBox, T: Move implies Anchor<T>: Move. I don't know how Anchor<T> could be Move if T wasn't, since it's just a newtype.

Member

@cramertj, Feb 27, 2018

> You should just implement Move for MyBox

I can't, even for Box. Box<MyImmovableGenerator> is still !Move.

> I don't know how Anchor<T> could be Move if T wasn't, since its just a newtype.

Box<MyImmovableGenerator> is !Move, but Anchor<Box<MyImmovableGenerator>> is Move, right?

In order to make this work, you need something like unsafe impl<T> Move for Anchor<Box<T>> {}. In order for that to be coherent for external crates which define MyBox, you'd need Anchor to be #[fundamental].

Contributor Author

In your code, you call poll on a Pin<'a, Box<impl Generator>>. This relies on there being an impl of Future for Box<impl Generator>. I imagine this is going through the F: Future => Box<F>: Future impl. But that impl hasn't accounted for immovable types, and should be constrained to F: Future + Move => Box<F>: Future.

Contributor

I thought Move itself only meant "moving this type doesn't move anything that shouldn't be moved" - which is true for smart pointers regardless of their contents? And that it doesn't say anything about what operations on that type, such as deref, might do.

> I imagine this is going through the F: Future => Box<F>: Future impl. But that impl hasn't accounted for immovable types, and should be constrained to F: Future + Move => Box<F>: Future.

This confused me for a moment and I think I might've figured it out, but I'll write it down anyway. The plan is that an immovable generator, after combinators and so on had been applied to it, would be put into a single heap allocation (e.g. a Box) to be run. In this case clearly F: !Move, so Box<F>: !Future, which sounds bad. But, it's not a problem because it's going to be run by derefing the Box into an &mut and pinning it, rather than through the Box<F>: Future impl directly?

Member

@glaebhoerl

> I thought Move itself only meant "moving this type doesn't move anything that shouldn't be moved"

My understanding is that Move means "functions that require Pin<Self> don't stop this type from being able to move in the future". Box<T> can't be Move without a T: Move bound because you could call poll, move the T out of the box (moving it), and then rebox it (in a different location) and re-poll it. This breaks the rule that T: !Move cannot be polled, moved, and then re-polled.

@withoutboats

> In your code, you call poll on a Pin<'a, Box<impl Generator>>. This relies on there being an impl of Future for Box<impl Generator>. I imagine this is going through the F: Future => Box<F>: Future impl. But that impl hasn't accounted for immovable types, and should be constrained to F: Future + Move => Box<F>: Future.

You're saying that there's no longer an impl<T: Future + ?Sized> Future for Box<T>? I don't understand how that solves the problem. We want to have Anchor<Box<Future>>: Future + Move, right?

Contributor

> Box<T> can't be Move without a T: Move bound because

It says in the RFC that

> unsafe impl<'a, T: ?Sized> Move for Box<T> { }

with the motivation:

> Positive impls of Move are added for types which contain pointers to generic types, but do not contain those types in their stack representation

Member

@withoutboats Can you clarify this point?


Because `Anchor` implements `StableDeref` and `Own`, and it is not safe to get

s/StableDeref/StableDerefMut/

an `&mut T` if the target of `T` does not implement `Move`, an anchor
guarantees that the target of `T` will never move again. This satisfies the
safety constraints of `Pin`, allowing a user to construct a `Pin` from an
anchored pointer.

Because the data is anchored into the heap, you can move the anchor around
without moving the data itself. This makes anchor a very flexible way to handle
immovable data, at the cost of a heap allocation.

An example use:

```rust
let mut anchor = Anchor::new(Box::new(immovable_data));
let pin = Anchor::pin(&mut anchor);
```
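The claim that the anchor can move while the data stays put can be checked with a simplified sketch. The assumptions are significant: `Anchor` is hard-coded to `Box<T>` instead of the generic `StableDerefMut + Own` bound, `Pin` is a stripped-down local stand-in, and `addresses_across_move` is a hypothetical helper.

```rust
use std::ops::Deref;

/// Local stand-in for the proposed `Pin`; only what this demo needs.
pub struct Pin<'a, T: ?Sized + 'a> {
    data: &'a mut T,
}

impl<'a, T: ?Sized> Deref for Pin<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.data
    }
}

/// Simplified anchor, hard-coded to `Box<T>` (assumption) rather than any
/// `StableDerefMut + Own` pointer.
pub struct Anchor<T: ?Sized> {
    ptr: Box<T>,
}

impl<T: ?Sized> Anchor<T> {
    pub fn new(ptr: Box<T>) -> Anchor<T> {
        Anchor { ptr }
    }

    /// Hands out a pinned view; the boxed value itself never leaves the heap.
    pub fn pin<'a>(this: &'a mut Anchor<T>) -> Pin<'a, T> {
        Pin { data: &mut *this.ptr }
    }
}

/// Returns the address of the anchored value before and after the anchor
/// itself is moved.
pub fn addresses_across_move() -> (usize, usize) {
    let mut anchor = Anchor::new(Box::new([0u8; 64]));
    let before = &*Anchor::pin(&mut anchor) as *const [u8; 64] as usize;

    // Moving the anchor moves only the pointer, not the heap allocation.
    let mut moved = anchor;
    let after = &*Anchor::pin(&mut moved) as *const [u8; 64] as usize;

    (before, after)
}
```

Because the sketch never exposes `&mut T` or a way to take the box back out, safe code has no route to relocate the anchored value.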

### Pinning to the stack: `Pin::stack` and `pinned`

Data can also be pinned to the stack. This avoids the heap allocation, but the
pin must not outlive the data being pinned, and the API is less convenient.

First, the pinned function, added to `std::mem` and `core::mem`:

```rust
pub struct StackPinned<'a, T: ?Sized> {
    _marker: PhantomData<&'a mut &'a ()>,
    data: T
}

pub fn pinned<'a, T>(data: T) -> StackPinned<'a, T> {
    StackPinned {
        data, _marker: PhantomData,
    }
}
```

Second, this constructor is added to `Pin`:

```rust
impl<'a, T: ?Sized> Pin<'a, T> {
pub fn stack(data: &'a mut StackPinned<'a, T>) -> Pin<'a, T>;
Member

Similar question to above: what's the rationale for having this be a function on Pin rather than a method on StackPinned? When RFC 66 is fully implemented, we could write mem::pinned(data).stack(), which looks nicer to me than the two-liner below.

Member

@cramertj, Feb 25, 2018

Nit: the names of the functions are a bit misleading here, too. pinned gives you a StackPinned, which isn't actually pinned to the stack-- you're free to write Box::new(mem::pinned(data)). What it does do is move your T and tie it to an invariant lifetime 'a. The stack function is what actually prevents the StackPinned from ever being moved out of its current location-- it's the thing that actually does the pinning.

I don't have any great suggestions here. Maybe LifetimePinned instead of StackPinned?

It's unfortunate how much of the complexity in these APIs winds up being passed on to the end user, especially including all of the lifetime annotations. It might be that an inbuilt &pin reference (which binds to the stack for !Move types, and is equal to &mut for Move types) would be less scary. This does increase the complexity of Rust as a language, but I think it might be a net decrease in the complexity of learning to use these APIs.

Contributor Author

> It's unfortunate how much of the complexity in these APIs winds up being passed on to the end user, especially including all of the lifetime annotations. It might be that an inbuilt &pin reference (which binds to the stack for !Move types, and is equal to &mut for Move types) would be less scary.

Importantly, the library version should be forward compatible with the language version.

(I also think stack pinning is a niche use case compared to heap pinning though).

Member

@cramertj, Feb 25, 2018

> (I also think stack pinning is a niche use case compared to heap pinning though).

I agree-- I think that &pin could allow the heap-pinning case to be more ergonomic: rather than my_anchored_boxed_future.as_pin().poll(cx), &pin could allow writing my_anchored_boxed_future.poll(cx), as Future::poll would take &pin self, rather than self: Pin<'a, Self>.

Contributor Author

That's true. This RFC should be forward compatible with doing that someday by just changing Pin to this:

type Pin<'a, T> = &'a pin T;

Contributor

Rather than just renaming StackPinned to LifetimePinned, I think the *Pinned type is the one that should be named Pin, and the Pin one should be named *Pinned.

You put something in a Pin which you can then move, and Pin::pin then returns a Pinned pointing to a value that can't be moved anymore.

I hope that makes sense.

}
```

Because the lifetime of the `StackPinned` and the lifetime of the reference to
it are bound together, the `StackPinned` wrapper is functionally moved (and with
it the data inside it) into the `Pin`. Thus, even though the data is allocated on
the stack, it is pinned to its location for the remainder of its scope.

```rust
let mut data = mem::pinned(immovable_data);
let pin = Pin::stack(&mut data);
```
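A self-contained sketch of this mechanism follows, with a body filled in for `Pin::stack`. The stripped-down `Pin` and the `read_through_stack_pin` helper are assumptions made for the illustration; only `StackPinned` and `pinned` are taken from the text above.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Local stand-in for the proposed `Pin`; only what this demo needs.
pub struct Pin<'a, T: ?Sized + 'a> {
    data: &'a mut T,
}

impl<'a, T: ?Sized> Deref for Pin<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.data
    }
}

pub struct StackPinned<'a, T: ?Sized> {
    // Makes the wrapper invariant in `'a`, so the lifetime cannot be shrunk.
    _marker: PhantomData<&'a mut &'a ()>,
    data: T,
}

pub fn pinned<'a, T>(data: T) -> StackPinned<'a, T> {
    StackPinned { data, _marker: PhantomData }
}

impl<'a, T: ?Sized> Pin<'a, T> {
    /// The borrow lasts exactly as long as the lifetime named in the
    /// wrapper's own type, so the wrapper is locked for the rest of its life.
    pub fn stack(data: &'a mut StackPinned<'a, T>) -> Pin<'a, T> {
        Pin { data: &mut data.data }
    }
}

pub fn read_through_stack_pin() -> u32 {
    let mut slot = pinned(7u32);
    let pin = Pin::stack(&mut slot);
    // From here on `slot` stays mutably borrowed for its whole lifetime, so
    // the borrow checker is expected to reject a later `let moved = slot;`.
    *pin
}
```

The design relies entirely on the borrow checker: no runtime state records that the value is pinned.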

## Immovable generators

Today, the unstable generators feature has an option to create generators which
contain references that live across yield points - these are, in effect,
internal references into the generator's state machine. Because internal
references are invalidated if the type is moved, these kinds of generators
("immovable generators") are currently unsafe to create.

Once the arbitrary_self_types feature becomes object safe, we will make three
changes to the generator API:

1. We will change the `resume` method to take `self: Pin<Self>` instead
of `&mut self`.
2. We will implement `!Move` for the anonymous type of an immovable generator.
3. We will make it safe to define an immovable generator.

This is an example of how the APIs in this RFC allow for self-referential data
types to be created safely.

## Stabilization planning

This RFC proposes a large API addition, and so it is broken down into four
separate feature flags, which can be stabilized in stages:

1. `stable_deref` - to control the smart pointer trait hierarchy - StableDeref,
StableDerefMut, Own, and Share.
2. `pin_and_move` - to control the `Move` auto trait and the `Pin` type. These
two components only make sense working together.
3. `anchor` - to control the `Anchor` struct, pinning to the heap.
4. `stack_pinning` - to control the APIs related to stack pinning.

# Drawbacks
[drawbacks]: #drawbacks

This adds additional APIs to std, including several marker traits and an auto
trait. Such additions should not be taken lightly, and only included if they
are well-justified by the abstractions they express.

# Rationale and alternatives
[alternatives]: #alternatives

## Comparison to `?Move`

One previous proposal was to add a built-in `Move` trait, similar to `Sized`. A
type that did not implement `Move` could not be moved after it had been
referenced.

This solution had some problems. First, the `?Move` bound ended up "infecting"
many different APIs where it wasn't relevant, and introduced a breaking change
in several cases where the API bound changed in a non-backwards compatible way.

In a certain sense, this proposal is a much more narrowly scoped version of
`?Move`. With `?Move`, *any* reference could act as the "Pin" reference does
here. However, because of this flexibility, the negative consequences of having
a type that can't be moved had a much broader impact.

Instead, we require APIs to opt into supporting immovability (a niche case) by
operating with the `Pin` type, avoiding "infecting" the basic reference type
with concerns around immovable types.

## Comparison to using `unsafe` APIs

Another alternative we considered was to just have the APIs which require
immovability be `unsafe`. It would be up to the users of these APIs to review
and guarantee that they never moved the self-referential types. For example,
the `Generator` trait would look like this:

```rust
trait Generator {
    type Yield;
    type Return;

    unsafe fn resume(&mut self) -> CoResult<Self::Yield, Self::Return>;
}
```

This would require no extensions to the standard library, but would place the
burden on every user who wants to call resume to guarantee (at the risk of
memory insafety) that their types were not moved, or that they were moveable.
Contributor

s/insafety/unsafety

This seemed like a worse trade off than adding these APIs.

## Relationship to owning-ref & rental

Existing crates like owning-ref and rental make some use of "self-referential"
types. Unlike the generators this RFC is designed to support, their references
always point into the heap - making it acceptable to move their types around.

However, some of this infrastructure is still useful to those crates. In
particular, the stable deref hierarchy is related to the existing hierarchy in
the stable_deref crate, which those other crates depend on. By uplifting those
markers into the standard library, we create a shared, endorsed, and guaranteed
set of markers for the invariants those libraries care about.

In order to be implemented in safe code, those libraries need additional
features connecting to "existential" or "generative" lifetimes. These language
changes are out of scope for this RFC.

# Unresolved questions
[unresolved]: #unresolved-questions

The names used in this RFC are all entirely up for debate. Some of the items
introduced (especially the `Move` trait) have evolved away from their original
design, making the names a bit of a misnomer (`Move` really means that it is
safe to convert between `Pin<T>` and `&mut T`, for example). We want to make
sure we have adequate names before stabilizing these APIs.

[stable-deref]: https://crates.io/crates/stable_deref_trait