Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Representation of Rust references (&T, &mut T) and raw pointers (*const T, *mut T`) #16

Closed
nikomatsakis opened this issue Aug 30, 2018 · 56 comments
Assignees
Labels
A-layout Topic: Related to data structure layout (`#[repr]`) S-writeup-assigned Status: Ready for a writeup and someone is assigned to it

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Aug 30, 2018

Representation of Rust references:

  • Are &T and &mut T guaranteed to be a pointer?
  • Must always be aligned, non-null
  • Guaranteed to be ABI compatible with C pointer types ("in every way?")
    • presuming that the referent is compatible, of course

Representation of raw pointers:

  • Guaranteed to be ABI compatible with C pointer types
    • presuming that the referent is compatible, of course

Other factors:

  • Considerations for storing things in the low bits of pointers
    • safe with raw pointers, not safe with references (from this comment)
@nikomatsakis nikomatsakis added the A-layout Topic: Related to data structure layout (`#[repr]`) label Aug 30, 2018
@nikomatsakis nikomatsakis changed the title Representation of Rust references (&T, &mut T) Representation of Rust references (&T, &mut T) and raw pointers (*const T, *mut T`) Aug 30, 2018
@RalfJung
Copy link
Member

RalfJung commented Aug 30, 2018

This is related to Stacked Borrows: References might, at least abstractly, have extra metadata attached to them, making transmutes to/from raw pointers potentially non-trivial to handle.

@joshtriplett
Copy link
Member

  • Considerations for storing things in the low bits of pointers (likely answer: safe with raw pointers, not safe with references).

@RalfJung
Copy link
Member

likely answer: safe with raw pointers, not safe with references

Given that we are passing aligned attributes to LLVM, I think this is the only possible answer (unless we want to change which attributes we emit).

@hanna-kruppe
Copy link

Even if we got rid of aligned on function attributes and return values, tagged references also can't be marked as dereferencable which would probably be a bigger loss.

@avadacatavra avadacatavra added A-layout Topic: Related to data structure layout (`#[repr]`) active discussion topic and removed A-layout Topic: Related to data structure layout (`#[repr]`) labels Aug 31, 2018
@alercah
Copy link

alercah commented Sep 8, 2018

I agree that the low bits of a pointer, but not a reference, can be used for storage.

That said, I would like it to be possible to write a type which does store values in the low bits that has an interface to safe Rust basically equivalent in functionality to a (&[mut] T, E) where E is some fieldless enum with few enough values to fit in the low bits. I think that if we can store values in the low bits of raw pointers, it works out, but we should double-check.

@joshtriplett
Copy link
Member

I do agree that we could teach the compiler to internally treat the pointer values of references as having holes.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Sep 10, 2018

I do agree that we could teach the compiler to internally treat the pointer values of references as having holes.

This is trickier than it sounds. The problem is that you can create a &mut &T value — in that case, we would not be able to simply store the value, we'd have to work hard to preserve any low bits (as well as masking them off on a load, potentially). That would potentially affect all loads and stores since you never know if the &mut points into an enum or not.

In general, to do more advanced optimizations of this kind, it would be useful to be able to mark an enum as not permitting "ref bindings", which sidesteps the problem.

@avadacatavra avadacatavra added the S-writeup-assigned Status: Ready for a writeup and someone is assigned to it label Sep 13, 2018
@avadacatavra avadacatavra self-assigned this Sep 13, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 20, 2018

Does the "&T is just a pointer" also applies to unsized Ts ?

I ran into the following example in the wild today:

pub struct Wrapper([i32]);

impl Wrapper {
    pub fn new(x: &[i32]) -> &Self { unsafe { mem::transmute(x) } }
}

Somebody suggested that making Wrapper #[repr(transparent)] could help, but on discord nobody appears to know whether this should even work.


EDIT: I get the feeling that this is only for references to sized types. Is there a different issue for the layout of trait objects, DSTs, etc. or is it fine to discuss that here?

@alercah
Copy link

alercah commented Sep 20, 2018

I think that #[repr(transparent)] should work. It guarantees identical ABI, and that ought to include compatible pointers.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 20, 2018

@alercah, @ubsan explained on discord that pointers to unsized types are (or can be) larger than normal pointers (I did not know this). For example, in a pointer to a slice, like *const [T], the slice length is part of the pointer, and casting that pointer to a *const T discards this meta-data (one cannot cast a *const T to a *const [T]).

So in the example above, the following should work:

#[repr(transparent)]
pub struct Wrapper([i32]);

impl Wrapper {
    pub fn new(x: &[i32]) -> &Self { 
        unsafe { 
            &*(x as *const [i32] as *const Self) 
        } 
    }
}

Since there is no need to use the mem::transmute, whether it ought to work properly w/o repr(transparent) for the pointers and/or references to the unsized types might not be very important (I don't know).

@RalfJung
Copy link
Member

RalfJung commented Oct 5, 2018

Does the "&T is just a pointer" also applies to unsized Ts ?

More generally, I think it'd be interesting to discuss what we want to say about fat pointers. For example (and I am not sure whether that is more relevant for data layout or validity discussions), I think that even raw pointers must have valid metadata: For slices, it must be an integer (not uninitialized memory) such that the slice is not too big, and for trait objects it must be an actual vtable.

@strega-nil
Copy link

strega-nil commented Oct 11, 2018

From experience, it seems that it'd be really nice if we defined the ABI for pointers to DSTs. I would love it if we had that pointers to trait objects were exactly

struct TraitObject {
  void* object;
  TraitVtable* vt;
};

and pointers to slice were exactly

struct Slice {
  T* ptr;
  uintptr_t len;
}

for purposes of ABI (extern "C" functions, especially)

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 11, 2018

@ubsan

From experience, it seems that it'd be really nice if we defined the ABI for pointers to DSTs.

I have no thoughts on this issue, but there is a lot of a background and previous discussions about this. It took me a while to go through it, the following is in more or less chronological form:

From the point-of-view of the unsafe code guidelines, it is maybe necessary to know where the custom DST story is going to be able to state what can / cannot be guaranteed about the layout of DSTs in general. It would be a bad idea to not guarantee anything at all about their layout, but it would be equally bad to guarantee something that prevents us from getting good custom DSTs in the future. It might be worth it to re-evaluate whether the priority/cost balance for the custom DSTs has changed, and whether it is something that might be worth prioritizing after Rust2018 is released, and maybe incorporate the unsafe code guidelines "motivation" into a revised version of the RFC.

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 11, 2018

@RalfJung

For example (and I am not sure whether that is more relevant for data layout or validity discussions), I think that even raw pointers must have valid metadata

I would prefer if "fat pointers" were just "sugar" for structs, unions, etc. containing pointers, so that their layout just follows from the layout for those types. The only thing we would then have to fix here, or explicitly decide not to fix, is their particular layout.

We can fix the rest once we start discussing validity. E.g., slice::is_empty(&self) might only read the length, but depending on whether &self requires all members of the struct/active union variant to be valid, whether the pointer is undef or not might be ok or not. This has nothing to do with layout though.

@RalfJung
Copy link
Member

@gnzlbg That also seems like a very reasonable approach. @nikomatsakis expressed similar preference.

@nikomatsakis nikomatsakis added S-writeup-needed Status: Ready for a writeup and no one is assigned and removed S-writeup-assigned Status: Ready for a writeup and someone is assigned to it labels Oct 11, 2018
@asajeffrey
Copy link

For fat pointers, do we want to mandate a particular representation, or are we good with just saying they're two words long, and the first word is a thin pointer? (so not specifying anything about the rest other than it being one word)

@RalfJung
Copy link
Member

I think with custom DSTs the rest might be differently sized even?

@asajeffrey
Copy link

@RalfJung but if we can't even say anything about the size, then we're back to the (what I believe) is the current spec, that fat pointers guarantee that the first word is a thin pointer, and nothing else.

@RalfJung
Copy link
Member

Seems fair to me?

@asajeffrey
Copy link

Another question... what do we want to say about &T vs *T? For non-ZST T, is a member of &T a non-zero multiple of the word size? What do we want to say about the case of ZST T? Are we guaranteeing that &T only has one element? Is it a multiple of the word size?

@strega-nil
Copy link

strega-nil commented Nov 16, 2018

@asajeffrey & [mut] α has exactly the same representation as * [const|mut] α, has the exact same representation as Option<& [mut] α> -- when I write , I mean "any of these three cases".

∀α : Sized, *α has the same representation as *(). For α: !Sized, 's representation is currently not guaranteed - although I think that the existing pointers to DST should be represented as:

#[repr(C)]
struct *[T] {
  ptr: *(),
  meta: usize,
}  

#[repr(C)]
struct *dyn Trait {
  ptr: *(),
  meta: *(),
}

@asajeffrey
Copy link

@ubsan I'm not sure what you mean by "the same representation". Isn't &T required to be non-zero and T-aligned?

Annoyingly, &T doesn't have the same representation as &() since () is a ZST. It is the same as &u8 though. We may need to add some discussion about ZSTs here, sigh.

I added a note about (e.g.) &[T] being the same as the C repr of (&u8, usize). It's slightly clunky because we don't really have the ability to talk about the C repr of a tuple, the alternative would be to define a struct just so we can put repr(C) on it.

@hanna-kruppe
Copy link

Annoyingly, &T doesn't have the same representation as &() since () is a ZST. It is the same as &u8 though. We may need to add some discussion about ZSTs here, sigh.

What do you mean? References to ZSTs are still pointer-sized and ZSTs' alignment requirements matter.

we don't really have the ability to talk about the C repr of a tuple

There's no such thing. Types have one repr, period, and for tuples that is not the same as a repr(C) struct.

@asajeffrey
Copy link

@rkruppe Depends what we want to say about &(), in particular do we want to say that it's a singleton type? Or do we want to allow &() to be inhabited by any valid pointer?

@hanna-kruppe
Copy link

For starters, we are talking about representation here, not validity, and that &ZST is pointer sized is not just the overwhelming status quo, it's also recently been explicitly confirmed in the closing of RFC PR 2040.

Moreover, even for the validity question, allowing only a single address as reference to ZSTs is not tenable IMO due to how much existing unsafe code uses ZSTs in place of extern types.

@asajeffrey
Copy link

Yes, it's not whether it's pointer sized that's the issue, it's whether any non-zero value is a valid representation of a &(). If there's a lot of code in the wild that uses ZSTs for external types, then it sounds like we have to accept that &() isn't a singleton type.

@strega-nil
Copy link

@asajeffrey object representation and whether specific representations are valid, are very different concerns. *mut () has exactly the same object representation as &T, it just has more values that are valid.

@asajeffrey
Copy link

@ubsan Ah, perhaps we have a terminological issue here, about what is part of "validity" vs what is part of "representation". I am including things like being non-zero as part of representation, and you are thinking of it as part of validity?

@strega-nil
Copy link

strega-nil commented Nov 16, 2018

Correct. Non-zero is not part of representation, at least how we define representation - representation is the list of bits which map to a value; i.e., 15u8 maps to the representation [0, 0, 0, 0, 1, 1, 1, 1]. To say that two types α and β have the same representation, is to say that ∀a : α, ∀ b : β, a ~ b → obj_repr(a) = obj_repr(b) (where a ~ b is some equivalence relation that's defined for the types α and β).

For example, i8 and u8 have the same representation, with ~ defined as x ~ y ≡ x = y (mod 256)

@asajeffrey
Copy link

@ubsan: I'm a bit lost with that defn, if ~ is a relation between α and β then what does it mean for it to be an equivalence? Doesn't that require reflexivity and transitivity, which only really make sense when α and β are the same type?

I was thinking of representation as being the set { bs | ∃ a : α . obj_repr(a) = bs }, so &T and *T don't have the same representation, since *T contains 0 and &T doesn't.

@strega-nil
Copy link

strega-nil commented Nov 16, 2018

@asajeffrey transitivity and reflexivity both make sense when α ≠ β.

You create a morphism (f : α → β, f' : β → α) between α and β, and then you have

reflexive: ∀a : α, ∀ b : β, f(a) defined, f'(b) defined, a ~ b ⇒ f(a) = b ∧ f'(b) = a

transitive: ∀a, a' : α, ∀ b, b' : β, a ~ b ∧ a = a' ∧ b = b' ⇒ a' ~ b ∧ a ~ b'

symmetric: ∀ a : α, ∀ b : β, a ~ b ⇒ f(a) = b

Also, that's not the common way of defining representation. That's the set of values.

For example:

The equivalence ~ between &T and *const T is as follows:

f(a : &T) -> *const T {
  a
}
f'(b : *const T) -> &T {
  if (b is valid) {
    &*b
  } else {
    undefined
  }
}

~ (a : &T, b : *const T) -> bool {
  f(a) == b
}

since bit_repr(a) = bit_repr(b) ∀a : &T, ∀b : *const T s.t. a ~ b, this implies that the representation of &T is the same as the representation of *const T.

Note that this definition is unnecessarily strict, since we can't define a mapping from &T to &mut T. It's likely that you'd want a mapping to a third type (which might be equal to one of the two types) which can represent all values from each type.

@asajeffrey
Copy link

@ubsan hmm, that's not the definition of type-indexed equivalence that I'm used to, which is either the groupoid model, or more recently HoTT, but we are now getting seriously off-topic!

@asajeffrey
Copy link

I updated the draft at https://github.com/asajeffrey/unsafe-code-guidelines/blob/repr-pointers/reference/src/representation/pointers.md, to remove the specialness of ZST, and to add some questions:

  • Is the metadata in a fat pointer guaranteed to be word-aligned? Nonzero? Even if the trait has no methods?
  • Is the pointer in a slice guaranteed to be non-zero and aligned? Even if the slice is empty?
  • Should the defn of representation talk about being non-zero? Being a multiple of the alignment? (Compare with the discussions of bool, where the representations are 0 and 1.)

@strega-nil
Copy link

  1. metadata for dyn Trait must contain size_of_val, and align_of_val.

  2. if you have a non-nullable pointer to a slice, that pointer is guaranteed to be non-null.

  3. probably not. You have a set of defined values which have a defined object representation. Those object representations must be representable in the definition of the representation, but the representation may contain values which do not map to a value of the type.

@strega-nil
Copy link

strega-nil commented Nov 16, 2018

An example of the latter is again, bool. There are two values for bool, {true, false}, with object representations

obj_repr[α] : α → Vect byte (sizeof α)

obj_repr[bool] : bool → Vect byte 1
obj_repr[bool](true) = [0x1]
obj_repr[bool](false) = [0x0]

However, if we like, we can then say that the representation of bool is equivalent to the representation of u8, since they're both a single byte; this makes sense under the equivalence true ~ 1, false ~ 0. It's important to be able to discuss these equivalences without worrying about the undefined values, imo.

@asajeffrey
Copy link

@usban:

  1. OK, so the metadata pointer is guaranteed to be non-zero and word-aligned.
  2. What about alignment? For example, if we declare `let x: [usize; 0] = []; let p = &x[,..];', is &x (and therefore the first word of p) guaranteed to be word-aligned?
  3. Hmm, so in your defn 00000011 is a representation of a bool? That's not my use of the word "representation". It would be nice to hear from others on this. @RalfJung? @nikomatsakis? Would you say u8 and bool have the same representation?

@strega-nil
Copy link

strega-nil commented Nov 16, 2018

@asajeffrey

  1. there is no "metadata pointer". In a vtable, the pointer that is the metadata is guaranteed to be a vtable, and not much else.

  2. the pointer subobject of an &[T] is guaranteed to be aligned, yes.

  3. 3 is a list of bits that fits in a bool. It is an object representation that does not map to any value of type bool, however.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 18, 2018

Is the pointer in a slice guaranteed to be non-zero and aligned? Even if the slice is empty?

References are always properly aligned independently of whether they point to a ZST or not. I think it would be unnecessarily complicated to add an exception for DSTs.

That means that the pointer in the &[T] is NonNull and points to a properly aligned [T], even when [T] is empty.

Also, I think it is worth it to guarantee that Option<&DST> has the same size as &DST. Since DSTs have () as metadata, that requires the pointer in the &DST to be NonNull for these DSTs. Does not requiring this in general give any advantages? I don't see any.

EDIT: this is not the case for *const [T], I'd expect *const [T] to support unaligned pointers to slices independently of whether these are empty or not - one would just need to use the _unaligned pointer read/write methods to deal with unaligned ones, but this is not different than for any other pointer.

@RalfJung
Copy link
Member

I don't have many thoughts along @ubsan's "representations" as that's not how I usually think about specifying low-level languages. It presumes the existence of some "higher-level representation" of data, higher-level than "sequence of bytes/bits", and I am not convinced that is something we need for all types. (To be more precise, I think it is useful for some types, like when defining arithmetic, but e.g. unnecessary for compound types.) There is some overlap with validity in the sense that the set of valid bit sequences for a type would likely coincide with the set of bit sequences that "map to a representation of the type".

However, I don't think any of that is even needed to have this discussion. This is the layout discussion ("representation" as in repr, which is how to compute TyLayout in rustc, and from my understanding has nothing to do with how @ubsan used the term above). I think the best way to proceed, if we want to give very concrete guarantees, is to define Rust types that are layout-equivalent to fat pointers:

// `&?mut [T]` is like
#[repr(C)]
struct Slice {
  ptr: &?mut T, // with raw slices, this is a raw ptr
  len: usize,
}

// `&?mut dyn Trait` is like
#[repr(C)]
struct DynObject {
  data: &?mut T, // with raw dyn objects, this is a raw ptr
  vtable: &usize, // with raw dyn objects, this is a raw ptr
}

This is pretty much exactly what @ubsan wrote above. I used reference types to indicate non-nullness and alignedness, though that is already kind-of a validity invariant topic.

@gnzlbg
I guess Option<&DST> (for slices and dyn objects, i.e., only for DST that already exist!) is worth a special case because we want to guarantee very little about enum optimizations, but likely want to guarantee this?

Is there anything to say other than giving equivalent struct layouts and guaranteeing the Option optimization the same way as we do for thin references? Everything else discussed here, about stuff being aligned etc, is not a layout question, it is a "which sequences of bits are valid for this type" question.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 20, 2018

Is there anything to say other than giving equivalent struct layouts and guaranteeing the Option optimization the same way as we do for thin references?

No, I don't think so.

Maybe this is something worth repeating in a note, but from the little that we guarantee about enum optimizations, and the difference in niches between &DST and *[const,mut] DST it should already follow that Option<&DST> has the same size as &DST but that Option<*[const,mut] DST> is larger than *[const,mut] DST.

@asajeffrey
Copy link

I updated the draft at https://github.com/asajeffrey/unsafe-code-guidelines/blob/repr-pointers/reference/src/representation/pointers.md, to make "same representation as" the defn of the representation of &Trait and &[T], following @RalfJung's suggestion. AFAIK the remaining point is whether the representation of &T to be a multiple of T's alignment. (The question being whether alignment is part of representation or a validity invariant.)

@asajeffrey
Copy link

Oh, something we might want to add... the representation of None: Option<&T> is the same as ptr::null(): *const T.

@RalfJung RalfJung added S-writeup-assigned Status: Ready for a writeup and someone is assigned to it and removed S-writeup-needed Status: Ready for a writeup and no one is assigned labels Nov 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Topic: Related to data structure layout (`#[repr]`) S-writeup-assigned Status: Ready for a writeup and someone is assigned to it
Projects
None yet
Development

No branches or pull requests

9 participants