Decide on the validity invariant of integers, floats, bool, thin raw pointers, and char #439

Closed
RalfJung opened this issue Aug 4, 2023 · 9 comments

Comments

@RalfJung
Member

RalfJung commented Aug 4, 2023

We have closed various issues discussing validity invariants for simple types (integers, float, bool, char, thin raw pointers). I'd like to have somewhere to point for team consensus, such as an FCP in this issue. :)

We decide that the validity invariants are:

  • integers, float, thin raw pointers, and str need to be initialized
  • bool needs to be 0 or 1
  • char needs to be in 0..0xD800 or 0xE000..0x110000
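
As a minimal sketch, the two restricted ranges above can be written as plain predicates on the underlying integer. The function names `is_valid_bool_byte` and `is_valid_char_bits` are hypothetical helpers introduced only for illustration; they are not part of the standard library or of this proposal.

```rust
/// Returns true if `byte` is a valid bit pattern for `bool`, i.e. 0 or 1.
/// (Hypothetical helper, for illustration only.)
fn is_valid_bool_byte(byte: u8) -> bool {
    byte <= 1
}

/// Returns true if `bits` is a valid bit pattern for `char`, i.e. it lies
/// in 0..0xD800 or 0xE000..0x110000 (the surrogate range is excluded).
/// (Hypothetical helper, for illustration only.)
fn is_valid_char_bits(bits: u32) -> bool {
    (0..0xD800).contains(&bits) || (0xE000..0x110000).contains(&bits)
}
```

The `char` predicate should agree with `char::from_u32(bits).is_some()`, which performs the same range check in the standard library.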

Transmuting any provenance-free input that satisfies the above requirements is definitely allowed. In particular, integers can be transmuted to raw pointers without causing immediate UB. What can be done with those pointers in terms of memory accesses is a different question and not answered here.

We do not decide what happens when the input has provenance. This is tracked here. In particular, values such as &0 (that have provenance) might or might not be legal to transmute to integers.

Rationale

  • For the types with restricted range, we are using those ranges as niches for enum layout optimizations. bool and char have the same validity and safety invariant, which makes these types simpler to think about. char can also be exploited by unicode algorithms, at least in principle.
  • Disallowing uninitialized values in integers is a prerequisite for optimizations that need integers to have a "stable" value (in LLVM terms: it lets us set noundef). For integers, floats, and thin raw pointers this choice also aligns the safety and validity invariants.
  • str is intended to behave like [u8] when it comes to language UB, so its validity invariant is made consistent with that of integers.
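
The niche point in the first bullet can be observed through type sizes. The following is a small sketch, assuming the layout chosen by current rustc; these sizes are not a documented guarantee for `Option<bool>` or `Option<char>`, and `niche_sizes` is a hypothetical helper name.

```rust
use std::mem::size_of;

/// Returns `(size_of::<T>(), size_of::<Option<T>>())` for bool, char, and u8.
/// With current rustc, the invalid bit patterns of bool and char are used to
/// encode `None`, so `Option` adds no space for them. u8 has no invalid bit
/// patterns, so `Option<u8>` needs a separate tag byte.
fn niche_sizes() -> [(usize, usize); 3] {
    [
        (size_of::<bool>(), size_of::<Option<bool>>()),
        (size_of::<char>(), size_of::<Option<char>>()),
        (size_of::<u8>(), size_of::<Option<u8>>()),
    ]
}
```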

Examples

The following pieces of code cause UB (as in, the UB arises when executing the code, not just potentially later):

let _val: i32 = unsafe { MaybeUninit::uninit().assume_init() };
let _val: bool = unsafe { mem::transmute(2u8) };

The following pieces of code are well-defined:

let val: bool = unsafe { mem::transmute(1u8) };
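
Two further well-defined cases follow from the invariants stated above, sketched here under the assumption that no provenance is involved. The function names `init_then_read` and `bits_to_float` are hypothetical and exist only for this illustration.

```rust
use std::mem::{self, MaybeUninit};

/// Initializes the integer before reading it, so `assume_init` is fine.
fn init_then_read() -> i32 {
    let mut slot = MaybeUninit::<i32>::uninit();
    slot.write(42);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

/// Any initialized, provenance-free u32 is a valid f32 bit pattern,
/// because floats only need to be initialized.
fn bits_to_float(bits: u32) -> f32 {
    // SAFETY: u32 and f32 have the same size, and every initialized
    // bit pattern is a valid f32.
    unsafe { mem::transmute(bits) }
}
```

In ordinary code `f32::from_bits` would be the idiomatic choice for the second function; the transmute is used here only to mirror the examples in this issue.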

The following is not decided by this FCP:

let ptr = &0i32;
let ptr_to_ptr = addr_of!(ptr).cast::<usize>();
unsafe { ptr_to_ptr.read() }; // pointer-to-integer transmutation -- UB or not?

The following functions are sound (as in, safe code invoking these functions can never have UB):

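fn to_bool(x: u8) -> Option<bool> {
  // SAFETY: x is 0 or 1, which are exactly the valid values of bool.
  if x < 2 { Some(unsafe { mem::transmute(x) }) } else { None }
}
fn from_bool(b: bool) -> u8 {
  // SAFETY: every bool is an initialized byte, hence a valid u8.
  unsafe { mem::transmute(b) }
}
fn check_bool(b: bool) {
  // SAFETY: from_bool returns 0 or 1, so to_bool returns Some.
  unsafe { to_bool(from_bool(b)).unwrap_unchecked() };
}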

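fn to_char(x: u32) -> Option<char> {
  if (0..0xD800).contains(&x) || (0xE000..0x110000).contains(&x) {
    // SAFETY: x was just checked to be in the valid range of char.
    Some(unsafe { mem::transmute(x) })
  } else {
    None
  }
}
fn from_char(c: char) -> u32 {
  // SAFETY: every char is an initialized 4-byte value, hence a valid u32.
  unsafe { mem::transmute(c) }
}
fn check_char(c: char) {
  // SAFETY: from_char returns a value in the valid range, so to_char returns Some.
  unsafe { to_char(from_char(c)).unwrap_unchecked() };
}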

fn to_ptr<T>(x: usize) -> *const T {
  // We don't decide here what may be done with this pointer,
  // but the transmute itself is fine, and since safe code
  // can't do anything with raw pointers, the function is even
  // sound.
  unsafe { mem::transmute(x) }
}

Prior discussion

@RalfJung
Member Author

RalfJung commented Aug 5, 2023

Now again with T-opsem label...
@rfcbot merge

@rfcbot
Collaborator

rfcbot commented Aug 5, 2023

Team member @RalfJung has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot
Collaborator

rfcbot commented Aug 7, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot
Collaborator

rfcbot commented Aug 17, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@RalfJung
Member Author

I added this to https://github.com/rust-lang/opsem-team/blob/main/fcps.md, so the issue can be closed.
