Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document ABI compatibility #115476

Merged
merged 4 commits into from
Nov 17, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions library/core/src/option.rs
Original file line number Diff line number Diff line change
Expand Up @@ -119,27 +119,28 @@
//! # Representation
//!
//! Rust guarantees to optimize the following types `T` such that
//! [`Option<T>`] has the same size and alignment as `T`. In some
//! [`Option<T>`] has the same size, alignment, and [function call ABI] as `T`. In some
//! of these cases, Rust further guarantees that
//! `transmute::<_, Option<T>>([0u8; size_of::<T>()])` is sound and
//! produces `Option::<T>::None`. These cases are identified by the
//! second column:
//!
//! | `T` | `transmute::<_, Option<T>>([0u8; size_of::<T>()])` sound? |
//! |---------------------------------------------------------------------|----------------------------------------------------------------------|
//! | [`Box<U>`] | when `U: Sized` |
//! | [`Box<U>`] (specifically, only `Box<U, Global>`) | when `U: Sized` |
//! | `&U` | when `U: Sized` |
//! | `&mut U` | when `U: Sized` |
//! | `fn`, `extern "C" fn`[^extern_fn] | always |
//! | [`num::NonZero*`] | always |
//! | [`ptr::NonNull<U>`] | when `U: Sized` |
//! | `#[repr(transparent)]` struct around one of the types in this list. | when it holds for the inner type |
//!
//! [^extern_fn]: this remains true for any other ABI: `extern "abi" fn` (_e.g._, `extern "system" fn`)
//! [^extern_fn]: this remains true for any argument/return types and any other ABI: `extern "abi" fn` (_e.g._, `extern "system" fn`)
//!
//! [`Box<U>`]: ../../std/boxed/struct.Box.html
//! [`num::NonZero*`]: crate::num
//! [`ptr::NonNull<U>`]: crate::ptr::NonNull
//! [function call ABI]: ../primitive.fn.html#abi-compatibility
//!
//! This is called the "null pointer optimization" or NPO.
//!
Expand Down
106 changes: 105 additions & 1 deletion library/core/src/primitive_docs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1480,7 +1480,7 @@ mod prim_ref {}
///
/// ### Casting to and from integers
///
/// You cast function pointers directly to integers:
/// You can cast function pointers directly to integers:
///
/// ```rust
/// let fnptr: fn(i32) -> i32 = |x| x+2;
Expand All @@ -1506,6 +1506,110 @@ mod prim_ref {}
/// Note that all of this is not portable to platforms where function pointers and data pointers
/// have different sizes.
///
/// ### ABI compatibility
///
/// Generally, when a function is declared with one signature and called via a function pointer with
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't just about function pointers. If a function is declared external "ABI" in Rust then the called function must have that ABI. Or if the function is declared just extern or extern "C" then it must have the default ABI (C compilers let you override the ABI of a function, or the function may be written in assembly language specifically for one calling convention).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we document and guarantee what one has to do when linking multiple Rust objects together? Do we even support that with extern "Rust" functions?

We have to put these docs somewhere, and function pointers are the only way to trigger these issues inside the language (without exotic things such as dlopen or manually linking things together), so I figured this would make sense. If you can think of a better place where we can put this, I can move it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand from seeing other issues why you see "inside the language" as a useful initial scope because that's what seems to matter for the SIMD stuff. I don't object to that.

IME people are much more likely to run into ABI issues in cross-language situations since there are no guardrails at all, so I hope we at least are open to solving the issue for cross-language cases too. Perhaps that means adding similar language to the documentation of extern and then factoring out the commonality to some top-level section of the language reference, in a future PR.

Copy link
Member Author

@RalfJung RalfJung Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IME people are much more likely to run into ABI issues in cross-language situations

Fully agreed. It's also a much less defined space, you basically have to define "which C type and ABI is compatible with which Rust type and ABI" (and then potentially also which C++/Julia/whatever type, though I guess we can rely on those languages saying which C types their types are compatible with). This depends on the target and honestly I'm quickly out of my depth for those questions. ABI still surprises me with new nightmares every few weeks, so I'm sure there's more to come.

I do hope someone will pick this up eventually.

I hope we at least are open to solving the issue for cross-language cases too.

Of course we are, I didn't want to give any indication to the contrary! It's just not the problem I want to solve right now. It's not on my own personal todo list either, at the moment.

/// a different signature, the two signatures must be *ABI-compatible* or else calling the function
/// via that function pointer is Undefined Behavior. ABI compatibility is a lot stricter than merely
/// having the same memory layout; for example, even if `i32` and `f32` have the same size and
/// alignment, they might be passed in different registers and hence not be ABI-compatible.
///
/// ABI compatibility as a concern only arises in code that alters the type of function pointers,
/// and in code that combines `#[target_feature]` with `extern fn`. Altering the type of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case I run into the most is where the code that uses some feature (e.g. vector registers) is written in another language that doesn't have target_feature (e.g. assembly language .S files) and that assumes that all CPU-supported features are available to use (e.g. vector registers). When porting to systems where this isn't the case (e.g. x86-64-unknown-none) I end up needing to audit all the extern "C" declarations.

The other case I've run into is where I am porting assembly code (.S files) from Linux to Apple targets, where the assembly code uses registers (e.g. r18) that aren't safe to use on Apple targets.

In both cases, no function pointers are involved and there's no use of target_feature (and also there are no vector types in the function signature).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR we are strictly only concerned with Rust-to-Rust calls. I agree that Rust-to-other/other-to-Rust calls are important, but they are also a huge topic. Let's not scope creep this PR, please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object to that idea. But this specific sentence is very misleading since it says "ABI compatibility as a concern only arises in" two situations when that's actually not true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I adjusted the wording.

/// function pointers is wildly unsafe (as in, a lot more unsafe than even
/// [`transmute_copy`][mem::transmute_copy]), and should only occur in the most exceptional
/// circumstances. `#[target_feature]` is also used rarely. But assuming such circumstances, what
/// are the rules?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly this target-feature hell means that we have to be much less definite when we say "if you care about this you are doing something outlandish" :/

///
/// For two signatures to be considered *ABI-compatible*, they must use a compatible ABI string,
/// must take the same number of arguments, the individual argument types and the return types must
/// be ABI-compatible, and the target feature requirements must be met (see the subsection below for
/// the last point). The ABI string is declared via `extern "ABI" fn(...) -> ...`; note that
/// `fn name(...) -> ...` implicitly uses the `"Rust"` ABI string and `extern fn name(...) -> ...`
/// implicitly uses the `"C"` ABI string.
///
/// The ABI strings are guaranteed to be compatible if they are the same, or if the caller ABI
/// string is `$X-unwind` and the callee ABI string is `$X`, where `$X` is one of the following:
/// "C", "aapcs", "fastcall", "stdcall", "system", "sysv64", "thiscall", "vectorcall", "win64".
///
/// The following types are guaranteed to be ABI-compatible:
///
/// - `*const T`, `*mut T`, `&T`, `&mut T`, `Box<T>` (specifically, only `Box<T, Global>`),
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
/// `NonNull<T>` are all ABI-compatible with each other for all `T`. Two of these pointer types
/// with different `T` are ABI-compatible if they have the same metadata type (`<T as
/// Pointee>::Metadata`).
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
/// - `usize` is ABI-compatible with the `uN` integer type of the same size, and likewise `isize` is
/// ABI-compatible with the `iN` integer type of the same size.
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
/// - Any two `fn` types are ABI-compatible with each other if they have the same ABI string or the
/// ABI string only differs in a trailing `-unwind`, independent of the rest of their signature.
/// (Note that this is about the case of passing a function pointer as an argument to a function.
/// The two pointers being ABI-compatible here means that the call successfully passes the
/// pointer. When actually calling the pointer, of course the rest of the signature becomes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean by "successfully passes the pointer." "calling the pointer" doesn't make sense to me either. Did you mean "calling the function through the pointer"?

Copy link
Member Author

@RalfJung RalfJung Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean by "successfully passes the pointer."

There's no UB from passing the function pointer around.

IOW, if the caller has signature fn(fn()), and the callee has signature fn(fn(i32) -> i32), then the function call itself is completely well-defined. Caller and callee are ABI compatible even though the only argument of this function has a different type in the two signatures.

Or to be completely concrete, this code does not have UB.

Of course if the callee actually calls the function pointer that it was given as argument, then the fact that the signatures are different matters.

"calling the pointer" doesn't make sense to me either.

When f: fn(), then f() is "calling the function pointer". At least that's how I would call that operation. You seem to call it "calling the function through the pointer".

With data pointers we say that we read and write the pointer, we don't always spell out "read and write the memory through the pointer". I am following the same principle for function pointers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point here is a bit confusing since the question "are two fn ptr types A and B ABI-compatible" is ambiguous:

  • it could mean "can I call a function of type A with a caller-side signature of type B"
  • it could mean "can I call a function of type fn(A) with a caller-side signature of type fn(B)"

Usually when we say, e.g., "u32 and NonZeroU32 are ABI-compatible", we mean the latter, but when the two types in questions are themselves function pointers then the terminology becomes unclear.

I'd be happy for suggestions for how to word this more clearly.

Copy link
Contributor

@briansmith briansmith Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point here is a bit confusing since the question "are two fn ptr types A and B ABI-compatible" is ambiguous:

it could mean "can I call a function of type A with a caller-side signature of type B"
it could mean "can I call a function of type fn(A) with a caller-side signature of type fn(B)"

Usually when we say, e.g., "u32 and NonZeroU32 are ABI-compatible", we mean the latter, but when
the two types in questions are themselves function pointers then the terminology becomes unclear.

I'd be happy for suggestions for how to word this more clearly.

First, I admit I don't think it is good to guarantee anything about transmuting function pointers with incompatible signatures, as I don't understand the motivation driving us to make such guarantees. So one way to clarify things would be to just have one set of rules for ABI compatibility / transmutation for function types for now, and then maybe follow up later with a proposal to guarantee transmutations of incompatible function types work.

usually when we say, e.g., "u32 and NonZeroU32 are ABI-compatible",

If you reserve "ABI compatibility" to be strictly about function calls, function declarations, function definitions, and function pointers, then we could have a separate term "transmutable, e.g. "NonZeroU32 is transmutable to u32" (where "transmutable" means the result of the transmutation is well-defined), and we could have a separate term "A is argument-compatible with B" to talk about where an argument of type A can be passed for a function parameter of type B. And hopefully we would have rules like "Any type A that is transmutable to B is argument-compatible with B." Then you could use "argument-compatible" as part of the definition of ABI-compatibility, in particular each argument in a function-call must be argument-compatible with corresponding parameter in the function's definition (and later, declaration), and (target_feature stuff, etc.).

In other words, don't use "ABI-compatible" and "ABI compatibility" terms to talk about argument/parameter compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I admit I don't think it is good to guarantee anything about transmuting function pointers with incompatible signatures, as I don't understand the motivation driving us to make such guarantees. So one way to clarify things would be to just have one set of rules for ABI compatibility / transmutation for function types for now, and then maybe follow up later with a proposal to guarantee transmutations of incompatible function types work.

I think it would be really strange to say that *const i32 and *mut u8 are ABI-compatible, but fn() and fn(i32) are not. Both are just pointer types with different pointee information. It'd be very surprising to require the function signatures to be ABI compatible in order to have the pointers be ABI compatible.

transmutable

That's a very bad term for this situation. i32 is transmutable to u32 but they are not ABI compatible. The PR explicitly talks about this.

For better or worse, "ABI compatible" is already in common use for this concept, in questions like "are u32 and char ABI compaible?" I think it's not a bad term, it's only a bit annoying for this specific case of ABI compatibility of function pointer types.

I have updated the text to use an example instead of remaining so abstract, I hope that helps.

/// relevant as well, according to the rules in this section.)
/// - Any two types with size 0 and alignment 1 are ABI-compatible.
/// - A `repr(transparent)` type `T` is ABI-compatible with its unique non-trivial field, i.e., the
/// unique field that doesn't have size 0 and alignment 1 (if there is such a field).
/// - `i32` is ABI-compatible with `NonZeroI32`, and similar for all other integer types with their
/// matching `NonZero*` type.
/// - If `T` is guaranteed to be subject to the [null pointer
/// optimization](option/index.html#representation), then `T` and `Option<T>` are ABI-compatible.
Mark-Simulacrum marked this conversation as resolved.
Show resolved Hide resolved
tmandry marked this conversation as resolved.
Show resolved Hide resolved
///
/// Furthermore, ABI compatibility satisfies the following general properties:
///
/// - Every type is ABI-compatible with itself.
/// - If `T1` and `T2` are ABI-compatible, then two `repr(C)` types that only differ because one
/// field type was changed from `T1` to `T2` are ABI-compatible.
/// - If `T1` and `T2` are ABI-compatible and `T2` and `T3` are ABI-compatible, then so are `T1` and
/// `T3` (i.e., ABI-compatibility is transitive).
/// - If `T1` and `T2` are ABI-compatible, then so are `T2` and `T1` (i.e., ABI-compatibility is
/// symmetric).
///
/// More signatures can be ABI-compatible on specific targets, but that should not be relied upon
/// since it is not portable and not a stable guarantee.
Comment on lines +1571 to +1572
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be declared stable and relied on for code that is #[cfg]'d for a specific target. Though it sounds like we aren't declaring any of those stable right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could, but this PR for now takes the stance that we shouldn't do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we shouldn't, but also not opposed to leaving this until we find a use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One obvious case would be usize vs u64 on 64 bit platforms (and u32, u16 etc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds more like a case we might want to add to the list: usize is compatible with the uN type of the same size; and similar for isize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we shouldn't, but also not opposed to leaving this until we find a use case.

Which wording would you propose here? "There might be other stable things but we won't tell you which" is useless. And we certainly don't want to promise "anything that's incidentally ABI-compat on some target will remain ABI-compat on that target". So I think we only have two options:

  • the thing I wrote
  • an extra list of guaranteed target-specific extensions to ABI-compatibility

///
/// Noteworthy cases of types *not* being ABI-compatible in general are:
/// * `bool` vs `u8`, and `i32` vs `u32`: on some targets, the calling conventions for these types
/// differ in terms of what they guarantee for the remaining bits in the register that are not
/// used by the value.
/// * `i32` vs `f32` are not compatible either, as has already been mentioned above.
/// * `struct Foo(u32)` and `u32` are not compatible (without `repr(transparent)`) since structs are
/// aggregate types and often passed in a different way than primitives like `i32`.
///
/// Note that these rules describe when two completely known types are ABI-compatible. When
/// considering ABI compatibility of a type declared in another crate (including the standard
/// library), consider that any type that has a private field or the `#[non_exhaustive]` attribute
/// may change its layout as a non-breaking update unless documented otherwise -- so for instance,
/// even if such a type is a 1-ZST or `repr(transparent)` right now, this might change with any
/// library version bump.
///
/// If the declared signature and the signature of the function pointer are ABI-compatible, then the
/// function call behaves as if every argument was [`transmute`d][mem::transmute] from the
/// type in the function pointer to the type at the function declaration, and the return value is
/// [`transmute`d][mem::transmute] from the type in the declaration to the type in the
/// pointer. All the usual caveats and concerns around transmutation apply; for instance, if the
/// function expects a `NonNullI32` and the function pointer uses the ABI-compatible type
/// `Option<NonNullI32>`, and the value used for the argument is `None`, then this call is Undefined
/// Behavior since transmuting `None::<NonNullI32>` to `NonNullI32` violates the non-null
/// requirement.
///
/// #### Requirements concerning target features
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chorman0773 can you double-checking this new section, does this sound accurate?

@rust-lang/opsem please also take a look.

///
/// Under some conditions, the signature used by the caller and the callee can be ABI-incompatible
/// even if the exact same ABI string and types are being used. As an example, the
/// `std::arch::x86_64::__m256` type has a different `extern "C"` ABI when the `avx` feature is
/// enabled vs when it is not enabled.
///
/// Therefore, to ensure ABI compatibility when code using different target features is combined
/// (such as via `#[target_feature]`), we further require that one of the following conditions is
/// met:
///
/// - The function uses the `"Rust"` ABI string (which is the default without `extern`).
/// - Caller and callee are using the exact same set of target features. For the callee we consider
/// the features enabled (via `#[target_feature]` and `-C target-feature`/`-C target-cpu`) at the
/// declaration site; for the caller we consider the features enabled at the call site.
/// - Neither any argument nor the return value involves a SIMD type (`#[repr(simd)]`) that is not
/// behind a pointer indirection (i.e., `*mut __m256` is fine, but `(i32, __m256)` is not).
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we need to declare the target features in every extern "C" declaration?

In general it would be helpful to generalize the discussion here to also handle cases where pointers are not involved but instead extern "ABI" is used to declare a non-Rust function that is then called from Rust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above I strongly want to avoid scope creep to non-Rust calls here. That's a much more complicated discussion.

/// ### Trait implementations
///
/// In this documentation the shorthand `fn (T₁, T₂, …, Tₙ)` is used to represent non-variadic
Expand Down
8 changes: 4 additions & 4 deletions tests/ui/abi/compatibility.rs
Original file line number Diff line number Diff line change
Expand Up @@ -231,8 +231,7 @@ macro_rules! test_abi_compatible {
};
}

// Compatibility of pointers is probably de-facto guaranteed,
// but that does not seem to be documented.
// Compatibility of pointers.
test_abi_compatible!(ptr_mut, *const i32, *mut i32);
test_abi_compatible!(ptr_pointee, *const i32, *const Vec<i32>);
test_abi_compatible!(ref_mut, &i32, &mut i32);
Expand All @@ -241,14 +240,15 @@ test_abi_compatible!(box_ptr, Box<i32>, *const i32);
test_abi_compatible!(nonnull_ptr, NonNull<i32>, *const i32);
test_abi_compatible!(fn_fn, fn(), fn(i32) -> i32);

// Some further guarantees we will likely (have to) make.
// Compatibility of 1-ZST.
test_abi_compatible!(zst_unit, Zst, ());
#[cfg(not(any(target_arch = "sparc64")))]
test_abi_compatible!(zst_array, Zst, [u8; 0]);
test_abi_compatible!(nonzero_int, NonZeroI32, i32);

// `DispatchFromDyn` relies on ABI compatibility.
// This is interesting since these types are not `repr(transparent)`.
// This is interesting since these types are not `repr(transparent)`. So this is not part of our
// public ABI guarantees, but is relied on by the compiler.
test_abi_compatible!(rc, Rc<i32>, *mut i32);
test_abi_compatible!(arc, Arc<i32>, *mut i32);

Expand Down
Loading