Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do #[repr(Rust)] unions have internal padding? #354

Open
alercah opened this issue Jul 17, 2022 · 29 comments
Open

Do #[repr(Rust)] unions have internal padding? #354

alercah opened this issue Jul 17, 2022 · 29 comments

Comments

@alercah
Copy link

alercah commented Jul 17, 2022

Forked from #156. The question is specifically whether a union type has non-tail padding when every variant has padding at a particular byte, meaning that the padding could be clobbered. @RalfJung says no, because he would like unions to simply be [byte; N]. Nobody has made a case for any other answer, as far as I'm aware. But making a separate issue since it's a separate question about the #[repr(Rust)] ABI and what we want to guarantee.

@RalfJung
Copy link
Member

It's also an LLVM lowering question, we might have to do some work to ensure that LLVM does not consider any repr(Rust) union to have padding.

@Lokathor
Copy link
Contributor

Is there a scenario where it's an advantage to have internal padding within a union?

@comex
Copy link

comex commented Jul 19, 2022

It can provide niches.

@alercah
Copy link
Author

alercah commented Jul 20, 2022

Padding can't provide niches because it's subject to clobbering.

@alercah
Copy link
Author

alercah commented Jul 20, 2022

Technically one advantage is that padding allows calling convention optimizations.

I also think it would be a bit weird and possibly uninituitive if overaligned unions with tail padding didn't treat it as proper padding?

@CAD97
Copy link

CAD97 commented Jul 21, 2022

The extra work to tell LLVM there is no padding is "simple enough"; lower the union as if it had a variant which contains no padding.

If we do allow interior padding in #[repr(Rust)] unions, then I'd actually prefer the stricter definition of "byte must be potentially valid for some variant" to also benefit from niches, and introduce #[repr(trivial)]1 for the full bag-o-bytes, additionally without trailing padding.

I'm torn on whether #[repr(Rust)] unions having trailing padding is enough for me to prefer strengthening their byte-validity in this way.

Footnotes

  1. bikeshed ahoy; raw, simple, etc; could maybe tie into putting the repr on struct as well for non-C-FFI stable ABI where we can guarantee the trivial layout, and free up #[repr(C)] to change to match the various "platform C" layout peculiarities rather than being target independent and used for manual allocation and pointer offsetting.

@scottmcm
Copy link
Member

The extra work to tell LLVM there is no padding is "simple enough"; lower the union as if it had a variant which contains no padding.

That's what we already do, based on the discussion in rust-lang/rust#97712 (comment)

@CAD97
Copy link

CAD97 commented Jul 21, 2022

That implementation is interesting, as IIUC it's relying on MaybeUninit not having padding in order to copy padding in the untyped copy, at least in the case where it's not broken down into zero-padding-int-type chunks.

The actual behavior of the pointer copy APIs is an orthogonal decision issue from whether unions have padding, though.

Silly libs patch to avoid relying on non-padding of union in MaybeUninit
union MaybeUninit<T> {
    value: T,
    undef: (),
    nopad: [u8; const{size_of::<T>()}],
}

@RalfJung
Copy link
Member

@scottmcm yeah I don't see how that commit relates to how we lower unions.

Currently we don't seem to be doing this right, this

pub union U {
    f: (u8, u16),
    g: (),
}

pub fn u(u: U) {}

becomes define void @_ZN10playground1u17h71b96be7befbcafcE(i8 %u.0, i16 %u.1). :(

@RalfJung
Copy link
Member

Ironically, making it #[repr(C)] "fixes" it to define void @_ZN10playground1u17hdaf5dbffaf364510E(i32 %0).^^ I think both of these are wrong, they are exactly swapped...

@bjorn3
Copy link
Member

bjorn3 commented Jul 21, 2022

No, it is actually correct for #[repr(C)]. The abi specifies that it is to be passed as 32bit integer in a single register: https://godbolt.org/z/8YzKqdEfv If you use a larger type, it will split it: https://godbolt.org/z/sxqqKf46z Just like rustc does: https://godbolt.org/z/fMYEETnjx

@RalfJung
Copy link
Member

Oh I see.

But anyway, for repr(Rust) we do not currently have preserve-all-bits semantics.

@5225225
Copy link

5225225 commented Jul 22, 2022

use std::mem::MaybeUninit;

unsafe fn print(x: MaybeUninit::<(u16, u8)>) {
    let value: u32 = std::mem::transmute(x);
    if value != 0xaabbccdd {
        panic!("got {value:x}");
    }
}

fn main() {
    let mut x = MaybeUninit::<(u16, u8)>::uninit();
    unsafe {
        x.as_mut_ptr().cast::<u32>().write_unaligned(0xaabbccdd);
        print(x);
    }
}

Funky, I didn't know that failed :) I'm assuming this isn't UB, and that it is indeed illegal for rust to zero the padding here, seeing as miri runs it perfectly fine.

Worth me making an issue on rust-lang/rust about to track this? I can't seem to find an existing one.

@CAD97
Copy link

CAD97 commented Jul 22, 2022

Just a note to make sure it's noted in the discussion: MaybeUninit is (currently) #[repr(transparent)] and not #[repr(Rust)].

And #[repr(transparent)] unions are guaranteed to have the same ABI as the transparently wrapped type.

(Tuples are #[repr(Rust)], though, so the previous should follow #[repr(Rust)] padding rules.)

However, Miri IIRC doesn't do any padding deinit at the moment, so Miri accepting it isn't necessarily an indicator that it's intended to be supported.

@RalfJung
Copy link
Member

And #[repr(transparent)] unions are guaranteed to have the same ABI as the transparently wrapped type.

Ah yes, I had entirely forgotten about that. :( And we need that for performance at least in some cases, like SIMD.

But Rust pairs don't have a defined ABI, so we could maybe? use a single i32 to pass this across function boundaries. Transmute is still okay, this only affects fn call ABI, not memory ABI.

@alercah
Copy link
Author

alercah commented Jul 22, 2022

I think it would be a violation of the contract of repr(transparent) unless we changed the ABI of the pair. And I'm not a fan of that at all.

@RalfJung
Copy link
Member

Union repr(transparent) already doesn't forward niches, so -- not sure if it even is clear what the contract of repr(transparent) on unions is. AFAIK the attribute is not stable there?

@CAD97
Copy link

CAD97 commented Jul 22, 2022

IOW, it's (almost certainly) considered valid to link extern "Rust" fn f(_: MaybeUninit<(u8, u16)>) and extern "Rust" fn f(_: (u8, u16)). Making this invalid seems quite an undesirable of #[repr(transparent)].

It could be allowed — roughly, #[repr(Rust)] types may have a different ABI (but not layout) when contained in a #[repr(transparent)] container — but again, this seems undesirable.

@CAD97
Copy link

CAD97 commented Jul 22, 2022

AFAIK the attribute is not stable there?

#[repr(transparent)] union is defined by RFC

Makes a clear API guarantee that a Wrapper<T> can be [...] substituted for a T in an FFI function's signature [...]

and in MaybeUninit's docs

MaybeUninit<T> is guaranteed to have the same [...] ABI as T [...] a type containing a MaybeUninit<T> is not necessarily the same layout [...]. While MaybeUninit is #[repr(transparent)] (indicating it guarantees the same [...] ABI as T), this does not change any of the previous caveats.

@chorman0773
Copy link
Contributor

But Rust pairs don't have a defined ABI, so we could maybe? use a single i32 to pass this across function boundaries. Transmute is still okay, this only affects fn call ABI, not memory ABI.

The same effect can be achieved using #[repr(C)] struct Foo(u8,u16); or #[repr(C)] struct Foo(u8, u64);.

@workingjubilee
Copy link
Member

I should note one of the reasons for repr(transparent) is specifically to opt-in to being treated like a "simple" underlying value and thus, e.g. not to be placed in the stack. If an ABI defines (u8, u16) as passed in two registers, one holding the u8 and one holding the u16, we wouldn't expect padding to be preserved.

@alercah
Copy link
Author

alercah commented Oct 30, 2022

#[repr(Rust)] is not #[repr(transparent)], so not sure why we got sidetracked on it.

@joshtriplett's (implicitly) made the case on #368 that whichever repr has preserve-all-bits semantics should preserve tail padding bits, not just interior padding bits.

However, the preservation of tail padding has significant consequences on unions with highly aligned types. Consider union(u8x64, [u8; 65]). I'm not sure why you would write this type, but it has a whopping 63 bytes of tail padding! (At least on platforms where u8x64 has alignment 64.) So there would be actually a pretty significant performance cost to preserve-all-bits semantics if they included tail padding. Because of Rust's philosophy of zero-cost abstractions, therefore, I think that #[repr(Rust)] should not have preserve-all-bits semantics.

@RalfJung
Copy link
Member

I agree with Josh, I would find it rather surprising that tail padding is lost in a repr advertised as "preserving all bits".

Copying 1 byte of a cacheline vs copying the entire cacheline is unlikely to make a big difference in practice, is it? Also if you want the speed benefit of dropping some of the bits, then maybe you shouldn't be requesting an preserving-all-bits representation?

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 31, 2022 via email

@CAD97
Copy link

CAD97 commented Oct 31, 2022

#[repr(Rust)] is not #[repr(transparent)], so not sure why we got sidetracked on it.

The reason I brought it up is that MaybeUninit is (currently) #[repr(transparent)], and the ABI-matching behavior is important for performance in some cases (e.g. passing MaybeUninit<Simd> in simd registers).

This has an impact on this discussion, because it means that MaybeUninit does not preserve all bits, even if #[repr(Rust)] unions do; MaybeUninit only preserves the bits required to be preserved by the wrapped type.

FWIW, while "preserve all bits up to trailing padding" is more complicated than "preserve all bits," I don't think it necessarily impractically more complicated. This is in contrast to internal padding, which is much harder to define if/when it is preserved.

It's perhaps also worth noting that eventually it'll be possible to layer "preserve all bits" semantics on top:
#![feature(generic_const_exprs)]
use std::mem::{size_of, ManuallyDrop, MaybeUninit};
pub union PreserveAllBits<T>
where
    [(); size_of::<T>()]:,
{
    pub inner: ManuallyDrop<T>,
    pub bytes: [MaybeUninit<u8>; size_of::<T>()],
}

@RalfJung
Copy link
Member

You're copying a whole extra
cacheline of entirely padding.

It's 1 byte of (potential) data and 63 bytes of padding. We can't omit that 1 byte copy anyway since there might be data in there.

What's wrong with just using a repr(Rust) enum if you want to have padding that can be omitted on copies?

@RalfJung
Copy link
Member

FWIW, while "preserve all bits up to trailing padding" is more complicated than "preserve all bits," I don't think it necessarily impractically more complicated. This is in contrast to internal padding, which is much harder to define if/when it is preserved.

I think we can have a recursive definition of what are the "padding bytes" of a type:

  • base types (int, bool, char, float, ref/ptr) don't have padding bytes.
  • structs: all bytes not in a field are padding bytes, and the padding bytes of the fields are propagated
  • unions: padding byte are the intersection of (for each variant, the padding bytes of the field as well as all bytes before and after the field).
  • enums: this is the most tricky one, but it also boils down to "all bytes that are, for each variant, either outside the variant or padding of the variant"

I wouldn't even make "trailing padding" a term that comes up in this specification.

@CAD97
Copy link

CAD97 commented Oct 31, 2022

So here's an argument for #[repr(Rust)] union not preserving trailing padding: #[repr(Rust)] union should be "just" a #[repr(Rust)] enum but without the discriminant tag.

This might even argue against preserving internal padding, depending on how the AM copy of a Rust Enum happens. A copy is necessarily a read, so there's nothing preventing the AM from only copying the "active variant" of a discriminated enum.

(Disclaimer: I don't know what my answer to the question this poses is. However, it has made me at least more sympathetic to the option of #[repr(Rust)] being restrictive and a #[repr(raw)] (bikeshed) being used when the "bag-of-bits" representation is desired. I strongly believe that even in this case it shouldn't impact the language's idea of what operations are safe, however; that's a library-level concern.)

@memoryleak47
Copy link

For anyone curious, this is how minimize (the tool converting Rust to Minirust) calculates union padding:
https://github.com/memoryleak47/minirust-tooling/blob/main/minimize/src/chunks.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests