Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repr(C) on an enum accepts discriminants that do not fit into the default C enum size #124403

Open
RalfJung opened this issue Apr 26, 2024 · 13 comments
Labels
A-enum 🤓 akshually it's an "ADT" or "algebraic data type" A-FFI Area: Foreign function interface (FFI) A-repr Area: the `#[repr(stuff)]` attribute O-windows Operating system: Windows O-windows-msvc Toolchain: MSVC, Operating system: Windows

Comments

@RalfJung
Copy link
Member

Example:

#[repr(C)]
enum OverflowingEnum {
    A = 1111111111111111111,
}
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 1111111111111111111,
};

Rust currently gives size_of::<OverflowingEnum>() = 8. MSVC gives sizeof(enum overflowing_enum) = 4. Additionally, Rust gives OverflowingEnum::A as usize = 1111111111111111111 and MSVC gives OVERFLOWING_ENUM_A = 734294471.

This should probably be a hard error (forward compatibility lint). The nomicon currently states

repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums. The chosen size is the default enum size for the target platform's C application binary interface (ABI).

and the reference currently states

For field-less enums, the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.

The nomicon is clearly wrong here: #[repr(c_int)] on the enum would fail to compile (where c_int is the correct primitive) (because of the overflowing literal). I think erroring (forward compatibility linting) and pointing the user to use #[repr(u64)] in this case makes sense.

(Summary by @CAD97)

@CAD97 this should error on all targets, not just MSVC, I assume? What do non-MSVC C compilers do with such an enum?

@RalfJung RalfJung added A-repr Area: the `#[repr(stuff)]` attribute A-FFI Area: Foreign function interface (FFI) and removed A-repr Area: the `#[repr(stuff)]` attribute A-FFI Area: Foreign function interface (FFI) labels Apr 26, 2024
@RossSmyth
Copy link
Contributor

The standard did not say much about enum width until recently. Specifically the standard called out only int as the one true enum type. Specifically the wording was that the underlying type

must be representable by an int

In reality many compilers just silently expanded the width (Godbolt link to an old Clang from before C23) to accommodate values. So your enum would be fine in reality on GCC, Clang, TCC, and many others (though on Godbolt CompCert & TenDRA reject it with an error). Obviously MSVC accepts it, but then silently truncates it to the one true (pre-C23) enum type.

Plus the new standard text says

For all the integer constant expressions which make up the values of the enumeration constant, there shall be a signed or unsigned integer type (excluding the bit-precise integer types) capable of representing all of the values.

Meaning that as long as the value can fit in a unsigned long long then it should be fine (on a C23 compiler). MSVC has not implemented this, or really much of C23 yet.

There is also this new syntax in C23:

enum Foo : uint64_t {
    A = 1111111111111111111,
};

Which is implemented in GCC and Clang for specifying the underlying type exactly.

@RalfJung
Copy link
Member Author

RalfJung commented Apr 26, 2024

There is also this new syntax in C23:

I think we have the equivalent of that, via repr(u64).

In reality many compilers just silently expanded the width (Godbolt link to an old Clang from before C23) to accommodate values. So your enum would be fine in reality on GCC, Clang, TCC, and many others (though on Godbolt CompCert & TenDRA reject it with an error). Obviously MSVC accepts it, but then silently truncates it to the one true (pre-C23) enum type.

Thanks for checking!

But the standard also doesn't say that truncation is wrong, or does it?

I guess our options are similar to here. Except for this issue I feel completely fine with just erroring when the discriminant does not fit into the default enum type; that should be very rare and warrants an explicit type annotation in any case...

@RossSmyth
Copy link
Contributor

But the standard also doesn't say that truncation is wrong, or does it?

I would say that C23 does say that is wrong. But pre-C23 silently truncating I would also say is a valid behavior.

So it depends on what we are targeting, the C standard or C compilers. If we say "we target C23" then having 64-bit enums would be valid. If we say "we want to allow any weird C compiler to link with us," then erroring is valid too.

@RalfJung
Copy link
Member Author

I would say that C23 does say that is wrong

Where does it say that? The text you quoted just says that the given discriminant value has to fit some integer type, but it does not say that it has to fit the concrete integer type chosen for the enum.

Ah, but there's more text:

During the processing of each enumeration constant in the enumerator list, the type of the enumeration constant shall be: [...]

The thing is, this says that the different enumeration constants of the same enumeration can have different types! That's not possible in Rust -- so maybe we should indeed reject everything that does not fit into an int.

@workingjubilee
Copy link
Member

That wording is fairly irrelevant in actuality as the type has to be stored in a fixed width.

It is much worse than you think, Ralf. There is nothing actually requiring the C compiler to use int. That is, it has always been standard-conformant to use only 8- or 16-bit types even when the type of int is defined to be a 32-bit integer. Thus we have to match the actual compiler behaviors on this one, as we cannot look to the standard for saying anything about ABI, because it doesn't.

@RalfJung
Copy link
Member Author

I seemed to remember that compilers use the smallest type that fits, but the standard seemed fairly clear about using int...

Thus we have to match the actual compiler behaviors on this one, as we cannot look to the standard for saying anything about ABI, because it doesn't.

Okay so what is the actual compiler behavior for GCC, clang, MSVC? Does MSVC always use int or does it use "the smallest that fits, or int, whatever is smaller"? How does signed/unsigned factor in?

@workingjubilee
Copy link
Member

workingjubilee commented Apr 27, 2024

It says representable by.
All values of type char or short, by definition, are representable by int.

GCC conceals the arbitrary-selection rule behind the -fshort-enums flag, thus making code incompatible based on compiler flags. Lovely practice, that. Clang thus emulates this.

@workingjubilee
Copy link
Member

For some of our targets, they are specified as using -fshort-enums in their ABI, or equivalent, which is why we have the notion of a "minimum C enum width" in enum layout.

@workingjubilee
Copy link
Member

I believe the MSVC situation is expressed by https://learn.microsoft.com/en-us/cpp/build/reference/zc-enumtypes?view=msvc-170

@workingjubilee
Copy link
Member

@kennykerr Vibe check since you're the MSVC bindings expert: I get a strong sense from skimming the current bindings and the codegen patterns in windows-rs that you're avoiding repr(C) int enums? My personal vibe is that since the properly-deduced types are still opt-in,Rust should probably just error for too-high values, but only on MSVC.

@workingjubilee workingjubilee added O-windows Operating system: Windows O-windows-msvc Toolchain: MSVC, Operating system: Windows labels May 4, 2024
@kennykerr
Copy link
Contributor

The only reason I avoid repr(C) enums in Rust is because Rust doesn't allow duplicate discriminant values and that is unfortunately commonplace in the Windows API since C and C++ allows it.

@RalfJung
Copy link
Member Author

RalfJung commented May 4, 2024

That's part of C enums being basically just integer types with named constants, whereas Rust enums are proper algebraic sum types. Seems like there could be a macro to provide the "named constants" kind of enum in Rust? Anyway, that's a different discussion from this issue.

@kennykerr
Copy link
Contributor

I would always recommend using an explicit underlying type for enums to avoid this kind of ambiguity in C++.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-enum 🤓 akshually it's an "ADT" or "algebraic data type" A-FFI Area: Foreign function interface (FFI) A-repr Area: the `#[repr(stuff)]` attribute O-windows Operating system: Windows O-windows-msvc Toolchain: MSVC, Operating system: Windows
Projects
None yet
Development

No branches or pull requests

4 participants