-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a stucts-and-tuples chapter #31
add a stucts-and-tuples chapter #31
Conversation
Pushed an update containing the various suggestions from @eddyb. |
[#42877]: https://github.com/rust-lang/rust/issues/42877 | ||
[pg-unsized-tuple]: https://play.rust-lang.org/?gist=46399bb68ac685f23beffefc014203ce&version=nightly&mode=debug&edition=2015 | ||
|
||
There are also benefits also to having fewer guarantees. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: two "also"s
### Default layout ("repr rust") | ||
|
||
The default layout of structs is undefined and subject to change | ||
between compiler revisions. We further do not guarantee that two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between individual compilations, no? I think that is what we had determined was the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I would like to push back on this slightly — there is a general desire to ensure that th compiler output is deterministic. This is not 100% true but it is very nearly true, and we would like to to be true. This seems to imply that, so long as the input does not change, the layout cannot change. I am not sure why we would need to lose that guarantee.
Long ago I proposed that we might want to guarantee (some subset of) newtype unpacking for repr(Rust) structs. @nikomatsakis carried this over into #11 as discussion point but it received no further discussion. I like to think that means it's uncontroversial 😄 I've also never heard of any reason why one might not want that to be true. To make a specific proposal, let's restrict it to structs [1] that contain a single field having the same memory layout as the type of the sole field. So [1] The same guarantee for |
@rkruppe for So shouldn't that guarantee be restricted to structs with a single public field? Or at least there should be a lint if such transmutations are used by a crate that does not own |
Visibility should not affect layout, only who can rely on the layout. |
That might be worth documenting. Still, it might be better to document the intent of such structs with |
I agree that such subtleties should be documented, but "punishing" people who do not do that in a certain prescribed way (a repr attribute, instead of e.g. a comment) by making their code de jure undefined behavior (especially if it will de facto never misbehave because the layout cannot sensibly be any different) is simply user-hostile with no benefit. |
Here is the set of unresolved questions: Zero-sized structs (#37). If you have a struct which -- Single-field structs (#34). If you have a struct with single field Homogeneous structs (#36). If you have homogeneous structs, where all Deterministic layout ([#35]). Can we say that layout is some deterministic |
The default layout of structs is not specified. Effectively, the | ||
compiler provdes a deterministic function per struct definition that | ||
defines its layout. This function may as principle take as input the | ||
entire input program. Therefore: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this wording reserves the freedoms it want to reserve, and even if someone argued it does I would like it to be clarified. Since we did not get consensus to rule out profile-guided layout, the program source code is not all that informs layout. Not even if that includes build scripts, input data files, etc. that are used during the profiling run, since the program may not be deterministic w.r.t. these (e.g. it might have race conditions or depend on the system time or ...). So I don't think we can guarantee anything involving the words "deterministic function" and have to stick to something like "every time you invoke the compiler you may get a completely different layout".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we did not get consensus to rule out profile-guided layout
I might have missed this, but did you managed to write your thoughts about this down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I agree with the gist of your comment @rkruppe but I think I don't quite agree with this part:
Not even if that includes build scripts, input data files, etc. that are used during the profiling run, since the program may not be deterministic w.r.t. these (e.g. it might have race conditions or depend on the system time or ...).
In particular, it seems like we would basically want to say that layout is determined by the program source code + other auxiliary inputs (e.g., compiler settings, output from PGO, etc).
I don't know that we need to give the freedom for rustc to choose arbitrary orderings on two consecutive runs where nothing at all changed (in particular, we've been shooting for deterministic compilation, and this would sort of contravene that). Of course it's ok to start there for now, since it doesn't say that we have to change layout...just seems looser than is needed.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: it is certainly possible to define PGO traces as part of the compiler input (beyond just sources and compiler flags), though it's a bit of a stretch IMO. That should be called out explicit in the text, though. What's currently written here could reasonably be read as saying the layout depends just on the source code and flags such as -C opt-level
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the updated version is good in that regard, thanks!
a detailed write-up): | ||
layout scheme. See section 6.7.2.1 of the [C17 specification][C17] for | ||
a detailed write-up of what such rules entail. For most platforms, | ||
however, this means the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just go one step further here and just state that the "most recent" C specification applies. It would be a pain to have to update this every N years, history has shown that the newer C specs are backwards compatible with the old ones (e.g. the C99, C11, and C17 specs have never introduced breaking changes here - they just have defined behavior that was undefined before), and we probably want to be as compatible as possible with C anyways which means we have to follow the latest spec.
This kinds of ties Rust with the C spec, but this is already pretty much the case, not only for platform support, but some newer C proposals like N2289 - Zero overhead failure should definetely allow using Rust's Option
and Result
properly as error handling mechanisms in C FFI. So whether we like it or not, Rust is already a stakeholder in the C standardization process, and we should be sending someone to their meetings to represent Rust's interests in the ISO C standard evolution, and that would include layout guarantees for repr(C)
types. We don't want any changes to the C language to make it impossible for Rust to target C via FFI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly think this is a great start. As other issues progress, and unresolved issues get resolved, this will probably be amended many times, but it will be easier to discuss the details of those changes in PRs that build on top of this one.
ahead of | ||
time](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420659840), | ||
so the user cannot do it manually. | ||
- If layout is defined, then it becomes part of your API, such taht |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: s/taht/that/
Also, fix the dangling "therefore".
per rkruppe's points
I pushed some updates. The most interesting thing has to do with zero-sized structs. I added the following text to the
|
a struct like `#[repr(C)] struct Foo { }`. Further, when a | ||
`#[repr(C)]` struct has a field whose type has zero-size, that field | ||
may induce padding due to its alignment, but will not otherwise affect | ||
the offsets of subsequent fields (as it takes up zero space). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @rkruppe noted on Zulip, this seems like a problem. It means that "copy-pasting struct
definitions and adding repr(C)
everywhere" does not give you C compatibility, because your Foo
would actually take space when put in a larger struct in C
.
This seems like a bug, TBH. I am not sure if it is a bug that we can still fix. Might be worth having at least a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes — I was thinking the same in that conversation. That is, "bug and not clearly a bug we can fix", which does suggest that at least a lint is warranted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the benefit of others, the Zulip conversation was here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not actually how C works - C does not allow empty structs. The godbolt link was from C++, which does allow empty structs, and in which empty structs have size one. Basically, this would only be an issue for somebody like Mozilla, who talk to C++ through a non-C ABI.
Notably, also, gcc and clang's C extension for empty structs has sizeof(struct { }) = 0
: https://godbolt.org/z/K3_5fJ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transcribing another thing @ubsan noted on Zulip: empty structs are accepted as an extension by some C compilers, but (at least) GCC and Clang make them have size zero, unlike C++. Example: https://godbolt.org/z/AS2gdC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow-by-default at best - 0 size structures are weird in C++, and usually you'd use either EBO or [[no_unique_address]]
with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't them being weird another argument for making this a warn-by-default lint? I expect many people will not know this. I am not a C++ expert, but I have programmed in C++ for many years and never heard about this; I don't think we can expect everybody doing C++ FFI to know about these issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to summarize: C does not allow empty structs, some C language extensions allow empty structs with sizeof == 0
, C++ does allow empty structs but these have a sizeof == 1
unless they are inherited from or they are fields that have the [[no_unique_address]]
attribute (in both cases, they don't increase the size of the struct - i'm unsure what role the alignment of the type plays though).
I think #[repr(C)]
should warn-by-default on this when it makes a difference, that is, when the ZST would change the layout. We could have an opt-in warning that always warns on ZST being used in #[repr(C)]
but I fear that would be extremely noisy for little win.
About the situation with the C-language extension and C++ it appears that #[repr(C)] != #[repr(Cxx)]
, so maybe we just need to add new repr
s to deal with those. In the mean time it might be worth it to just ignore C++ while specifying #[repr(C)]
here (maybe add a note so that we don't forget).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to google EBO: Empty base optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RalfJung note that EBO is guaranteed by the C++>=11 standard for types with standard layout, like empty structs: struct T {};
, so it is a required layout optimization.
Is this almost ready to merge? |
@avadacatavra my hope was that we will merge it today |
That said, I think that some part of this conversation about zero-sized structs deserves to be added. I think we ought to also add "lint for |
5318410
to
a9223e2
Compare
OK, I attempted to summarize the conversation from here in the latest commit and added some appropriate warning language. |
It looks like gcc (the only compiler which has implemented
|
I'm going to merge this, and any additions can be filed as followup issues/PRs (cc @nikomatsakis) |
Writes up what I believe to be the consensus around layout of structs and tuples. Suggestions very welcome!
Here are some of the highlights:
(T1...Tn)
is laid out the same asTupleN<T1..Tn>
wherestruct TupleN<P0..Pn>(P0..Pn)
.[T; N]
.#[repr(Rust)]
structs (the default) have no particular layout guarantees. In particular, even if two#[repr(Rust)]
structs have the fields of the same types, they are not guaranteed to be laid out in a compatible way. This is a conservative position that could be strengthened, though I personally now think it'd be better to "opt-in" to any such guarantees. I cover the pros and cons in the doc.#[repr(C)]
,#[repr(align)]
,#[repr(packed)]
, and#[repr(transparent)]
are all discussed.Fixes #11
Fixes #12
Fixes #17