Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MaybeUninit to produce more correct layouts #63035

Merged
merged 2 commits into from
Aug 7, 2019

Conversation

tmandry
Copy link
Member

@tmandry tmandry commented Jul 27, 2019

This fixes a latent instance of #59972 where we would promote an uninhabited field to the generator prefix, causing the entire generator to be uninhabited.

r? @eddyb
cc @cramertj @Zoxc

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 27, 2019
@tmandry tmandry force-pushed the generator-maybe-uninit branch from 9f71527 to 678f5b0 Compare July 27, 2019 04:53
@tmandry tmandry changed the title [WIP] Use MaybeUninit to produce more correct layouts Use MaybeUninit to produce more correct layouts Jul 27, 2019
@RalfJung
Copy link
Member

This is great! I think this means you can remove the following:

ty::Generator(..) => {
// FIXME: Generator layout is lying: it claims a whole bunch of fields exist
// when really many of them can be uninitialized.
// Just treat them as a union for now, until hopefully the layout
// computation is fixed.
return self.visit_union(v);
}

@tmandry
Copy link
Member Author

tmandry commented Jul 29, 2019

I think this means you can remove the following:

Unfortunately I don't think we can do this yet. The original types will still be reported by ty::layout::field(). If I wrap them in MaybeUninit there, I get codegen errors (presumably, because I have to modify code so that it projects through the union).

Maybe there's a way of doing this without requiring those projections. Otherwise, the "right way" to implement this is to decide at MIR transform time which fields are aggregate types and should be wrapped in MaybeUninit. Then we have to record this in the GeneratorLayout somehow and generate the corresponding MIR.

Note for my future self: we don't have to do any of this for the prefix fields, because the layout describes them as being members of variants, not the generator struct itself. We wrap those prefix fields in MaybeUninit just so that we get the correct abi result out of univariant_uninterned when computing the offset of fields of the prefix.

@tmandry tmandry force-pushed the generator-maybe-uninit branch from 678f5b0 to 516f3e5 Compare July 29, 2019 18:57
This prevents uninhabited fields from "infecting" the abi and
largest_niche of the generator layout.

This fixes a latent bug, where an uninhabited field could be promoted to
the generator prefix and cause the entire generator to become
uninhabited.
@tmandry tmandry force-pushed the generator-maybe-uninit branch from 516f3e5 to 9892845 Compare July 29, 2019 19:17
@tmandry
Copy link
Member Author

tmandry commented Jul 29, 2019

@RalfJung If we were to make #60450 a hard error, I think that would mean that aggregates don't have to be considered MaybeUninit, does that seem correct? We would also have to decide that this won't be changed in the future (see #54987 (comment)).

@RalfJung
Copy link
Member

Unfortunately I don't think we can do this yet. The original types will still be reported by ty::layout::field(). If I wrap them in MaybeUninit there, I get codegen errors (presumably, because I have to modify code so that it projects through the union).

I understand -- but then I don't understand what this PR even changes.

If we were to make #60450 a hard error, I think that would mean that aggregates don't have to be considered MaybeUninit, does that seem correct?

I don't think I understand the connection. Last time I checked, MaybeUninit was needed because creating a new generator effectively did mem::uninitialized() on most of the fields because they were not initially live. Same for fields not live at a yield point.

@tmandry
Copy link
Member Author

tmandry commented Jul 31, 2019

I understand -- but then I don't understand what this PR even changes.

It keeps layouts from being incorrectly marked with Abi::Uninhabited. The test that was added crashes when run (when compiled on current nightly), because the generator type is marked uninhabited, even though it's possible to construct. LLVM puts an illegal instruction in the constructor to catch cases where this "uninhabited" type is constructed.

There are more efficient ways of accomplishing this. I thought we were going to use MaybeUninit in other places, but now I realize that may not be true. Still, this seems like the easiest and least error-prone approach.

I don't think I understand the connection. Last time I checked, MaybeUninit was needed because creating a new generator effectively did mem::uninitialized() on most of the fields because they were not initially live. Same for fields not live at a yield point.

Okay, so a nice property that was introduced in #60187 is that we only add a generator field to the layout variants which correspond to the suspension points it is live across. Not storage-live, but actually live (i.e. it has been initialized, and that value is potentially going to be used again in a subsequent resume, before the next time it is reinitialized).

If my thinking is correct, this is a strong property that gets us most of the way to not needing MaybeUninit. We know the variable has been initialized, and since its value may be used again, we know its bytes won't have been clobbered. By the time the resume function returns, the generator object should be in a consistent state, where all fields described by the current variant are initialized.

The only remaining problem is aggregates, which can be partially initialized across a yield. If we were to disallow that, then we should be done. Does that sound right to you?

@RalfJung
Copy link
Member

It keeps layouts from being incorrectly marked with Abi::Uninhabited.

I see.

Okay, so a nice property that was introduced in #60187 is that we only add a generator field to the layout variants which correspond to the suspension points it is live across. Not storage-live, but actually live (i.e. it has been initialized, and that value is potentially going to be used again in a subsequent resume, before the next time it is reinitialized).

Ah. Yes that sounds like it would make a difference.

It would be great if you could come up with a few interesting examples for this (something you would consider corner-cases of this treatment) and add them as tests to Miri here or here. That would help check if MaybeUninit is used the right way, at least once we remove this hack.

The only remaining problem is aggregates, which can be partially initialized across a yield. If we were to disallow that, then we should be done. Does that sound right to you?

The way you put it, yes, this sounds reasonable. But this is subtle stuff and I have nowhere near anything like a good overview of the things that are going on here.

// MaybeUninit above.
let layout = layout?;
if layout.is_aggregate() {
self.layout_of(tcx.mk_maybe_uninit(layout.ty))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is a bad idea: this won't properly trigger for newtypes of immediates (scalars/vectors).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the correct way would be with a type visitor then.

Or it's probably best to focus efforts on making #60450 a hard error, so we don't need this hack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this in favor of #63230

ty,
&prefix_layouts,
&ReprOptions::default(),
StructKind::AlwaysSized,
)?;
// FIXME(eddyb) need `MaybeUninit` around promoted types (see above).
prefix.largest_niche = None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a test for some generator that captures a bool or something, that it's the same size when wrapped in Option.

In the future, the state/discriminant field would have the right validity range and would be used itself as the niche in many situations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a test, but I'm afraid that if we move the upvars into a variant (#62958 (comment)) this test won't work anymore. =/

@tmandry tmandry force-pushed the generator-maybe-uninit branch from 9892845 to 9d4ca87 Compare August 3, 2019 01:07
@tmandry
Copy link
Member Author

tmandry commented Aug 3, 2019

It would be great if you could come up with a few interesting examples for this (something you would consider corner-cases of this treatment) and add them as tests to Miri here or here. That would help check if MaybeUninit is used the right way, at least once we remove this hack.

Sure, though I'm not sure how to test "things are really initialized when we say they are". I suppose I could test the negative case, by writing an invalid value to a local which is supposed to be initialized and then return, expecting miri validation to catch it (once the hack is removed). Does the test harness support that?

The treatment of uninhabited types would make an interesting testcase, too. Though you would get that and more if you ran miri on the rustc run-pass test suite.

@RalfJung
Copy link
Member

RalfJung commented Aug 3, 2019

Sure, though I'm not sure how to test "things are really initialized when we say they are". I suppose I could test the negative case, by writing an invalid value to a local which is supposed to be initialized and then return, expecting miri validation to catch it (once the hack is removed).

Primarily I was thinking of code that should pass. After all, the point of this PR is to make some code correctly work that didn't work correctly before, right? Something where a generator is created or suspended in a state where some of its fields are not yet initialized. That's how I originally found out about this issue.

Does the test harness support that?

Miri has compile-fail tests for this. However, right now we treat generators as unions and don't validate their fields (that is the code I linked above), so to catch such errors we would have to change that first. This would also be tricky to write because we have to circumvent the validation that happens inside the generator itself.

@tmandry
Copy link
Member Author

tmandry commented Aug 5, 2019

I know @eddyb is away this week. Maybe @oli-obk or @nagisa can take a look?

tmandry added a commit to tmandry/rust that referenced this pull request Aug 5, 2019
This is one of the behaviors we no longer allow in NLL. Since it can
lead to undefined behavior, I think it's definitely worth making it a
hard error without waiting to turn off migration mode (rust-lang#58781).

Closes rust-lang#60450.

My ulterior motive here is making it impossible to leave variables
partially initialized across a yield (see discussion at rust-lang#63035), so
tests are included for that.
Centril added a commit to Centril/rust that referenced this pull request Aug 6, 2019
…ized, r=Centril

Make use of possibly uninitialized data [E0381] a hard error

This is one of the behaviors we no longer allow in NLL. Since it can
lead to undefined behavior, I think it's definitely worth making it a
hard error without waiting to turn off migration mode (rust-lang#58781).

Closes rust-lang#60450.

My ulterior motive here is making it impossible to leave variables
partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so
tests are included for that.

cc rust-lang#54987

---

I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me.

r? @Centril
cc @cramertj @nikomatsakis @pnkfelix @RalfJung
Centril added a commit to Centril/rust that referenced this pull request Aug 6, 2019
…ized, r=Centril

Make use of possibly uninitialized data [E0381] a hard error

This is one of the behaviors we no longer allow in NLL. Since it can
lead to undefined behavior, I think it's definitely worth making it a
hard error without waiting to turn off migration mode (rust-lang#58781).

Closes rust-lang#60450.

My ulterior motive here is making it impossible to leave variables
partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so
tests are included for that.

cc rust-lang#54987

---

I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me.

r? @Centril
cc @cramertj @nikomatsakis @pnkfelix @RalfJung
@oli-obk
Copy link
Contributor

oli-obk commented Aug 6, 2019

The change looks fine to me. I mean treating some value as a union in layout computation is always fine, just pessimistic. So the only "dangerous" change is removing the niche override, but that's also ok now. I'm guessing that Layout::field still returns the regular type, without a MaybeUninit around it? Not sure what effects that has, but I guess it's alright?

All in all, I think it's fine, but I don't have the best overview of downstream effects.

@tmandry
Copy link
Member Author

tmandry commented Aug 6, 2019

I'm guessing that Layout::field still returns the regular type, without a MaybeUninit around it?

Yep.

Not sure what effects that has, but I guess it's alright?

It means that according to the type system the type hasn't changed. We're just "pessimizing" the layout computation so it doesn't do thinks like propagate Uninhabited abi's to the outer type.

The reason for this change is that the "promoted" fields might indeed be uninitialized, so it should give us the correct behavior with respect to layouts.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2019

Isn't the type system based on Ty/TyS as opposed to TyLayout? So I can't relate your last statement to what is happening in rustc.

It also means that some code looking at the field's types will see T and some will see MaybeUninit<T>. I don't know which code falls onto which side here and if that is a problem, but it certainly seems odd.

@tmandry
Copy link
Member Author

tmandry commented Aug 6, 2019

It's quite subtle, so sorry for not giving a more detailed explanation before.

The promoted fields we are wrapping in this PR are still only part of a variant in the final layout. So the layout is correct in that each field (including the promoted ones) will be fully initialized when that variant is active.

This PR wraps these promoted fields in MaybeUninit during an intermediate phase of the layout computation, where we are combining all the upvars and discriminant with variant fields which aren't allowed to overlap with other fields.

During this phase, we want any uninhabited upvars to cause our generator to have Abi::Uninhabited. However, promoted locals (which are only part of one variant) should not propagate uninhabited-ness. Therefore, we wrap them in MaybeUninit to prevent this.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2019

I know "upvar" from closures, and I think that's just a captured variable, right? What's a "promoted local", does that have anything to do with static promotion?

@tmandry
Copy link
Member Author

tmandry commented Aug 6, 2019

I know "upvar" from closures, and I think that's just a captured variable, right? What's a "promoted local", does that have anything to do with static promotion?

No, it's just a term within the layout computation.

There are two zones of a generator: the non-overlap zone ("prefix") and the overlap zone. Saved locals default to being in the overlap zone. If two locals in the overlap zone aren't allowed to overlap with each other, but might, one of them is "promoted" to the non-overlap zone.

These "promoted" locals were causing trouble, because of a bug in the layout computation that treated them as a field of the generator itself instead of a field of one variant. This wasn't correct with respect to uninhabitedness and largest niche.

I'm now realizing that there needs to be some symmetric kind of change which propagates uninhabitedness of the promoted fields out to the variants which contain them. Largest niche I'm less sure about, since a promoted field can be a member of more than one variant at the same time.

@eddyb
Copy link
Member

eddyb commented Aug 6, 2019

@bors r+

@tmandry Maybe you need to find a different name than "promoted"? Maybe "hoisted"?

@bors
Copy link
Contributor

bors commented Aug 6, 2019

📌 Commit 9d4ca87 has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 6, 2019
@tmandry
Copy link
Member Author

tmandry commented Aug 6, 2019

I'm now realizing that there needs to be some symmetric kind of change which propagates uninhabitedness of the promoted fields out to the variants which contain them. Largest niche I'm less sure about, since a promoted field can be a member of more than one variant at the same time.

I chatted with @eddyb about this and they said it's safer not to do it, so let's skip it for now.

bors added a commit that referenced this pull request Aug 7, 2019
Rollup of 9 pull requests

Successful merges:

 - #63034 (Fix generator size regressions due to optimization)
 - #63035 (Use MaybeUninit to produce more correct layouts)
 - #63163 (add a pair of whitespace after remove parentheses)
 - #63294 (tests for async/await drop order)
 - #63307 (don't ignore mir_dump folder)
 - #63308 (PlaceRef's base is already a reference)
 - #63310 (Tests around moving parts of structs and tuples across await points)
 - #63314 (doc: the content has since been moved to the guide)
 - #63333 (Remove unnecessary features from compiler error code list)

Failed merges:

r? @ghost
@bors bors merged commit 9d4ca87 into rust-lang:master Aug 7, 2019
@bors
Copy link
Contributor

bors commented Aug 7, 2019

⌛ Testing commit 9d4ca87 with merge 615c460...

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2019

@tmandry would be great to get some of these explanations written down somewhere, probably in the rustc-guide, and/or appropriate places in the layout computation code?

I am still a bit confused by the two zones, it would be worth explaining how they differ. My guess is that one zone is basically a struct and one is an enum (with one variant per "generator state" aka yield point), is that correct? And the non-overlap zone is the struct part? But then it seems odd to say that locals are in the enum part unless we figure out they cannot be there; from a safety perspective it seems they should be in the struct part unless we can prove that they can be moved to the enum part?

@pietroalbini
Copy link
Member

@bors retry r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 7, 2019
@tmandry
Copy link
Member Author

tmandry commented Aug 7, 2019

@RalfJung There is some explanation in the code, along with inline comments in the implementation below:

rust/src/librustc/ty/layout.rs

Lines 1285 to 1302 in 647ed20

// When laying out generators, we divide our saved local fields into two
// categories: overlap-eligible and overlap-ineligible.
//
// Those fields which are ineligible for overlap go in a "prefix" at the
// beginning of the layout, and always have space reserved for them.
//
// Overlap-eligible fields are only assigned to one variant, so we lay
// those fields out for each variant and put them right after the
// prefix.
//
// Finally, in the layout details, we point to the fields from the
// variants they are assigned to. It is possible for some fields to be
// included in multiple variants. No field ever "moves around" in the
// layout; its offset is always the same.
//
// Also included in the layout are the upvars and the discriminant.
// These are included as fields on the "outer" layout; they are not part
// of any variant.

These are implementation details I'd like to change at some point, but it's possible they may persist for some time. See also the PR description for #60187, which probably has the best summary.

I am still a bit confused by the two zones, it would be worth explaining how they differ. My guess is that one zone is basically a struct and one is an enum (with one variant per "generator state" aka yield point), is that correct? And the non-overlap zone is the struct part?

Yes.

But then it seems odd to say that locals are in the enum part unless we figure out they cannot be there; from a safety perspective it seems they should be in the struct part unless we can prove that they can be moved to the enum part?

We rule out some vars initially, but the main constraint is that we have a matrix M_{i,j} ∈ {0, 1} that states whether vars i and j can both be in the enum part. In the implementation we start with all eligible vars in the enum part, and then rule them out as we find pairs that aren't allowed to coexistoverlap. This "destructive" approach is functionally equivalent to the "constructive" approach that you are describing.

@tmandry tmandry deleted the generator-maybe-uninit branch August 7, 2019 18:33
@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2019

then rule them out as we find pairs that aren't allowed to coexist.

You mean fields that are allowed to coexist and hence must not overlap? Or am I mixing something up here?

So every field is either in exactly one variant (if it is in the overlap part) or in all of them? That is a surprising restriction and might have contributed to my confusion. After all, there is a continuum between this: a field could be in 2 out of N variants, so it can still overlap with fields from the remaining variants. I suppose that's not done to keep everything simpler? Might be worth putting that somewhere in the introduction for this documentation: In general, every variant corresponds to a yield point and every variable (field) is live across some yield points, and then it will be in exactly those variants, and the task in layout would be to find the most compact way to represent this. But to keep things simpler, we add all fields either in one variant or in all of them.

But then I wonder why this needs any matrix... the variables that can be overlapped are exactly those that are live for only one yield point, are they not? I must still be missing something, sorry. :/

But at least the PR makes sense now: fields in the prefix are basically in all variants, including those in which they are not actually live, and hence must be considered MaybeUninit.

@tmandry
Copy link
Member Author

tmandry commented Aug 7, 2019

You mean fields that are allowed to coexist and hence must not overlap? Or am I mixing something up here?

Sorry, I meant "overlap". I edited the original message to make this clearer.

So every field is either in exactly one variant (if it is in the overlap part) or in all of them?

No, fields are only in the variants that need them. That can indeed mean a field is in 2 or 3 variants out of many.

What I am saying about the "zones" only applies to the byte layout of fields in the generator. Nothing needs to care about this except the layout code. Think of the variant as a "view" into this structure.

If instead you had said, "every field has space reserved either in exactly one variant or in all of them," then that would be correct. But again, this is an implementation detail of the layout code.

From the point of view of a MIR consumer, if I look at what fields are present in a given variant, I will only see the fields that are live across the corresponding yield point. The rest of the bytes will just look to me like padding between fields.

But then I wonder why this needs any matrix... the variables that can be overlapped are exactly those that are live for only one yield point, are they not? I must still be missing something, sorry. :/

Whether or not two variables can overlap is determined by analysis on the MIR. Here's a case where two vars are each live for a single yield point, but cannot overlap:

|| {
  let x = "hello".to_string();
  yield;
  let y = "world".to_string();;
  drop(x);
  yield;
  drop(y);
}

We have to do dataflow analysis to see that x and y are storage-live at the same time.

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2019

No, fields are only in the variants that need them. That can indeed mean a field is in 2 or 3 variants out of many.

What I am saying about the "zones" only applies to the byte layout of fields in the generator. Nothing needs to care about this except the layout code. Think of the variant as a "view" into this structure.

If instead you had said, "every field has space reserved either in exactly one variant or in all of them," then that would be correct. But again, this is an implementation detail of the layout code.

Ah, I think now it clicked, finally! Thanks.

Maybe this feedback can help improve the docs for the next time you need to explain this. ;)

From the point of view of a MIR consumer, if I look at what fields are present in a given variant, I will only see the fields that are live across the corresponding yield point. The rest of the bytes will just look to me like padding between fields.

Awesome. So then indeed MaybeUninit should not be needed for MIR/TyLayout consumers. And we should be able to no longer special case generators in Miri (assuming getting the discriminant works like it does for enums).

I suppose an intersting testcase for Miri would be one where there is an uninhabited type live across an unreachable yield point...

fn test(b: bool, x: fn() -> !) {
  yield;
  if b { return; }
  let val = x();
  yield;
  drop(val);
}

Except this is a ZST so maybe that actually doesn't make it interesting? Not sure how smart this all is.

@tmandry
Copy link
Member Author

tmandry commented Aug 7, 2019

Maybe this feedback can help improve the docs for the next time you need to explain this. ;)

Yes, noted :)

I suppose an intersting testcase for Miri would be one where there is an uninhabited type live across an unreachable yield point...

Yeah, I was thinking of a case just like this. FYI, we have tests like this in rust here.

bors added a commit to rust-lang/miri that referenced this pull request Aug 9, 2019
Add generator, async tests with uninhabited saved local

See discussion in rust-lang/rust#63035.
@tmandry
Copy link
Member Author

tmandry commented Sep 5, 2019

cc #60889

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants