-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICE when assigning to a union with fields implementing Drop #41073
Comments
Should this drop |
That println is distracting, so I reworded the example a bit: {
let mut test = Test { a: A(3) }; // test.a is initialized -> test.b is initialized, test is initialized
{
let tmp_b = test.b; // test.b is uninitialized (moved out) -> test.a is uninitialized, test is uninitialized
// tmp_b is initialized
} // tmp_b is dropped, B is printed
test.b = B(0.5); // test.b is initialized -> test.a is initialized, test is initialized
// test.b is not dropped on assignment because it was uninitialized
} // test is dropped (noop) |
And if you move out a subfield? #![feature(untagged_unions)]
union U {
x: (Box<u32>, Box<u32>),
y: Vec<u32>,
raw: [u64; 4]
}
fn main() {
let mut u = U { raw: [0; 4] };
unsafe {
drop(u.x.1);
if true {
u.y = Vec::new(); // what does this drop?
} else if true {
u.x = (Box::new(0), Box::new(1));
} else {
u.x.0 = Box::new(0); // and this?
}
}
} |
#![feature(untagged_unions)]
union U {
x: (Box<u32>, Box<u32>),
y: Vec<u32>,
raw: [u64; 4]
}
fn main() {
// init u.raw -> init (u.x, u.y) -> init u
let mut u = U { raw: [0; 4] };
unsafe {
// uninit u.x.1 -> uninit u.x -> uninit (u.y, u.raw) -> uninit u
// u.x.0 is still initialized
drop(u.x.1);
if true {
// init u.y -> init (u.x, u.raw) -> init u
// u.y was uninitialized, not dropped on assignment
u.y = Vec::new(); // what does this drop?
} else if true {
// init u.x -> init (u.y, u.raw) -> init u
// u.x was uninitialized, not dropped on assignment
u.x = (Box::new(0), Box::new(1));
} else {
// assign u.x.0
// u.x.0 was initialized, dropped on assignment
u.x.0 = Box::new(0); // and this?
}
}
} I tried to describe the scheme in https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state and https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#move-and-initialization-checking |
That... sounds like something that would be fun to implement. |
I think people will be confused by these run-time semantics. Should we prohibit replacing into partially-initialized unions? |
Quite likely a duplicate of #48962 |
cc @retep007 -- can you quickly test if this is a duplicate of #48962 ? |
@nikomatsakis It seems not to be related to #48962 . I could take a look at this |
When I tried to minimise example I found out that if either of |
What is the backtrace where the ICE occurs? |
Though reading the back-and-forth between @petrochenkov and @arielb1, it sounds like there is some confusion also as to what we should do upon assignment to a union field. I guess this has to do with whether to drop the pre-existing data or not (in part because the compiler doesn't have visibility into whether or not the union is initialized). I don't know that we can blame this ICE on NLL per se. |
This is not an NLL problem - it occurs even without Note that the move semantics were removed from the newest version of the union spec. However, I'm not sure that implementing the specification as written is the best idea - these move semantics interact in a very weird and inconsistent-seeming way with assignments. union U {
a: (Box<u32>, Box<u32>),
b: (Box<u32>, Box<u32>),
}
let mut u = U { a: (Box::new(0), Box::new(1)) };
// u.a & u.b are both initialized
u.b.0 = Box::new(2); // this would call a destructor
drop(u.a.0);
// u.a.0 is uninitialized, u.b is also now uninitialized,
println!("{:?}", u.a.1); // prints `1`
u.a.1 = Box::new(3); // this would call a destructor
// u.b is *still* uninitialized, same as before.
println!("{:?}", u.b.1); //~ ERROR use of moved value
u.b.1 = Box::new(4); // this would *not* call a destructor, and would leak a `Box` I feel that implementing this "as specified" will lead to a situation where it will be hard to predict whether writing to a union field will call a destructor or not. I can think of 2 ways of fixing this inconsistency:
|
cc #32836 - this has to resolved one way or another before |
@arielb1 I'm looking at the example you gave, and that seems like unsafe code doing something wrong, by accessing That said: this seems like it only applies to union fields that implement |
I'd rather prefer something like 1 with more errors to something like 2 with double frees, to keep "union newtypes" with a single field like |
FWIW, I still see the behavior of code in #41073 (comment) as logical and "following from the first principles", but I agree that nobody would want to deal with such puzzles in practice, so I'm all for generating more errors for sanity's sake even if they are "artificial" in some sense. |
I don't think union "newtypes" with a single field should behave like newtypes at all. Unions are inherently unsafe and only when accessed can we make any assumptions about what they contain. I think an "enum interpretation" is incorrect. Unions were added primarily for FFI purposes, and in C they most certainly do not have an "enum interpretation". For example, there are libraries that use unions as a way to be extensible -- as the library evolves, new union variants are added, and this is binary compatible with old clients that know about fewer variants. (IIRC it was @joshtriplett who brought this up.) A single-variant union just means that this is the only variant we know of, but unless we actually use that variant, we should not assume that this is the variant we are in. TBH the most reasonable behavior for assigning to union fields that I can come up with is to never drop anything. That's at least predictable, and assignment is unsafe after all. I think any scheme that attempts to track union initialization state is bound to backfire. And I am certainly extremely surprised that initializing a field makes its siblings initialized. I would have expected the exact opposite---initializing a union field deinitializes its siblings. Why does the RFC make this strange choice? That's even more surprising if I consider that you want to follow an "enum interpretation", where certainly initializing one variant deinitializes the others. One point I am missing from the RFC: What is the initialization state when a union goes into scope, e.g., when it is passed to a function? |
Well, to support transmuting with unions
rust-lang/rfcs#1897 contains a number of examples that serve as motivation for the currently implemented rules, but I never got enough time to push it over finish line.
To pass something (not necessarily a union) to a function you have to borrow or move/copy it and you can't borrow/move/copy something uninitialized, so all function parameters are in initialized state initially. (EDIT: here and above I'm talking about initialization state as it's tracked by the compiler statically (with move checker) and dynamically (with drop flags), not stuff like |
Ah, that's a good point. I hadn't thought of this because I did not really accept the "writing to a union field initializes the siblings" rule, but with that rule, this actually makes sense. However, doesn't that mean if I have two types that implement union Foo {
x: String,
y: Box<i32>
} and then I do something like let u = Foo { x: "abc".to_owned() };
u.y = Box::new(42); then everything explodes because this will cause The more I look at this, the more I think that maybe having union Foo {
x: ManuallyDrop<String>,
y: ManuallyDrop<Box<i32>>
}
Yeah, it's a huge mess :/ Also, I wasn't even ware of the extent of this RFC. Clearly I didn't do my research. Sorry for that! I hope to find some time to read it soon, that's lots of text. |
Yes, that's why assignments to u.y = Box::new(42); //~ ERROR assignment to non-`Copy` union field requires unsafe function or block |
Fair enough. This is still a footgun though, e.g. all this may happen in an |
I do think that unions with Drop fields are a massive footgun regardless. They do require |
I am reminded of what I recently said in the context of references to fields of a packed struct:
Notice that assigning to a union field is always safe in C; only reading it can yield UB. |
Unions in (standard) C/C++ have much stronger enum interpretation than in Rust, C and C++ have a notion of "active field" of a union and you can't even access non-active fields (i.e. transmute with unions) in general. (Of course, C/C++ have much looser static analysis so you can create and pass around uninitialized unions, but accessing them is still UB.) |
AFAIK the following is legal:
You cannot access non-active fields through pointers, but you are generally allowed to access it "through the union type". (The standard is somewhat ambiguous about this, but Wikipedia agrees and GCC explicitly says it is okay.) Above, we were talking about accessing the union directly, i.e. through the union type, in which case C/C++ does not apply restrictions based on the "active variant". The restriction for accessing the "wrong" field through a pointer is necessary to make type-based alias analysis work. I didn't check the old standards, but I'd guess before the aliasing restrictions were introduced, unions were purely access-based as well. |
@RalfJung rust-lang/rfcs#1897 proposes roughly the same model, i.e. "enum" interpretation is the base, so there's always an active field that's guaranteed to be valid, but you can also access inactive fields at your own risk as an extension. |
I actually disagree. I think the enum one exists solely to enable type-based alias analysis (which we decisively do not want in Rust); if it weren't for that, C would be just fine without any notion of an "active variant".
I'm not at all happy about this proposal. This extra hidden state makes things much more complicated. I think a model where unions are just bytes, and then on every read/write we decide what to do with these bytes, is much simpler. For example, imagine you'd want to extend miri to actually be able to fully check violations of these rules for accessing union fields. I think that is a very desirable feature to have---it's not just practically useful (we could use miri to test for UB violations in unsafe code), it also means that it is even possible to have such a checker. IMO the fact that such a checker is likely impossible for C is one of the biggest faults of the language, and a mistake I'd like to not repeat in Rust. Given that we don't want TBAA, I see no reason to adapt an "enum interpretation". It's a much more complicated model, and I don't see any tangible benefits spelled out in the RFC. |
Full backtrace can be seen on the playground
https://play.rust-lang.org/?gist=bc93f5b222b09b31d3fd286cfc10858a&version=nightly&backtrace=2
The text was updated successfully, but these errors were encountered: