Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mem_uninitialized lint to FCW lint on uses #100342

Closed
wants to merge 6 commits into from

Conversation

5225225
Copy link
Contributor

@5225225 5225225 commented Aug 9, 2022

This PR adds a lint that will detect uses of std::mem::uninitialized (using the mem_uninitialized diagnostic item) that are not definitely of "basic" types like int/float/raw pointer, and emit a FCW lint for it.

For generic cases, we can't say it's definitely UB, just that it might be.

Background: #98862 (comment)

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 9, 2022
@rust-highfive
Copy link
Collaborator

r? @wesleywiser

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 9, 2022
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@5225225 5225225 force-pushed the mem-uwuintiailized branch from 90c3c9f to b427392 Compare August 10, 2022 08:10
@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Aug 10, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-log-analyzer

This comment has been minimized.

@5225225
Copy link
Contributor Author

5225225 commented Aug 10, 2022

Did some tests compiling some representative rust programs, and it's worth noting that this would cause likely a fair chunk of nontrivial programs to warn (For example, itoa 0.4.8 warns, and pre 1.0 itoa makes up a good 1/3 of the downloads at the moment).

Also the message might need adjusting, it currently says warning: the following packages contain code that will be rejected by a future version of Rust: bytes v0.4.12, itoa v0.4.8, which... isn't quite true? Well, it might panic in a future version of rust, which I guess is rejection?

I still feel a warning is really the best choice to push crates to update their dependencies / remove their use of mem::uninit, but it's also probably worth keeping in mind just how many people will see it.

@5225225
Copy link
Contributor Author

5225225 commented Aug 10, 2022

and now that i think of it this is probably more @rustbot label -T-compiler T-lang

i think. maybe not API? Not sure!

@rustbot rustbot added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 10, 2022
@wesleywiser
Copy link
Member

wesleywiser commented Aug 11, 2022

Nominating for Lang Team discussion: should we lint on (most) uses of std::mem::uninitalized?

@wesleywiser wesleywiser added the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 11, 2022
@5225225
Copy link
Contributor Author

5225225 commented Aug 11, 2022

This PR, as implemented, would be all uses, even definitely or possibly non-UB ones, since at the call-site, we might not know the type, and nearly all uses of mem::uninit are UB anyways. The best we could likely do and still be useful is to skip known non-UB calls, of which I doubt there really are any.

And yeah, I don't know if this is T-Lang or T-Libs or both, but #98862 (comment) was nominated for T-libs and consensus seemed to be that a FCW was a good idea.

@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Aug 11, 2022

Did some tests compiling some representative rust programs, and it's worth noting that this would cause likely a fair chunk of nontrivial programs to warn.

Can we get a better understanding of the actual path most users will have to fixing this? Is it cargo update, or do they need to actually go and edit their Cargo.toml (moving to an entirely different crate or across semver versions)? And, if the majority of users do need to actually make a semver-incompatible change here, is that something we can help with (e.g., by pushing some key crates to release point releases of older versions)?


I am pretty wary of false-positive warnings for uses of uninitialized that aren't currently UB, particularly if we're telling a very large amount of people to take some non-trivial action (e.g., more than cargo update). I think that sets a bad precedent and may discourage people from trusting such notices in the future.

For example, looking at itoa (since you mentioned it), I believe we're warning as a result of this call. Given we've merged #99182, this is just slow but not actually language UB anymore. It's debatable whether it's library UB -- but it seems like, in general, not something we need a FCW to warn users to move off of in their dependencies: doing so would cause more noise than benefit, I think.

It seems likely that we can detect at least this kind of case (concrete types, no generics), so I would expect it to be possible to avoid linting here. Maybe the invalid_value lint could be reused here (not familiar with its exact semantics). If that buys us avoiding warnings for ~10-20% of Rust users, that's still very significant, and seems not unlikely.

@5225225
Copy link
Contributor Author

5225225 commented Aug 12, 2022

And, if the majority of users do need to actually make a semver-incompatible change here, is that something we can help with (e.g., by pushing some key crates to release point releases of older versions)?

In the case of itoa specifically, this was brought up upstream in dtolnay/itoa#36 with the response

People getting a FCW for using an ancient version of the library sounds like a good outcome to me.

But yeah, the fix was made on a breaking change release of the library, (0.4.8 -> 1.0.0), so cargo update is not enough for direct users. Not too many libraries (on crates.io) actually use itoa directly, the majority of the downloads of pre-1.0 are from:

  • csv (bumped in git to 1.0 but not released, will be a semver compatible change that nearly all of its dependencies will get)
  • num-format which looks to be abandoned (no git activity for 2 years, 0.4.1 on git but not released)
  • redis: bumped in git but unreleased, a large number of crates not using the latest version

Of the 1.7 million monthly downloads of itoa 0.4.8, that accounts for around 1.6 million of them.

itoa isn't the only crate that uses mem::uninitialized to make [u8; N]'s but it's the major one. http is the other one that I can think of at the moment, and that has a semver compatible fix that a lot of crates using it will get (https://lib.rs/crates/http/rev). There's a large number of 0.1.x dependents, but they have a relatively small download count.

As for more UB cases, a fair few cases of UB were found in #99389 (even where we needed to allow uninit &[u8]). It was mostly smallvec and crossbeam, the triage comment is #99389 (comment).


Yeah, we could use a modified form of the invalid_value lint here. Invalid_value lints about known to be UB calls, but it also lints for mem::zeroed and MaybeUninit::uninit().assume_init() (and even transmute(0), for zeroed). So for the specific case of "get people off mem::uninitialized", it wouldn't be quite appropriate. And it won't lint on any UB calls to

unsafe fn uninit<T>() -> T { mem::uninitialized() }

We could do a very targeted fix of "if it's known to be a [u8; N], ignore it" which is strange but will capture all the cases of "this is a buffer the caller didn't want to initialize", which is typically what mem::uninitialized is used for, but not ignore much else.

@5225225
Copy link
Contributor Author

5225225 commented Aug 12, 2022

After looking at more failures (by using the https://github.com/mTvare6/hello-world.rs repo as a repo that has a lot of dependencies, so the check runs on a lot of somewhat representative code), ignoring just [u8; N] might not work, since a lot of crates will make ints or structs with all ints, with mem::uninitialized().

I'll try and write a more targeted check, as a "invalid_values but only caring about mem::uninitialized and falling back to linting, instead of ignoring". For reference, the current set of hit crates on that repo is bytes v0.4.12 dtoa v0.4.8 generic-array v0.12.4 generic-array v0.13.3 glsl-layout v0.3.2 hyper v0.13.10 itoa v0.4.8 json v0.11.15 mio v0.6.23 nalgebra v0.19.0 nix v0.14.1 parking_lot v0.4.8 parking_lot v0.9.0 parking_lot_core v0.2.14 winit v0.19.5. Most of them being for ints, as far as I could see.

@5225225
Copy link
Contributor Author

5225225 commented Aug 12, 2022

Ignoring just known to be ints/floats/raw pointers (so a generic T will still lint) gets us generic-array v0.12.4, generic-array v0.13.3, glsl-layout v0.3.2, hyper v0.13.10, nalgebra v0.19.0, nix v0.14.1, winit v0.19.5 list of warning crates (I removed the ones that are actually for procedural-masquarade, that being rental and fluent-bundle (which uses rental)).

@rust-log-analyzer

This comment has been minimized.

/// ```rust,compile_fail
/// #![deny(mem_uninitialized)]
/// fn main() {
/// let x: i32 = unsafe { std::mem::uninitialized() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is blocked on #98919.


/// Return `None` only if we are sure this type does
/// allow being left uninitialized.
fn ty_find_init_error<'tcx>(cx: &LateContext<'tcx>, ty: Ty<'tcx>) -> Option<InitError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this so clever?

I think the idea is to just warn on all uses of this function. Basically this is a very strong form of deprecation. We might even do this via a new attribute rather than a specific lint -- #[extremely_deprecated] or so, which makes the existing deprecation lint into an FCW lint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, looks like this was specifically requested by @Mark-Simulacrum. However my understanding from previous discussions was that we didn't want to differentiate here -- we want to slowly work towards removing that function entirely. If we had a time machine we'd erase it from history; lacking that we will remove it as much as we possibly can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure if we agreed on exactly when it should lint. #100342 (comment) seemed to say it's better off ignoring cases like [u8; 64] or similar.

"Basically just a lint looking for use of the mem_uninitialized diagnostic item" is what this was originally implemented as, and I can revert it back to that if we agree that's fine.

But if we wanted to reduce noise, then we could ignore things that are not UB (assuming the 0x01 filling or a hypothetical freeze).

Personally I think we've had MaybeUninit for long enough and that a full-on FCW for any mention of mem::uninitialized is warranted, but it would be noisy for some time while the ecosystem upgrades.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it again, the guidance we got from the lang team is ambiguous, so I asked for clarification in that thread.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Aug 16, 2022

@rustbot label -I-lang-nominated -T-lang +T-libs-api +I-libs-api-nominated

Given that this issue is spawned from #98862, which is tagged as libs-api, we are relabeling this to libs-api.

Our sense is that the decision of what is UB remains T-lang purview, but the decision of the stdlib surface (e.g., whether to deprecate, remove things, etc) is T-libs-api.

@rustbot rustbot added I-libs-api-nominated Nominated for discussion during a libs-api team meeting. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed I-lang-nominated Nominated for discussion during a lang team meeting. labels Aug 16, 2022
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 27, 2022
@5225225
Copy link
Contributor Author

5225225 commented Aug 28, 2022

Errors now look like:

(hyper, only the first one since there's 2 instances of the same exact error here)

The package `hyper v0.13.10` currently triggers the following future incompatibility lints:
> warning: the type `[httparse::Header; 100]` does not permit being left uninitialized
>    --> /home/jess/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.10/src/proto/h1/role.rs:120:77
>     |
> 120 |             let mut headers: [httparse::Header<'_>; MAX_HEADERS] = unsafe { mem::uninitialized() };
>     |                                                                             ^^^^^^^^^^^^^^^^^^^^
>     |                                                                             |
>     |                                                                             this code causes undefined behavior when executed
>     |                                                                             help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
>     |
>     = note: `#[allow(mem_uninitialized)]` on by default
>     = note: for more information, see FIXME: fill this in
> note: references must be non-null (in this struct field)
>    --> /home/jess/.cargo/registry/src/github.com-1ecc6299db9ec823/httparse-1.7.1/src/lib.rs:660:5
>     |
> 660 |     pub name: &'a str,
>     |     ^^^^^^^^^^^^^^^^^
> 

Old http and itoa do not lint, as all they do is make uninit [u8; N].

Old crossbeam looks like

The package `crossbeam v0.2.12` currently triggers the following future incompatibility lints:
> warning: the type `T` is generic, and might not permit being left uninitialized
>   --> /home/jess/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-0.2.12/src/sync/ms_queue.rs:66:45
>    |
> 66 |             payload: Payload::Data(unsafe { mem::uninitialized() }),
>    |                                             ^^^^^^^^^^^^^^^^^^^^
>    |                                             |
>    |                                             this code causes undefined behavior when executed
>    |                                             help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
>    |
>    = note: `#[allow(mem_uninitialized)]` on by default
>    = note: for more information, see FIXME: fill this in
>    = note: type might not be allowed to be left uninitialized
> 
> warning: the type `[std::cell::UnsafeCell<(T, std::sync::atomic::AtomicBool)>; 32]` is generic, and might not permit being left uninitialized
>     --> /home/jess/.cargo/registry/src/github.com-1ecc6299db9ec823/crossbeam-0.2.12/src/sync/seg_queue.rs:40:28
>      |
> 40   |             data: unsafe { mem::uninitialized() },
>      |                            ^^^^^^^^^^^^^^^^^^^^
>      |                            |
>      |                            this code causes undefined behavior when executed
>      |                            help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
>      |
>      = note: `#[allow(mem_uninitialized)]` on by default
>      = note: for more information, see FIXME: fill this in
> note: type might not be allowed to be left uninitialized (in this struct field)
>     --> /home/jess/src/rust/library/core/src/cell.rs:1869:5
>      |
> 1869 |     value: T,
>      |     ^^^^^^^^
> 

@RalfJung
Copy link
Member

So when I have something like what hyper does in my own crate, I now get two warnings -- one from invalid_values, and one from this one?

@5225225
Copy link
Contributor Author

5225225 commented Aug 28, 2022

Yeah. It's unfortunate.

I could not emit an invalid_value warning if the mem_uninitialized warning would fire (I guess, just query the mem_uninitialized's ty_find_init_error?). Means that if you specifically silence the mem_uninitialized warning, but don't silence invalid_value, you'll get less warnings for things that both care about (But the FCW warning still applies, you'll be told to cargo report future-incompatibilities regardless of lint level). That might be an acceptable edge case.

@5225225
Copy link
Contributor Author

5225225 commented Aug 28, 2022

I could not emit an invalid_value warning if the mem_uninitialized warning would fire

This is now implemented.

Output from doing

fn main() {
    let _x: char = unsafe { std::mem::uninitialized() };
}

looks like

# cargo +source check
    Checking abcdef v0.1.0 (/tmp/scratchaBCBdAv6h)
warning: use of deprecated function `std::mem::uninitialized`: use `mem::MaybeUninit` instead
 --> src/main.rs:2:39
  |
2 |     let _x: char = unsafe { std::mem::uninitialized() };
  |                                       ^^^^^^^^^^^^^
  |
  = note: `#[warn(deprecated)]` on by default

warning: the type `char` does not permit being left uninitialized
 --> src/main.rs:2:29
  |
2 |     let _x: char = unsafe { std::mem::uninitialized() };
  |                             ^^^^^^^^^^^^^^^^^^^^^^^^^
  |                             |
  |                             this code causes undefined behavior when executed
  |                             help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
  |
  = note: `#[warn(mem_uninitialized)]` on by default
  = note: for more information, see FIXME: fill this in
  = note: characters must be a valid Unicode codepoint

warning: `abcdef` (bin "abcdef") generated 2 warnings
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
warning: the following packages contain code that will be rejected by a future version of Rust: abcdef v0.1.0 (/tmp/scratchaBCBdAv6h)
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 8`

Might be a good idea to merge the checks for invalid_value and mem_uninitialized and pass in another argument indicating which form of the checks we're doing, since right now it is duplicated code.

@bors
Copy link
Contributor

bors commented Aug 30, 2022

☔ The latest upstream changes (presumably #98919) made this pull request unmergeable. Please resolve the merge conflicts.

@wesleywiser
Copy link
Member

The compiler changes look good to me but since this is T-libs-api, I'm going to hand it off to them at this time to make the final review of what should/should not trigger the lint.

r? rust-lang/libs

@rust-highfive rust-highfive assigned thomcc and unassigned wesleywiser Sep 7, 2022
@thomcc
Copy link
Member

thomcc commented Sep 7, 2022

I'm in favor of a FCW warning here, but will punt to someone on libs-api.

r? @joshtriplett

@rust-highfive rust-highfive assigned joshtriplett and unassigned thomcc Sep 7, 2022
@5225225
Copy link
Contributor Author

5225225 commented Sep 8, 2022

Created #101570 as a tracking issue for the FCW, and put that in the code.

@bors
Copy link
Contributor

bors commented Sep 28, 2022

☔ The latest upstream changes (presumably #102377) made this pull request unmergeable. Please resolve the merge conflicts.

@Noratrieb
Copy link
Member

any updates here? @joshtriplett @5225225

@Noratrieb
Copy link
Member

I don't think it's a good idea to add an FCW for mem::uninitialized now. The danger of the function has been diffused, it just returns 0x01 now. Programs using it are completely fine (though they are a little bit slower than they originally anticipated :D).
That said, we should FCW on invalid uses of MaybeUninit::uninit().assume_init() because that one is actual UB and should be fixed in code. #100423 tries to make it panic, but imo we should FCW first before making it panic.

@5225225
Copy link
Contributor Author

5225225 commented Mar 27, 2023

Makes sense. I'll adjust the code to instead lint for that.

@5225225
Copy link
Contributor Author

5225225 commented May 29, 2023

Cleaning up old PRs that I won't get to, don't have the time. Happy to help anyone pick this up again, though.

@5225225 5225225 closed this May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.