Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider deprecating and/or modifying behavior of std::env::set_var #90308

Closed
leo60228 opened this issue Oct 26, 2021 · 40 comments · Fixed by #124636
Closed

Consider deprecating and/or modifying behavior of std::env::set_var #90308

leo60228 opened this issue Oct 26, 2021 · 40 comments · Fixed by #124636
Labels
A-process Area: `std::process` and `std::env` C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@leo60228
Copy link
Contributor

Observing an environment variable concurrently with a call to setenv constitutes a data race. Rust currently handles this by using a mutex, but that only protects against functions in std::env. My interpretation of POSIX (which appears to be the same as the glibc developers') is that any libc function is allowed to call getenv. This has already caused problems with getaddrinfo in #27970.
Additionally, a large amount of C code calls getenv but is otherwise thread-safe. While this isn't necessarily the standard library's issue, it's impossible for third-party libraries to soundly use this code without requiring the user to certify that this isn't an issue via an unsafe block. Some examples of this happening in practice are in time-rs/time#293 and rust-lang/flate2-rs#272.
rustsec/advisory-db#926 had several proposals on how this could be handled brought up.

Make std::env::set_var unsafe

This is arguably the cleanest solution to the issue. This could be considered a soundness fix, which would make it an option, but the ecosystem impact feels like it'd be too big.

Don't actually call setenv

std could keep track of the environment itself, and make set_var changes only visible to var and subprocesses. This is probably the way to solve the issue with the least impact on existing code, but the behavior is somewhat unexpected and not zero-cost.

Only call setenv in single-threaded code

This would reduce the impact further, but seems even less expected for negligible benefit.

Deprecate std::env::set_var

This would make it clear that setting an environment variable in the current process is discouraged. It could also be combined with not actually calling setenv, which would be my preferred solution to this issue.

@vi
Copy link
Contributor

vi commented Oct 29, 2021

Shall it be made unsafe fn, just like std::os::unix::process::CommandExt::pre_exec?

@leo60228
Copy link
Contributor Author

Shall it be made unsafe fn, just like std::os::unix::process::CommandExt::pre_exec?

The discussion in #39575 is definitely helpful here, but I don't think the same conclusions can reasonably apply to std::env::set_var.

@tarcieri
Copy link
Contributor

tarcieri commented Oct 29, 2021

Indeed. Notably they retroactively changed pre_exec to unsafe based on a justification that there weren't real-world uses of it.

That's not the case with set_var: https://github.com/search?l=Rust&type=Code&q=set_var

I think to accomplish a similar effect, set_var would have to be deprecated, and a new unsafe fn introduced in its place (which is more or less combining options 1 & 4 proposed in the OP)

@leo60228
Copy link
Contributor Author

I think to accomplish a similar effect, set_var would have to be deprecated, and a new unsafe fn introduced in its place (which is more or less combining options 1 & 4 proposed in the OP)

That's what happened with pre_exec. before_exec is the old name.

@crlf0710 crlf0710 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 2, 2021
@leo60228
Copy link
Contributor Author

For reference, .NET takes option #2.

@cgwalters
Copy link
Contributor

Hi, there's a bunch of discussion on this issue, but I think https://internals.rust-lang.org/t/synchronized-ffi-access-to-posix-environment-variable-functions/15475 has the most information right now.

@briansmith
Copy link
Contributor

Hi, there's a bunch of discussion on this issue, but I think https://internals.rust-lang.org/t/synchronized-ffi-access-to-posix-environment-variable-functions/15475 has the most information right now.

There is a lot of discussion in #27970 that's really about this issue. This is largely my fault because I didn't know this issue existed.

@briansmith
Copy link
Contributor

briansmith commented Jan 4, 2022

I looked at an application that is using set_var in multiple places:

  • To provide a default value for RUST_LOG=debug when it wasn't already set. We'd prefer a way to do this without setting the environment variable.
  • To force RUST_BACKTRACE=full regardless of what the environment says. Again, we'd prefer a way to do this without setting an environment variable.
  • To test functions that read from the environment. At some point we should have a new solution for supporting this.
  • To do shell scripting, but in Rust. Not a big deal for us to use an unsafe function for this, though not ideal. [Edit: We can use Command::{env,envs}.]

@Nemo157
Copy link
Member

Nemo157 commented Jan 4, 2022

  • To provide a default value for RUST_LOG=debug when it wasn't already set. We'd prefer a way to do this without setting the environment variable.

https://docs.rs/env_logger/latest/env_logger/struct.Env.html#method.filter_or

  • To force RUST_BACKTRACE=full regardless of what the environment says. Again, we'd prefer a way to do this without setting an environment variable.

#79085

EDIT: Actually, this only affects the library backtrace, the panic runtime probably does also need a in-code setter too.

@leo60228
Copy link
Contributor Author

leo60228 commented Jan 4, 2022 via email

@briansmith
Copy link
Contributor

First, it turns out that our use of RUST_LOG was a bit different than I described. I agree current APIs are sufficient, at least for the logging crates we use.

As far as testing goes, if the serial_test crate truly ensures no other threads are doing anything during the test, then that might work for our test uses of set_var and remove_var.

@briansmith
Copy link
Contributor

In order to use serial_test to make set_var and remove_var in a test safe, the tests would have to be integration tests and where there are no other tests in the same file (really, resultant binary), so that all the tests in the binary are marked #[serial], because #[serial] from the serial_test crate only serializes #[serial] tests; it doesn't prevent non-#[serial] tests from executing concurrently with #[serial] tests. Further, all of those #[serial] tests would need to avoid spawning threads (that outlive the test). So it'd be pretty error-prone.

@RalfJung
Copy link
Member

RalfJung commented Jan 5, 2022

To provide a default value for RUST_LOG=debug when it wasn't already set. We'd prefer a way to do this without setting the environment variable.

Miri needs something similar here, and I think we somehow have two loggers here in this process so I don't know if we can use the env_logger API as a replacement.

@cgwalters
Copy link
Contributor

Further, all of those #[serial] tests would need to avoid spawning threads (that outlive the test).

And even inside such a serial test, any use of e.g. tokio or rayon could spawn threads in a not immediately obvious way. So I wouldn't bet on this approach as a good mitigation.

@briansmith
Copy link
Contributor

@cgwalters wrote:

And even inside such a serial test, any use of e.g. tokio or rayon could spawn threads in a not immediately obvious way. So I wouldn't bet on this approach as a good mitigation.

That's right. For the tests we have, what I suggested above would work. What we're more likely to do is refactor the code so that reading the environment is done separately from the functionality being tested, and then run the tests using a private API that allows us to pass in a fake environment stored in a HashMap or similar. So we have workarounds that are good enough for a lot of applications, and I don't think fully solving the testing issue should block the deprecation, as long as an unsafe variant is added during the deprecation.

@briansmith
Copy link
Contributor

Miri needs something similar here, and I think we somehow have two loggers here in this process so I don't know if we can use the env_logger API as a replacement.

It seems like that could be solved by just making rustc_driver's log initialization API more flexible, and then taking advantage of that flexibility?

@RalfJung
Copy link
Member

RalfJung commented Jan 6, 2022

I am not sure how easy that would be, but yeah something like that would probably be needed.

For the other use of set_var in the cargo-miri wrapper, it would be really nice to have a way to say "all future uses of Command should set this env var"...

hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jan 20, 2022
Per rust-lang/rust#90308, this is potentially a data race. In practice,
I don't think it was actually problematic here, but it also wasn't doing
anything of value, so we should remove it.

This is currently the only `env::set_var` or `env::remove_var` call in
the proxy.

Closes linkerd/linkerd2#7651
hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jan 20, 2022
Per rust-lang/rust#90308, this is potentially a data race. In practice,
I don't think it was actually problematic here, but it also wasn't doing
anything of value, so we should remove it.

This is currently the only `env::set_var` or `env::remove_var` call in
the proxy.

To prevent uses of these functions in the future, I also added a 
`disallowed-methods` configuration in `.clippy.toml` to emit a
warning for any uses of `std::env::set_var` and
`std::env::remove_var`. Unfortunately, this required adding a
`deny` attribute basically everywhere, which is why this diff touches
so many files.

Closes linkerd/linkerd2#7651
@evanj
Copy link
Contributor

evanj commented Oct 31, 2023

I ran into the Go version of this problem (see golang/go#63567 ). I think realistically, this is a small problem, and the solutions described in this thread seem reasonable. However, I am unreasonably annoyed that this problem with the C standard has been known for decades, and we can't get that fixed. I think ultimately we need the C standard to adopt a solution.

My unrealistic idea: I wish there was a way that Rust, Go and other languages that care about this (e.g. Julia? JuliaLang/julia#34726) could share a "safe" C implementation, as a way to either route around the C standard, and encourage the C standard to evolve a solution. I'm mentioning it in case others think this is interesting, because I'm annoyed enough to spend time on this.

@tarcieri
Copy link
Contributor

@evanj there was some discussion of trying to improve the C API on IRLO: https://internals.rust-lang.org/t/thread-safe-environment-variable-mutation/17817

@RalfJung
Copy link
Member

RalfJung commented Oct 31, 2023 via email

@comex
Copy link
Contributor

comex commented Nov 2, 2023

I was wondering if there was some way to implement a tiny library such that

  • any executable or dylib could statically link the library, but
  • if multiple copies of the library ended up in the same process (because multiple images each had their own statically linked copy), then the copies would somehow detect each other and agree on a single piece of shared state.

By being statically linked, the library could be included with the Rust standard library – and any other language runtimes that wanted to participate – without adding a new user-visible dependency. By sharing state, the library copies could all have a single source of truth for the current process's "thread-safe environment".

Sharing state doesn't seem that hard: on Unix, a global weak symbol would work in almost all cases. But there's at least one case where global weak symbols end up not actually being globally unique: when a dylib is linked with linker options that limit its exported symbols list. Is that a deal-breaker? Maybe not?

I also don't know how this would work on Windows, which doesn't support global weak symbols. Maybe it would suffice to not invent a new source of truth there and just use the existing thread-safe GetEnvironmentVariable/SetEnvironmentVariable.

…This is all probably a pipe dream though.

@ChrisDenton
Copy link
Member

It wouldn't be needed on Windows because, as you say, modifying then environment is already thread safe.

@briansmith
Copy link
Contributor

This issue is now over 2 years old (and we've known about it a lot longer than that). It took glibc 3 years from when getrandom was added to Linux to add support, and even now (9 years later) we still can't rely on getrandom without fallbacks. An interoperable memory-safe replacement for these APIs is realistically a decade-long (most likely, multi-decade) endeavor where most of the work would need to happen in the C/POSIX standardization and implementation space, not here. I think it would be great if somebody were to do that, but I don't think we should leave these APIs as they are for years longer.

I would also say that I don't have much hope that there would be a deprecated_safe in Rust in a timely manner. Seems like we should pursue solutions that don't involve inventing new infrastructure, such as what was suggested long ago: create new unsafe APis with new names that do the same, mark the existing APIs #[deprecated] and have them forward to the new unsafe functions, and have the deprecation note for each function reference its replacement.

@jhpratt
Copy link
Member

jhpratt commented Nov 2, 2023

#[deprecated_safe] would be great, but the person who was working on the implementation deleted their GitHub account; I have the branches backed up on my fork of rust-lang/rust.

There is a PR open (#94619) to add unsafe APIs with new names. I have no problem updating it to resolve the conflicts and whatever else may be necessary. It's currently marked as blocked on whether #[deprecated_safe] was going to happen, but that doesn't seem likely in the near future.

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 15, 2023
…r, r=TaKO8Ki

rustc_log: provide a way to init logging based on the values, not names, of the env vars

Miri wants to affect how rustc does logging. So far this required setting environment variables before calling `rustc_driver::init_rustc_env_logger`. However, `set_var` is a function one should really [avoid calling](rust-lang#90308), so this adds the necessary APIs to rustc such that Miri can just pass it the *values* of all the log-relevant environment variables, rather than having to change the global environment.
bors added a commit to rust-lang/miri that referenced this issue Nov 16, 2023
rustc_log: provide a way to init logging based on the values, not names, of the env vars

Miri wants to affect how rustc does logging. So far this required setting environment variables before calling `rustc_driver::init_rustc_env_logger`. However, `set_var` is a function one should really [avoid calling](rust-lang/rust#90308), so this adds the necessary APIs to rustc such that Miri can just pass it the *values* of all the log-relevant environment variables, rather than having to change the global environment.
@Enselic Enselic added the C-bug Category: This is a bug. label Dec 3, 2023
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
rustc_log: provide a way to init logging based on the values, not names, of the env vars

Miri wants to affect how rustc does logging. So far this required setting environment variables before calling `rustc_driver::init_rustc_env_logger`. However, `set_var` is a function one should really [avoid calling](rust-lang/rust#90308), so this adds the necessary APIs to rustc such that Miri can just pass it the *values* of all the log-relevant environment variables, rather than having to change the global environment.
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
rustc_log: provide a way to init logging based on the values, not names, of the env vars

Miri wants to affect how rustc does logging. So far this required setting environment variables before calling `rustc_driver::init_rustc_env_logger`. However, `set_var` is a function one should really [avoid calling](rust-lang/rust#90308), so this adds the necessary APIs to rustc such that Miri can just pass it the *values* of all the log-relevant environment variables, rather than having to change the global environment.
tbu- added a commit to tbu-/rust that referenced this issue May 2, 2024
Allow calling these functions without `unsafe` blocks in editions up
until 2021, but don't trigger the `unused_unsafe` lint for `unsafe`
blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
tbu- added a commit to tbu-/rust that referenced this issue May 3, 2024
Allow calling these functions without `unsafe` blocks in editions up
until 2021, but don't trigger the `unused_unsafe` lint for `unsafe`
blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
bors added a commit to rust-lang-ci/rust that referenced this issue May 4, 2024
Make `std::env::{set_var, remove_var}` unsafe in edition 2024

Allow calling these functions without `unsafe` blocks in editions up until 2021, but don't trigger the `unused_unsafe` lint for `unsafe` blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
tbu- added a commit to tbu-/rust that referenced this issue May 13, 2024
Allow calling these functions without `unsafe` blocks in editions up
until 2021, but don't trigger the `unused_unsafe` lint for `unsafe`
blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
CC rust-lang#124866.
tbu- added a commit to tbu-/rust that referenced this issue May 13, 2024
Allow calling these functions without `unsafe` blocks in editions up
until 2021, but don't trigger the `unused_unsafe` lint for `unsafe`
blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
CC rust-lang#124866.
tbu- added a commit to tbu-/rust that referenced this issue May 24, 2024
Allow calling these functions without `unsafe` blocks in editions up
until 2021, but don't trigger the `unused_unsafe` lint for `unsafe`
blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
CC rust-lang#124866.
tbu- added a commit to tbu-/rust that referenced this issue May 24, 2024
Allow calling these functions without `unsafe` blocks in editions up
until 2021, but don't trigger the `unused_unsafe` lint for `unsafe`
blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
CC rust-lang#124866.
fmease added a commit to fmease/rust that referenced this issue May 30, 2024
Make `std::env::{set_var, remove_var}` unsafe in edition 2024

Allow calling these functions without `unsafe` blocks in editions up until 2021, but don't trigger the `unused_unsafe` lint for `unsafe` blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
CC rust-lang#124866.
bors added a commit to rust-lang-ci/rust that referenced this issue May 30, 2024
Make `std::env::{set_var, remove_var}` unsafe in edition 2024

Allow calling these functions without `unsafe` blocks in editions up until 2021, but don't trigger the `unused_unsafe` lint for `unsafe` blocks containing these functions.

Fixes rust-lang#27970.
Fixes rust-lang#90308.
CC rust-lang#124866.
@bors bors closed this as completed in 5d8f9b4 May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-process Area: `std::process` and `std::env` C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.