Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails #96917

Merged
merged 3 commits into from
May 18, 2022

Conversation

marti4d
Copy link
Contributor

@marti4d marti4d commented May 10, 2022

With PR #84096, Rust std::collections::hash_map::RandomState changed from using RtlGenRandom() (msdn) to BCryptGenRandom() (msdn) as its source of secure randomness after much discussion (here, among other places).

Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the env_logger crate. (docs for env_logger) The root issue is that on some machines, BCryptGenRandom() will fail with an Access is denied. (os error 5) error message. (Bugzilla issue 1754490) (Discussion in issue #94098)

Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues (like this) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work.

However, in this case we are in a regular unsandboxed process just trying to initialize env_logger and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally.

It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it.

I have read some of the historical discussion around using BCryptGenRandom() in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to RtlGenRandom(). Instead, I'd like to suggest that perhaps we use RtlGenRandom() as a "fallback RNG" in the case that BCrypt doesn't work.

This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks:

  1. Slight added overhead: It should be quite minimal though. The first call to sys::rand::hashmap_random_keys() will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers
  2. RtlGenRandom() is deprecated by Microsoft: Technically true, but as mentioned in this comment on GoLang, this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in our C++ code, and Chromium uses it in their code as well (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here

Issue rust-lang#84096 changed the hashmap RNG to use BCryptGenRandom instead of
RtlGenRandom on Windows.

Mozilla Firefox started experiencing random failures in
env_logger::Builder::new() (Issue rust-lang#94098) during initialization of their
unsandboxed main process with an "Access Denied" error message from
BCryptGenRandom(), which is used by the HashMap contained in
env_logger::Builder

The root cause appears to be a virus scanner or other software interfering
with BCrypt DLLs loading.

This change adds a fallback option if BCryptGenRandom is unusable for
whatever reason. It will fallback to RtlGenRandom in this case.

Fixes rust-lang#94098
@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 10, 2022
@rust-highfive
Copy link
Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api -T-libs to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 10, 2022
library/std/src/sys/windows/rand.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/rand.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/rand.rs Outdated Show resolved Hide resolved
@marti4d marti4d requested a review from ChrisDenton May 16, 2022 13:24
@ChrisDenton
Copy link
Member

Ok, I think this is good to go.

I understand the concerns expressed in #94098 and that ideally this should not be necessary. However, I think @marti4d makes a strong case that a fallback is in fact needed at this time. It is causing issues for a small but significant number of users and while it would be great if the OS itself could fix or workaround whatever is causing the issue (possibly AVs), that's not happening at the moment. This decision could be revisited in the future if further information comes to light.

@bors r+

@bors
Copy link
Contributor

bors commented May 16, 2022

📌 Commit 3de6c2c has been approved by ChrisDenton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 16, 2022
@ChrisDenton
Copy link
Member

@bors r- Improve error message

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 16, 2022
Copy link
Member

@ChrisDenton ChrisDenton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bors r+

@ChrisDenton
Copy link
Member

Oh, that didn't work. Let's try bors again.

@bors r+

@bors
Copy link
Contributor

bors commented May 16, 2022

📌 Commit aba3454 has been approved by ChrisDenton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 16, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request May 17, 2022
Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails

With PR rust-lang#84096, Rust `std::collections::hash_map::RandomState` changed from using `RtlGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/ntsecapi/nf-ntsecapi-rtlgenrandom)) to `BCryptGenRandom()` ([msdn](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom)) as its source of secure randomness after much discussion ([here](rust-random/getrandom#65 (comment)), among other places).

Unfortunately, after that PR landed, Mozilla Firefox started experiencing fairly-rare crashes during startup while attempting to initialize the `env_logger` crate. ([docs for env_logger](https://docs.rs/env_logger/latest/env_logger/)) The root issue is that on some machines, `BCryptGenRandom()` will fail with an `Access is denied. (os error 5)` error message. ([Bugzilla issue 1754490](https://bugzilla.mozilla.org/show_bug.cgi?id=1754490)) (Discussion in issue rust-lang#94098)

Note that this is happening upon startup of Firefox's unsandboxed Main Process, so this behavior is different and separate from previous issues ([like this](https://bugzilla.mozilla.org/show_bug.cgi?id=1746254)) where BCrypt DLLs were blocked by process sandboxing. In the case of sandboxing, we knew we were doing something abnormal and expected that we'd have to resort to abnormal measures to make it work.

However, in this case we are in a regular unsandboxed process just trying to initialize `env_logger` and getting a panic. We suspect that this may be caused by a virus scanner or some other security software blocking the loading of the BCrypt DLLs, but we're not completely sure as we haven't been able to replicate locally.

It is also possible that Firefox is not the only software affected by this; we just may be one of the pieces of Rust software that has the telemetry and crash reporting necessary to catch it.

I have read some of the historical discussion around using `BCryptGenRandom()` in Rust code, and I respect the decision that was made and agree that it was a good course of action, so I'm not trying to open a discussion about a return to `RtlGenRandom()`. Instead, I'd like to suggest that perhaps we use `RtlGenRandom()` as a "fallback RNG" in the case that BCrypt doesn't work.

This pull request implements this fallback behavior. I believe this would improve the robustness of this essential data structure within the standard library, and I see only 2 potential drawbacks:

1. Slight added overhead: It should be quite minimal though. The first call to `sys::rand::hashmap_random_keys()` will incur a bit of initialization overhead, and every call after will incur roughly 2 non-atomic global reads and 2 easily predictable branches. Both should be negligible compared to the actual cost of generating secure random numbers
2. `RtlGenRandom()` is deprecated by Microsoft: Technically true, but as mentioned in [this comment on GoLang](golang/go#33542 (comment)), this API is ubiquitous in Windows software and actually removing it would break lots of things. Also, Firefox uses it already in [our C++ code](https://searchfox.org/mozilla-central/rev/5f88c1d6977e03e22d3420d0cdf8ad0113c2eb31/mfbt/RandomNum.cpp#25), and [Chromium uses it in their code as well](https://source.chromium.org/chromium/chromium/src/+/main:base/rand_util_win.cc) (which transitively means that Microsoft uses it in their own web browser, Edge). If there did come a time when Microsoft truly removes this API, it should be easy enough for Rust to simply remove the fallback in the code I've added here
@bors
Copy link
Contributor

bors commented May 17, 2022

⌛ Testing commit aba3454 with merge 72599de5af3d1060893084b6433650cde782a9fb...

@bors
Copy link
Contributor

bors commented May 17, 2022

💔 Test failed - checks-actions

@bors bors removed the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label May 17, 2022
@bors bors added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 17, 2022
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
.......... (60/64)
...       (64/64)


/checkout/src/test/rustdoc-gui/search-tab-change-title-fn-sig.goml search-tab-change-title-fn-sig... FAILED
[ERROR] (line 6) TimeoutError: waiting for selector "#titles" failed: timeout 30000ms exceeded: for command `wait-for: "#titles"`
Build completed unsuccessfully in 0:00:45

@ChrisDenton
Copy link
Member

That failure looks spurious to me. At least I don't think it could be caused by this PR. Let's try again.

@bors retry spurious src/test/rustdoc-gui/search-tab-change-title-fn-sig.goml

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 17, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request May 18, 2022
Rollup of 6 pull requests

Successful merges:

 - rust-lang#94639 (Suggest dereferencing non-lval mutable reference on assignment)
 - rust-lang#95979 (update coherence docs, fix generator + opaque type ICE)
 - rust-lang#96378 (Mention traits and types involved in unstable trait upcasting)
 - rust-lang#96917 (Make HashMap fall back to RtlGenRandom if BCryptGenRandom fails)
 - rust-lang#97101 (Add tracking issue for ExitCode::exit_process)
 - rust-lang#97123 (Clean fix for rust-lang#96223)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 927a40b into rust-lang:master May 18, 2022
@rustbot rustbot added this to the 1.63.0 milestone May 18, 2022
badboy added a commit to badboy/getrandom that referenced this pull request Sep 1, 2022
badboy added a commit to badboy/getrandom that referenced this pull request Sep 2, 2022
badboy added a commit to badboy/getrandom that referenced this pull request Sep 2, 2022
badboy added a commit to badboy/getrandom that referenced this pull request Jan 11, 2023
badboy added a commit to badboy/getrandom that referenced this pull request Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants