Skip to content

Commit

Permalink
Auto merge of #11917 - ehuss:rustup-circumvent, r=weihanglo
Browse files Browse the repository at this point in the history
Optimize usage under rustup.

Closes #10986

This optimizes cargo when running under rustup to circumvent the rustup proxies. The rustup proxies introduce overhead that can make a noticeable difference.

The solution here is to identify if cargo would normally run `rustc` from PATH, and the current `rustc` in PATH points to something that looks like a rustup proxy (by comparing it to the `rustup` binary which is a hard-link to the proxy). If it detects this situation, then it looks for a binary in `$RUSTUP_HOME/toolchains/$TOOLCHAIN/bin/$TOOL`. If it finds the direct toolchain executable, then it uses that instead.

## Considerations

There have been some past attempts in the past to address this, but it has been a tricky problem to solve. This change has some risk because cargo is attempting to guess what the user and rustup wants, and it may guess wrong. Here are some considerations and risks for this:

* Setting `RUSTC` (as in rust-lang/rustup#2958) isn't an option. This makes the `RUSTC` setting "sticky" through invocations of different toolchains, such as a cargo subcommand or build script which does something like `cargo +nightly build`.

* Changing `PATH` isn't an option, due to issues like rust-lang/rustup#3036 where cargo subcommands would be unable to execute proxies (so things like `+toolchain` shorthands don't work).

* Setting other environment variables in rustup (as in rust-lang/rustup#3207 which adds `RUSTUP_TOOLCHAIN_DIR` the path to the toolchain dir) comes with various complications, as there is risk that the environment variables could get out of sync with one another (like with `RUSTUP_TOOLCHAIN`), causing tools to break or become confused.

  There was some consideration in that PR for adding protections by using an encoded environment variable that could be cross-checked, but I have concerns about the complexity of the solution.

  We may want to go with this solution in the long run, but I would like to try a short term solution in this PR first to see how it turns out.

* This won't work for a `rustup-toolchain.toml` override with a [`path`](https://rust-lang.github.io/rustup/overrides.html#path) setting. Cargo will use the slow path in that case. In theory it could try to detect this situation, which may be an exercise for the future.

* Some build-scripts, proc-macros, or custom cargo subcommands may be doing unusual things that interfere with the assumptions made in this PR. For example, a custom subcommand could call a `cargo` executable that is not managed by rustup. Proc-macros may be executing cargo or rustc, assuming it will reach some particular toolchain. It can be difficult to predict what unusual ways cargo and rustc are being used. This PR (and its tests) tries to make extra sure that it is resilient even in unusual circumstances.

* The "dev" fallback in rustup can introduce some complications for some solutions to this problem. If a rustup toolchain does not have cargo, such as with a developer "toolchain link", then rustup will automatically call either the nightly, beta, or stable cargo if they are available. This PR should work correctly, since rustup sets the correct `RUSTUP_TOOLCHAIN` environment variable for the *actual* toolchain, not the one where cargo was executed from.

* Special care should be considered for dynamic linking. `LD_LIBRARY_PATH` (linux), `DYLD_LIBRARY_PATH` (macos), and `PATH` (windows) need to be carefully set so that `rustc` can find its shared libraries. Directly executing `rustc` has some risk that it will load the wrong shared libraries. There are some mitigations for this. macOS and Linux use rpath, and Windows looks in the same directory as `rustc.exe`. Also, rustup configures the dyld environment variables from the outer cargo. Finally, cargo also configures these (particularly for the deprecated compiler plugins).

* This shouldn't impact installations that don't use rustup.

* I've done a variety of testing on the big three platforms, but certainly nowhere exhaustive.
    * One of many examples is making sure Clippy's development environment works correctly, which has special requirements for dynamic linking.

* There is risk about future rustup versions changing some assumptions made here. Some assumptions:
    * It assumes that if `RUSTUP_TOOLCHAIN` is set, then the proxy *always* runs exactly that toolchain and no other. If this changes, cargo could execute the wrong version. Currently `RUSTUP_TOOLCHAIN` is the highest priority [toolchain override](https://rust-lang.github.io/rustup/overrides.html) and is fundamental to how toolchain selection becomes "sticky", so I think it is unlikely to change.
    * It assumes rustup sets `RUSTUP_TOOLCHAIN` to a value that is exactly equal to the name of the toolchain in the `toolchains` directory. This works for user shorthands like `RUSTUP_TOOLCHAIN=nightly`, which gets converted to the full toolchain name. However, it does not work for `path` overrides (see above).
    * It assumes the `toolchains` directory layout is always `$RUSTUP_HOME/toolchains/$TOOLCHAIN`. If this changes, then I think the only consequence is that cargo will go back to the slow path.
    * It assumes downloading toolchains is not needed (since cargo running from the toolchain means it should already be downloaded).
    * It assumes there is no other environment setup needed (such as the dyld paths mentioned above).

  My hope is that if assumptions are no longer valid that the worst case is that cargo falls back to the slow path of running the proxy from PATH.

## Performance

This change won't affect the performance on Windows because rustup currently alters PATH to point to the toolchain directory. However, rust-lang/rustup#3178 is attempting to remove that, so this PR will be required to avoid a performance penalty on Windows. That change is currently opt-in, and will likely take a long while to roll out since it won't be released until after the next release, and may be difficult to get sufficient testing.

I have done some rough performance testing on macOS, Windows, and Linux on a variety of different kinds of projects with different commands. The following attempts to summarize what I saw.

The timings are going to be heavily dependent on the system and the project. These are the values I get on my systems, but will likely be very different for everyone else.

The Windows tests were performed with a custom build of rustup with rust-lang/rustup#3178 applied and enabled (stock rustup shows no change in performance as explained above).

The data is summarized in this spreadsheet: https://docs.google.com/spreadsheets/d/1zSvU1fQ0uSELxv3VqWmegGBhbLR-8_KUkyIzCIk21X0/edit?usp=sharing

`hello-world` has a particularly large impact of about 1.68 to 2.7x faster. However, a large portion of this overhead is related to running `rustc` at the start to discover its version and querying it for information. This is cached after the first run, so except for first-time builds, the effect isn't as noticeable. The "check with info" row is an benchmark that removes `target/debug/deps` but keeps the `.rustc_info.json` file.

Incremental builds are a bit more difficult to construct since it requires customizing the commands for each project. I only did an incremental test for cargo itself, running `touch src/cargo/lib.rs` and then `cargo check --lib`.

These measurements excluded the initial overhead of launching the rustup proxy to launch the initial cargo process. This was done just for simplicity, but it makes the test a little less characteristic of a typical usage, which will have some constant overhead for running the proxy.

These tests were done using [`hyperfine`](https://crates.io/crates/hyperfine) version 1.16.1. The macOS system was an M2 Max (12-thread). The Windows and Linux experiments were run on a AMD Ryzen Threadripper 2950X (32-thread). Rust 1.68.2 was used for testing. I can share the commands if people want to see them.
  • Loading branch information
bors committed May 4, 2023
2 parents 7d11844 + b9993bd commit 2d693e2
Show file tree
Hide file tree
Showing 3 changed files with 336 additions and 7 deletions.
80 changes: 73 additions & 7 deletions src/cargo/util/config/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -388,7 +388,7 @@ impl Config {
/// Gets the path to the `rustdoc` executable.
pub fn rustdoc(&self) -> CargoResult<&Path> {
self.rustdoc
.try_borrow_with(|| Ok(self.get_tool("rustdoc", &self.build_config()?.rustdoc)))
.try_borrow_with(|| Ok(self.get_tool(Tool::Rustdoc, &self.build_config()?.rustdoc)))
.map(AsRef::as_ref)
}

Expand All @@ -406,7 +406,7 @@ impl Config {
);

Rustc::new(
self.get_tool("rustc", &self.build_config()?.rustc),
self.get_tool(Tool::Rustc, &self.build_config()?.rustc),
wrapper,
rustc_workspace_wrapper,
&self
Expand Down Expand Up @@ -1640,11 +1640,63 @@ impl Config {
}
}

/// Looks for a path for `tool` in an environment variable or config path, defaulting to `tool`
/// as a path.
fn get_tool(&self, tool: &str, from_config: &Option<ConfigRelativePath>) -> PathBuf {
self.maybe_get_tool(tool, from_config)
.unwrap_or_else(|| PathBuf::from(tool))
/// Returns the path for the given tool.
///
/// This will look for the tool in the following order:
///
/// 1. From an environment variable matching the tool name (such as `RUSTC`).
/// 2. From the given config value (which is usually something like `build.rustc`).
/// 3. Finds the tool in the PATH environment variable.
///
/// This is intended for tools that are rustup proxies. If you need to get
/// a tool that is not a rustup proxy, use `maybe_get_tool` instead.
fn get_tool(&self, tool: Tool, from_config: &Option<ConfigRelativePath>) -> PathBuf {
let tool_str = tool.as_str();
self.maybe_get_tool(tool_str, from_config)
.or_else(|| {
// This is an optimization to circumvent the rustup proxies
// which can have a significant performance hit. The goal here
// is to determine if calling `rustc` from PATH would end up
// calling the proxies.
//
// This is somewhat cautious trying to determine if it is safe
// to circumvent rustup, because there are some situations
// where users may do things like modify PATH, call cargo
// directly, use a custom rustup toolchain link without a
// cargo executable, etc. However, there is still some risk
// this may make the wrong decision in unusual circumstances.
//
// First, we must be running under rustup in the first place.
let toolchain = self.get_env_os("RUSTUP_TOOLCHAIN")?;
// This currently does not support toolchain paths.
// This also enforces UTF-8.
if toolchain.to_str()?.contains(&['/', '\\']) {
return None;
}
// If the tool on PATH is the same as `rustup` on path, then
// there is pretty good evidence that it will be a proxy.
let tool_resolved = paths::resolve_executable(Path::new(tool_str)).ok()?;
let rustup_resolved = paths::resolve_executable(Path::new("rustup")).ok()?;
let tool_meta = tool_resolved.metadata().ok()?;
let rustup_meta = rustup_resolved.metadata().ok()?;
// This works on the assumption that rustup and its proxies
// use hard links to a single binary. If rustup ever changes
// that setup, then I think the worst consequence is that this
// optimization will not work, and it will take the slow path.
if tool_meta.len() != rustup_meta.len() {
return None;
}
// Try to find the tool in rustup's toolchain directory.
let tool_exe = Path::new(tool_str).with_extension(env::consts::EXE_EXTENSION);
let toolchain_exe = home::rustup_home()
.ok()?
.join("toolchains")
.join(&toolchain)
.join("bin")
.join(&tool_exe);
toolchain_exe.exists().then_some(toolchain_exe)
})
.unwrap_or_else(|| PathBuf::from(tool_str))
}

pub fn jobserver_from_env(&self) -> Option<&jobserver::Client> {
Expand Down Expand Up @@ -2645,3 +2697,17 @@ macro_rules! drop_eprint {
$crate::__shell_print!($config, err, false, $($arg)*)
);
}

enum Tool {
Rustc,
Rustdoc,
}

impl Tool {
fn as_str(&self) -> &str {
match self {
Tool::Rustc => "rustc",
Tool::Rustdoc => "rustdoc",
}
}
}
1 change: 1 addition & 0 deletions tests/testsuite/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,7 @@ mod rustdoc;
mod rustdoc_extern_html;
mod rustdocflags;
mod rustflags;
mod rustup;
mod search;
mod shell_quoting;
mod source_replacement;
Expand Down
262 changes: 262 additions & 0 deletions tests/testsuite/rustup.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
//! Tests for Cargo's behavior under Rustup.
use cargo_test_support::paths::{home, root, CargoPathExt};
use cargo_test_support::{cargo_process, process, project};
use std::env;
use std::env::consts::EXE_EXTENSION;
use std::ffi::OsString;
use std::fs;
use std::path::{Path, PathBuf};

/// Helper to generate an executable.
fn make_exe(dest: &Path, name: &str, contents: &str, env: &[(&str, PathBuf)]) -> PathBuf {
let rs_name = format!("{name}.rs");
fs::write(
root().join(&rs_name),
&format!("fn main() {{ {contents} }}"),
)
.unwrap();
let mut pb = process("rustc");
env.iter().for_each(|(key, value)| {
pb.env(key, value);
});
pb.arg("--edition=2021")
.arg(root().join(&rs_name))
.exec()
.unwrap();
let exe = Path::new(name).with_extension(EXE_EXTENSION);
let output = dest.join(&exe);
fs::rename(root().join(&exe), &output).unwrap();
output
}

fn prepend_path(path: &Path) -> OsString {
let mut paths = vec![path.to_path_buf()];
paths.extend(env::split_paths(&env::var_os("PATH").unwrap_or_default()));
env::join_paths(paths).unwrap()
}

struct RustupEnvironment {
/// Path for ~/.cargo/bin
cargo_bin: PathBuf,
/// Path for ~/.rustup
rustup_home: PathBuf,
/// Path to the cargo executable in the toolchain directory
/// (~/.rustup/toolchain/test-toolchain/bin/cargo.exe).
cargo_toolchain_exe: PathBuf,
}

/// Creates an executable which prints a message and then runs the *real* rustc.
fn real_rustc_wrapper(bin_dir: &Path, message: &str) -> PathBuf {
let real_rustc = cargo_util::paths::resolve_executable("rustc".as_ref()).unwrap();
// The toolchain rustc needs to call the real rustc. In order to do that,
// it needs to restore or clear the RUSTUP environment variables so that
// if rustup is installed, it will call the correct rustc.
let rustup_toolchain_setup = match std::env::var_os("RUSTUP_TOOLCHAIN") {
Some(t) => format!(
".env(\"RUSTUP_TOOLCHAIN\", \"{}\")",
t.into_string().unwrap()
),
None => format!(".env_remove(\"RUSTUP_TOOLCHAIN\")"),
};
let mut env = vec![("CARGO_RUSTUP_TEST_real_rustc", real_rustc)];
let rustup_home_setup = match std::env::var_os("RUSTUP_HOME") {
Some(h) => {
env.push(("CARGO_RUSTUP_TEST_RUSTUP_HOME", h.into()));
format!(".env(\"RUSTUP_HOME\", env!(\"CARGO_RUSTUP_TEST_RUSTUP_HOME\"))")
}
None => format!(".env_remove(\"RUSTUP_HOME\")"),
};
make_exe(
bin_dir,
"rustc",
&format!(
r#"
eprintln!("{message}");
let r = std::process::Command::new(env!("CARGO_RUSTUP_TEST_real_rustc"))
.args(std::env::args_os().skip(1))
{rustup_toolchain_setup}
{rustup_home_setup}
.status();
std::process::exit(r.unwrap().code().unwrap_or(2));
"#
),
&env,
)
}

/// Creates a simulation of a rustup environment with `~/.cargo/bin` and
/// `~/.rustup` directories populated with some executables that simulate
/// rustup.
fn simulated_rustup_environment() -> RustupEnvironment {
// Set up ~/.rustup/toolchains/test-toolchain/bin with a custom rustc and cargo.
let rustup_home = home().join(".rustup");
let toolchain_bin = rustup_home
.join("toolchains")
.join("test-toolchain")
.join("bin");
toolchain_bin.mkdir_p();
let rustc_toolchain_exe = real_rustc_wrapper(&toolchain_bin, "real rustc running");
let cargo_toolchain_exe = make_exe(
&toolchain_bin,
"cargo",
r#"panic!("cargo toolchain should not be called");"#,
&[],
);

// Set up ~/.cargo/bin with a typical set of rustup proxies.
let cargo_bin = home().join(".cargo").join("bin");
cargo_bin.mkdir_p();

let rustc_proxy = make_exe(
&cargo_bin,
"rustc",
&format!(
r#"
match std::env::args().next().unwrap().as_ref() {{
"rustc" => {{}}
arg => panic!("proxy only supports rustc, got {{arg:?}}"),
}}
eprintln!("rustc proxy running");
let r = std::process::Command::new(env!("CARGO_RUSTUP_TEST_rustc_toolchain_exe"))
.args(std::env::args_os().skip(1))
.status();
std::process::exit(r.unwrap().code().unwrap_or(2));
"#
),
&[("CARGO_RUSTUP_TEST_rustc_toolchain_exe", rustc_toolchain_exe)],
);
fs::hard_link(
&rustc_proxy,
cargo_bin.join("cargo").with_extension(EXE_EXTENSION),
)
.unwrap();
fs::hard_link(
&rustc_proxy,
cargo_bin.join("rustup").with_extension(EXE_EXTENSION),
)
.unwrap();

RustupEnvironment {
cargo_bin,
rustup_home,
cargo_toolchain_exe,
}
}

#[cargo_test]
fn typical_rustup() {
// Test behavior under a typical rustup setup with a normal toolchain.
let RustupEnvironment {
cargo_bin,
rustup_home,
cargo_toolchain_exe,
} = simulated_rustup_environment();

// Set up a project and run a normal cargo build.
let p = project().file("src/lib.rs", "").build();
// The path is modified so that cargo will call `rustc` from
// `~/.cargo/bin/rustc to use our custom rustup proxies.
let path = prepend_path(&cargo_bin);
p.cargo("check")
.env("RUSTUP_TOOLCHAIN", "test-toolchain")
.env("RUSTUP_HOME", &rustup_home)
.env("PATH", &path)
.with_stderr(
"\
[CHECKING] foo v0.0.1 [..]
real rustc running
[FINISHED] [..]
",
)
.run();

// Do a similar test, but with a toolchain link that does not have cargo
// (which normally would do a fallback to nightly/beta/stable).
cargo_toolchain_exe.rm_rf();
p.build_dir().rm_rf();

p.cargo("check")
.env("RUSTUP_TOOLCHAIN", "test-toolchain")
.env("RUSTUP_HOME", &rustup_home)
.env("PATH", &path)
.with_stderr(
"\
[CHECKING] foo v0.0.1 [..]
real rustc running
[FINISHED] [..]
",
)
.run();
}

// This doesn't work on Windows because Cargo forces the PATH to contain the
// sysroot_libdir, which is actually `bin`, preventing the test from
// overriding the bin directory.
#[cargo_test(ignore_windows = "PATH can't be overridden on Windows")]
fn custom_calls_other_cargo() {
// Test behavior when a custom subcommand tries to manipulate PATH to use
// a different toolchain.
let RustupEnvironment {
cargo_bin,
rustup_home,
cargo_toolchain_exe: _,
} = simulated_rustup_environment();

// Create a directory with a custom toolchain (outside of the rustup universe).
let custom_bin = root().join("custom-bin");
custom_bin.mkdir_p();
// `cargo` points to the real cargo.
let cargo_exe = cargo_test_support::cargo_exe();
fs::hard_link(&cargo_exe, custom_bin.join(cargo_exe.file_name().unwrap())).unwrap();
// `rustc` executes the real rustc.
real_rustc_wrapper(&custom_bin, "custom toolchain rustc running");

// A project that cargo-custom will try to build.
let p = project().file("src/lib.rs", "").build();

// Create a custom cargo subcommand.
// This will modify PATH to a custom toolchain and call cargo from that.
make_exe(
&cargo_bin,
"cargo-custom",
r#"
use std::env;
use std::process::Command;
eprintln!("custom command running");
let mut paths = vec![std::path::PathBuf::from(env!("CARGO_RUSTUP_TEST_custom_bin"))];
paths.extend(env::split_paths(&env::var_os("PATH").unwrap_or_default()));
let path = env::join_paths(paths).unwrap();
let status = Command::new("cargo")
.arg("check")
.current_dir(env!("CARGO_RUSTUP_TEST_project_dir"))
.env("PATH", path)
.status()
.unwrap();
assert!(status.success());
"#,
&[
("CARGO_RUSTUP_TEST_custom_bin", custom_bin),
("CARGO_RUSTUP_TEST_project_dir", p.root()),
],
);

cargo_process("custom")
// Set these to simulate what would happen when running under rustup.
// We want to make sure that cargo-custom does not try to use the
// rustup proxies.
.env("RUSTUP_TOOLCHAIN", "test-toolchain")
.env("RUSTUP_HOME", &rustup_home)
.with_stderr(
"\
custom command running
[CHECKING] foo [..]
custom toolchain rustc running
[FINISHED] [..]
",
)
.run();
}

0 comments on commit 2d693e2

Please sign in to comment.