Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rust.lto=off to bootstrap and set as compiler/library default #107241

Merged
merged 1 commit into from
Feb 3, 2023

Conversation

clubby789
Copy link
Contributor

Closes #107202

The issue mentions embed-bitcode=on, but here

if stage >= 1 {
cargo.rustflag("-Cembed-bitcode=yes");
}

it appears that this is always set for std stage 1+, so I'm unsure if changes are needed here.

@rustbot label +A-bootstrap

@rustbot
Copy link
Collaborator

rustbot commented Jan 23, 2023

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Jan 23, 2023
@@ -722,6 +722,9 @@ impl Step for Rustc {
cargo.rustflag("-Cembed-bitcode=yes");
}
RustcLto::ThinLocal => { /* Do nothing, this is the default */ }
RustcLto::Off => {
Copy link
Contributor

@ehuss ehuss Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, this doesn't disable LTO for the compiler built in stage0. I think this would need to go outside the check for the compiler stage.

To elaborate: The reason -Clto doesn't run in stage 0 is per the comment above about LTO not working in stage 0, but the "off" setting is specifically turning it off, so it doesn't have the same restriction.

@Kobzol
Copy link
Contributor

Kobzol commented Jan 25, 2023

Can you confirm by local benchmarks that rebuilds are faster with this option set to off?

@clubby789
Copy link
Contributor Author

clubby789 commented Jan 25, 2023

x b library with lto=off -

x b library  1260.84s user 31.05s system 527% cpu 4:05.01 total
x b library  1251.37s user 30.98s system 526% cpu 4:03.67 total
x b library  1248.36s user 30.92s system 510% cpu 4:10.61 total

x b library with lto=thin-local -

x b library  1462.13s user 38.48s system 567% cpu 4:24.46 total
x b library  1471.53s user 39.20s system 551% cpu 4:33.82 total
x b library  1474.35s user 39.11s system 532% cpu 4:44.30 total

For x b library --keep-stage 0 (after touch library/core/src/lib.rs), the average times were around 35s to build stage1 std with thin-local, and 15-20s for off.

@Kobzol
Copy link
Contributor

Kobzol commented Jan 26, 2023

Looks great! Hopefully the compiler won't be unusably slow without any LTO.

@albertlarsan68
Copy link
Member

albertlarsan68 commented Jan 26, 2023

$ hyperfine -L mode off,thinlocal -p "x clean; x build library --dry-run" "x build library --config config.{mode}.toml";hyperfine -L mode off,thinlocal -p "x clean; x build library; touch library/core/src/lib.rs" "x build library --config config.{mode}.toml"

# Clean build
Benchmark 1: x build library --config config.off.toml
  Time (mean ± σ):     25.388 s ±  0.361 s    [User: 59.866 s, System: 5.066 s]
  Range (min … max):   25.094 s … 26.181 s    10 runs

Benchmark 2: x build library --config config.thinlocal.toml
  Time (mean ± σ):     26.330 s ±  0.308 s    [User: 68.647 s, System: 6.686 s]
  Range (min … max):   26.096 s … 27.065 s    10 runs

Summary
  'x build library --config config.off.toml' ran
    1.04 ± 0.02 times faster than 'x build library --config config.thinlocal.toml'

# Frech incr comp (touch library/core/src/lib.rs)
Benchmark 1: x build library --config config.off.toml
  Time (mean ± σ):     11.516 s ±  0.159 s    [User: 20.368 s, System: 3.478 s]
  Range (min … max):   11.392 s … 11.936 s    10 runs

Benchmark 2: x build library --config config.thinlocal.toml
  Time (mean ± σ):     25.453 s ±  0.412 s    [User: 65.687 s, System: 5.908 s]
  Range (min … max):   25.105 s … 26.519 s    10 runs

Summary
  'x build library --config config.off.toml' ran
    2.21 ± 0.05 times faster than 'x build library --config config.thinlocal.toml'

@lqd
Copy link
Member

lqd commented Jan 26, 2023

How much slower is the resulting compiler built without LTO, e.g. on one of the longer check build benchmarks from rustc-perf (like cargo, diesel, keccak) ?

@Kobzol

This comment was marked as outdated.

@Kobzol
Copy link
Contributor

Kobzol commented Jan 26, 2023

Oh, I'm stupid, I didn't realize that this PR also changed the default to be lto=off. Therefore I was just benchmarking the same code against itself before...

Anyway, without any LTO, it's (expectedly) quite slower:
image

But I would say that it's not catastrophically slower (100+%), and the rustc incremental compilation wins are really nice. This should definitely be propagated as a way of speeding up (incremental) compiler builds, and I think that it would also make sense to make it be the default.

Although maybe implementing it and switching the default should be split into 2 PRs.

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than Eric's comment, this looks great!

src/bootstrap/defaults/config.compiler.toml Show resolved Hide resolved
@clubby789 clubby789 changed the title Add rust.lto=off to bootstrap Add rust.lto=off to bootstrap and set as compiler/library default Jan 26, 2023
@lqd
Copy link
Member

lqd commented Jan 26, 2023

Let's also notify people if the default for these profiles is changed: a 30% difference is significant enough to be surprising/confusing, especially when a PR will need interacting with the perfbot, its results are likely to be quite different from local runs.

@Kobzol
Copy link
Contributor

Kobzol commented Jan 26, 2023

Indeed, and probably we should also add it to the rustc-dev-guide, e.g. here: https://rustc-dev-guide.rust-lang.org/profiling.html.

@jyn514
Copy link
Member

jyn514 commented Jan 26, 2023

Let's also notify people if the default for these profiles is changed: a 30% difference is significant enough to be surprising/confusing, especially when a PR will need interacting with the perfbot, its results are likely to be quite different from local runs.

What do you mean by "notify people"? In the past, we've used blog posts and bumping MAJOR_VERSION in src/bootstrap/main.rs - I would be fine with a blog post, but I don't think changing the version number is warranted since it only affects some profiles.

@jyn514
Copy link
Member

jyn514 commented Jan 26, 2023

Anyway, without any LTO, it's (expectedly) quite slower:

Do you have a link to the perf run for that screenshot? I'm curious to know how much it improves bootstrap times for the compiler itself.

@Kobzol
Copy link
Contributor

Kobzol commented Jan 26, 2023

I ran it locally, just a few crates, I didn't test bootstrap. We could run the whole perf. suite in this PR by changing this line to --set rust.lto=off.

Actually, now that I think about it, we should be very careful with the off default. There are a lot of dist builders out there
and currently only dist-x86_64-msvc and dist-x86_64-linux (maybe also OS X? not sure) overwrite the default to thin. So if the default changes to off, we could start building much slower rustc artifacts for all targets except x64 Linux and x64 Windows.

Edit: I'm not sure if the defaults in the modified .toml example files are used in CI though. If not, then it's fine.

@lqd
Copy link
Member

lqd commented Jan 26, 2023

What do you mean by "notify people"?

Nothing necessarily major, but like the announcements at the beginning of t-compiler meetings, a zulip message there, and/or one to the wg-compiler-performance zulip stream. Not a full MCP or Inside Rust blog post I don't think, just a heads up that this is coming. Using these profiles is the preferred way described in the dev guide IIRC, and the perf difference could therefore surprise many contributors ?

@jyn514
Copy link
Member

jyn514 commented Jan 26, 2023

@Kobzol no, CI intentionally doesn't use any of the customized defaults (I want to change it to use profile = "user" at some point, but currently it doesn't even do that).

@Kobzol
Copy link
Contributor

Kobzol commented Jan 26, 2023

Ah, good to know :) Then it should be fine.

@Mark-Simulacrum
Copy link
Member

r=me after a notification at next meeting, I'll nominate for putting it on the agenda.

@Mark-Simulacrum Mark-Simulacrum added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jan 29, 2023
@michaelwoerister
Copy link
Member

michaelwoerister commented Feb 2, 2023

Did anyone look into how this affects ./x test? In such a case we are invoking the (now slower) compiler thousands of times.

EDIT: I don't mean to block this PR -- it's just something we might want to look out for after it has landed.

@clubby789
Copy link
Contributor Author

$ hyperfine -L mode off,thin-local -s "x clean; x build library --config config.{mode}.toml" "x t tests/ui --force-rerun --config config.{mode}.toml"
Benchmark 1: x t tests/ui --force-rerun --config config.off.toml
  Time (mean ± σ):     192.653 s ±  7.313 s    [User: 926.090 s, System: 301.098 s]
  Range (min … max):   186.157 s … 200.573 s    3 runs
 
Benchmark 2: x t tests/ui --force-rerun --config config.thin-local.toml
  Time (mean ± σ):     190.043 s ±  4.840 s    [User: 852.896 s, System: 298.910 s]
  Range (min … max):   185.453 s … 195.100 s    3 runs
 
Summary
  'x t tests/ui --force-rerun --config config.thin-local.toml' ran
    1.01 ± 0.05 times faster than 'x t tests/ui --force-rerun --config config.off.toml'

@pnkfelix
Copy link
Member

pnkfelix commented Feb 2, 2023

We discussed this at today's T-compiler triage meeting.

We concluded by saying we'd like to see this PR move forward, and we'll just need to keep an eye on whether there's negative fallout from the change in defaults. But overall, we expect that a decent proportion of the developers will benefit from this change.

@rustbot label: -I-compiler-nominated

Nothing necessarily major, but like the announcements at the beginning of t-compiler meetings, a zulip message there, and/or one to the wg-compiler-performance zulip stream.

oh, and after re-reading @lqd 's message above, I opted to also post a topic about this PR moving forward in the t-compiler/performance zulip stream.

@rustbot rustbot removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Feb 2, 2023
@pnkfelix
Copy link
Member

pnkfelix commented Feb 2, 2023

@bors r=simulacrum rollup=never

(I opted not to roll this up. I know its not supposed to affect perf.rlo, but if anything that's all the more reason to ensure that its very easy to bisect, if someone is locally observing a regression and wants to track it down to this.)

@bors
Copy link
Contributor

bors commented Feb 2, 2023

📌 Commit 2adf26f has been approved by simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 2, 2023
@bors
Copy link
Contributor

bors commented Feb 3, 2023

⌛ Testing commit 2adf26f with merge f02439d...

@bors
Copy link
Contributor

bors commented Feb 3, 2023

☀️ Test successful - checks-actions
Approved by: simulacrum
Pushing f02439d to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 3, 2023
@bors bors merged commit f02439d into rust-lang:master Feb 3, 2023
@rustbot rustbot added this to the 1.69.0 milestone Feb 3, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f02439d): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.9% [1.3%, 3.0%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 5, 2023
Sync codegen defaults with compiler defaults and add a ping message so they stay in sync

Looks like this got missed in rust-lang#107241.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for rust.lto=off to bootstrap for faster dev builds