Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Enable opt-dist for dist-aarch64-linux builds #133807

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mrkajetanp
Copy link
Contributor

@mrkajetanp mrkajetanp commented Dec 3, 2024

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol
cc @lqd

try-job: dist-aarch64-linux

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Dec 3, 2024
@rustbot
Copy link
Collaborator

rustbot commented Dec 3, 2024

Some changes occurred in src/tools/opt-dist

cc @Kobzol

@Kobzol
Copy link
Contributor

Kobzol commented Dec 3, 2024

Hi! Could you please split the part that moves the job to the aarch64 runner and the PGO/LTO part? So that we can evaluate the CI cost of these two actions separately. Thanks!

Comment on lines +48 to +96
ENV SCRIPT python3 ../x.py build --set rust.debug=true opt-dist && \
./build/$HOSTS/stage0-tools-bin/opt-dist linux-ci -- python3 ../x.py dist \
--host $HOSTS --target $HOSTS --include-default-paths build-manifest bootstrap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way this is a completely new dockerfile, so do you mean just replace this with a simple ./x dist call and then wrap it with opt-dist separately? Just in separate commits or separate PRs altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a separate PR, so that we can land these two changes (move to aarch64 host first, and then enable optimizations) separately :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, just wanted to make sure - no problem :)

@lqd
Copy link
Member

lqd commented Dec 3, 2024

What improvements are you seeing with this PR, over the current artifacts?

@mrkajetanp
Copy link
Contributor Author

I've not yet benchmarked the changes, and I'm not sure how they compare to the artifacts from cross-compilation because I was only doing aarch64 runs but specifically adding opt-dist with LTO and PGO seems to increase the binary sizes of the main artifacts as follows:

librustc_driver-8c547d00f2ea16d5.so - 84M -> 94M
libLLVM.so.19.1-rust-1.85.0-nightly - 106M -> 108M
rustc -> 2.7M for both, +100 bytes

@Kobzol
Copy link
Contributor

Kobzol commented Dec 3, 2024

@bors try

Let's also see how long it takes with the optimizations.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 3, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`
@bors
Copy link
Contributor

bors commented Dec 3, 2024

⌛ Trying commit 02b958d with merge b828608...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 3, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`
@bors
Copy link
Contributor

bors commented Dec 3, 2024

⌛ Trying commit 02b958d with merge 945b9be...

@lqd
Copy link
Member

lqd commented Dec 3, 2024

That’s not going to be the good try job Jakub :3

@bors
Copy link
Contributor

bors commented Dec 3, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 3, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Dec 3, 2024

Ah, crap. Thanks!

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 3, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`

try-job: dist-aarch64-linux
@bors
Copy link
Contributor

bors commented Dec 3, 2024

⌛ Trying commit 02b958d with merge d156b32...

@bors
Copy link
Contributor

bors commented Dec 4, 2024

☀️ Try build successful - checks-actions
Build commit: d156b32 (d156b32c73dd32916f0ca83fb0104e600fbad49d)

@mrkajetanp
Copy link
Contributor Author

So that's an extra hour for LTO+PGO without the cache. 2h22 vs 3h22.

@lqd
Copy link
Member

lqd commented Dec 4, 2024

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 4, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`

try-job: dist-aarch64-linux
@bors
Copy link
Contributor

bors commented Dec 4, 2024

⌛ Trying commit 02b958d with merge ef33f24...

@bors
Copy link
Contributor

bors commented Dec 4, 2024

☀️ Try build successful - checks-actions
Build commit: ef33f24 (ef33f249830d94b7afdb529458aae4052f14ca98)

@mrkajetanp
Copy link
Contributor Author

1h54 cached, not so bad. Back to roughly the same time as the x86 cross build then.
Completely outside of the CI costs conversations, these changes not making the overall turnaround time for bors longer than it is already is probably a nice bonus.

@lqd
Copy link
Member

lqd commented Dec 4, 2024

I assume good benchmark results can also help with the cost discussion.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 4, 2024

Indeed! You can download the CI artifacts e.g. using rustup-toolchain-install-master and benchmark it locally using rustc-perf. It would be nice to see the perf. diff. Let me know on Zulip if you want help with that.

@mrkajetanp
Copy link
Contributor Author

rustup-toolchain-install-master

Huh TIL that this is a thing - very neat, thanks for the suggestion!

@mrkajetanp
Copy link
Contributor Author

mrkajetanp commented Dec 4, 2024

Between these two builds, downloaded from the CI with rustup-toolchain-install-master, rustc-perf gives me:
Screenshot 2024-12-04 at 12 58 33 pm
Screenshot 2024-12-04 at 1 00 06 pm
All metrics with a mean improvement of 10-20%, the only thing that's worse is the artifact sizes, I already put the numbers for those in a previous comment.

@mrkajetanp
Copy link
Contributor Author

Just for completeness I ran a build with LTO+PGO+BOLT on AArch64. While the bolt hacks and workarounds can get it to work it gets very mixed results for the time being. Some improvements in compile times and on average it's beneficial but the artifact size really explodes. At the moment it doesn't work without out-of-tree patches to llvm anyway.

Screenshot 2024-12-04 at 3 44 09 pm
Screenshot 2024-12-04 at 3 44 55 pm
Screenshot 2024-12-04 at 3 45 17 pm

@Kobzol
Copy link
Contributor

Kobzol commented Dec 4, 2024

Yeah BOLT is quite bad in reusing the old .text segment currently. For BOLT, don't look at instruction counts, these will be artificially inflated by the larger artifact size. For BOLT cycles and wall-time should be improved, and it looks like it is (mean -2% walltime improvement, although wall-time measurements are quite noisy, this looks genuine).

@mrkajetanp
Copy link
Contributor Author

don't look at instruction counts, these will be artificially inflated by the larger artifact size

Is the "instructions" metric in rustc-perf not just "number of instructions executed"?

Apart from that yeah, the improvements are there so once BOLT is fixed we can still enable it.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 5, 2024

Is the "instructions" metric in rustc-perf not just "number of instructions executed"?

Yeah, but somewhat counter-intuitively, that includes the load of the binary itself into the address space, the work of the dynamic linker etc., and this can be inflated a lot if the binary suddenly gets a lot larger :) When we enabled BOLT for the Rust compiler (#116352 (comment)), the icounts were horrible, but cycles had really nice improvements.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 5, 2024

Although previously, the icount regressions were mostly for tiny crates, which is what I would expect given the larger binary. In your screenshot, the icount regressions were for large crates, where the binary size of the compiler shouldn't have such an effect. Anyway, we'll deal with BOLT later, no need to dig into that now :)

@mrkajetanp
Copy link
Contributor Author

Ahh right yeah that makes sense - thank you for the explanation! Exactly, let's cross that bridge when we get to it :)

Move the dist-aarch64-linux CI job to an aarch64 runner instead of
cross-compiling it from an x86 one. This will make it possible to
perform optimisations such as LTO, PGO and BOLT later on.
Move the CI dist-aarch64-linux job to an aarch64 runner and enable
optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.
@mrkajetanp
Copy link
Contributor Author

@Kobzol wanna try opt-dist on centos as well?

@Kobzol
Copy link
Contributor

Kobzol commented Dec 10, 2024

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 10, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`

try-job: dist-aarch64-linux
@bors
Copy link
Contributor

bors commented Dec 10, 2024

⌛ Trying commit 0f4f465 with merge 3d52e24...

@bors
Copy link
Contributor

bors commented Dec 10, 2024

☀️ Try build successful - checks-actions
Build commit: 3d52e24 (3d52e243cf55ef270d2d201a3bf28ec172a63c18)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants