
Investigate sccache for Travis/AppVeyor builds #38119

Closed
alexcrichton opened this issue Dec 1, 2016 · 7 comments

Comments

@alexcrichton
Member

sccache is being used to great success in Firefox, and is essentially (as I understand it) a ccache that stores its cache in S3. Our Travis builds rely on ccache for speedy LLVM builds, but they're all building the same thing all the time, so the cache is massively duplicated and taking up tons of space. Additionally, we could unify our "LLVM caching strategies" by using sccache on Windows, where we currently use a different caching strategy with AppVeyor.

The benefits of such a transition I see are:

  • Unifying Travis/AppVeyor LLVM caching
  • Perhaps being speedier as a too-large cache wouldn't be downloaded on Travis builds (I'm not sure we ever clean this out...)
  • Maybe being useful for contributors, as everyone's basically compiling the same LLVM code, so we could speed up everyone else's builds
  • More Rust!

cc @luser

@alexcrichton alexcrichton changed the title Investigate sccache for Travis builds Investigate sccache for Travis/AppVeyor builds Dec 1, 2016
@luser
Contributor

luser commented Dec 2, 2016

For Linux/Mac sccache should be pretty much a drop-in replacement. You just need to have S3 buckets configured (we configure one per AWS region that we use, so we're not making cross-region requests), have AWS credentials available on the builder, and set the environment variable SCCACHE_BUCKET to the bucket to use. Currently sccache doesn't support anything like CCACHE_BASEDIR, but that's planned.
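Concretely, the setup described above might look like the following minimal sketch (the bucket name and wrapped compiler are assumptions for illustration, not actual rust-lang infrastructure values):

```rust
use std::process::Command;

// Minimal sketch (bucket name and compiler are assumptions): sccache reads
// SCCACHE_BUCKET to pick the S3 bucket, and AWS credentials come from the
// usual environment (e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
fn sccache_compile(source: &str) -> Command {
    let mut cmd = Command::new("sccache");
    cmd.env("SCCACHE_BUCKET", "rust-lang-llvm-cache") // hypothetical bucket
        .arg("cc") // the real compiler that sccache should wrap
        .arg("-c")
        .arg(source);
    cmd
}

fn main() {
    // Building the Command doesn't spawn anything; a build script would
    // call .status() here instead of printing.
    println!("{:?}", sccache_compile("llvm/lib/Support/APInt.cpp"));
}
```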

It's worth trying it out on a local build first to see if anything breaks--we haven't implemented proper support for all the gcc/clang options out there, just enough to build Firefox. If you just use it locally without setting any options it'll use a local disk cache in ~/.cache/sccache, but you should be able to see if the build works.

For Windows we do a little fiddling with the compiler options because sccache doesn't currently support debug info in PDB files (hence the -Z7), and we override the MSVC wrapper that we normally use to parse dependency info out of -showIncludes, because sccache supports a synthetic -deps option that works like gcc's dependency file options.
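That option fiddling could be sketched roughly as the rewrite below. This is a hypothetical illustration: the `-deps<path>` spelling is an assumption, not sccache's confirmed syntax.

```rust
// Rough sketch of the MSVC option rewriting described above. Assumption:
// the dependency file is passed as "-deps<path>"; the real option may be
// spelled differently.
fn adapt_msvc_args(args: &[&str], dep_file: &str) -> Vec<String> {
    args.iter()
        .map(|&a| match a {
            // sccache can't cache debug info written to PDB files, so
            // force object-embedded debug info instead.
            "-Zi" | "/Zi" => "-Z7".to_string(),
            // Replace -showIncludes parsing with sccache's synthetic
            // dependency-file option.
            "-showIncludes" | "/showIncludes" => format!("-deps{}", dep_file),
            other => other.to_string(),
        })
        .collect()
}

fn main() {
    println!("{:?}", adapt_msvc_args(&["/Zi", "-showIncludes", "-c"], "obj.d"));
}
```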

@sanxiyn
Member

sanxiyn commented Dec 2, 2016

Does sccache speed builds up when used over a wide-area network? That would be necessary for "Maybe being useful for contributors".

@luser
Contributor

luser commented Dec 2, 2016

We don't have data on that yet; it's somewhat blocked by the CCACHE_BASEDIR work for C++, since otherwise people would have to build with source paths matching those on the build infrastructure. (Unless the LLVM build invokes the compiler with relative source paths; I haven't looked.)

My gut feeling is that it will depend on how fast an individual machine can download objects from S3 versus compile them locally. If local compilation is faster than downloading, then no, it would not be a win. I have plans to make sccache smarter in these circumstances: asking it to run as many compiles in parallel as the system can support, while fetching other objects from cache in the background, ought to provide a benefit except when a developer has a very fast CPU and very slow network access to S3.
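The tradeoff can be stated as a back-of-envelope model (illustrative only, not measured data): a cache fetch wins exactly when downloading the object takes less time than recompiling it.

```rust
// Illustrative model only: a cached object is worth fetching when the
// download time (size / bandwidth) is less than the local compile time.
fn cache_fetch_wins(object_bytes: f64, bandwidth_bytes_per_s: f64, compile_secs: f64) -> bool {
    object_bytes / bandwidth_bytes_per_s < compile_secs
}

fn main() {
    // A 2 MB object on a 10 MB/s link vs. a 5 s compile: fetching wins.
    println!("{}", cache_fetch_wins(2e6, 1e7, 5.0));
}
```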

@alexcrichton
Member Author

Ok, I've been seeing quite a few timeouts on AppVeyor/Travis, all associated with compiling LLVM when it should have been cached, so I've taken renewed interest in looking into this. So far my findings are:

The only real blocker so far is mozilla/sccache#43. Next comes the fun with MSVC. Unlike Firefox we have to use CMake for LLVM, and that's just where the fun begins...

  • For the life of me I couldn't figure out how to get the Visual Studio generators in CMake to use a different compiler, so I couldn't figure out how to insert sccache into the build.
  • Next I tried NMake, but that also didn't work (but to be continued...)
  • I settled on trying to get Ninja to work. I could at least override the compiler.
  • Using CMAKE_CXX_COMPILER_ARG1 works but apparently it doesn't support spaces in arguments. Typically the path to cl.exe has a space in it, so we can't take this strategy.
  • We still need to invoke sccache <cl.exe-path> args... so I created a wrapper script src/bootstrap/bin/sccache-cl.rs which does exactly that
  • Then I had to update cmake-rs to set VS env vars and such for calls to cmake to get everything to link correctly, and finally builds were underway.
  • Unfortunately, nothing was a cacheable command. A sample one looked like this. This looks to be because the commands all have -showIncludes which isn't supported by sccache.
  • Ok... so back to NMake! Now that I'd fixed the CMake+Ninja issue, it looked like that would also fix the previous NMake issues. It turns out this also has no cacheable commands, but for the same reason as MinGW: it uses @ to pass all the arguments.
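The wrapper-script approach above could be sketched roughly as follows. This is hypothetical: the real src/bootstrap/bin/sccache-cl.rs may locate cl.exe differently; here the path comes from an assumed SCCACHE_CL environment variable so the wrapper binary itself needs no spaces in its invocation.

```rust
use std::env;

// Hypothetical sketch of a sccache-cl-style wrapper: assemble the
// `sccache <cl.exe-path> args...` command line that CMake can't express
// itself when the compiler path contains spaces.
fn wrapper_cmdline(cl_path: &str, args: &[String]) -> Vec<String> {
    let mut cmd = vec!["sccache".to_string(), cl_path.to_string()];
    cmd.extend(args.iter().cloned());
    cmd
}

fn main() {
    // Assumed env var; a placeholder default stands in for the real path.
    let cl = env::var("SCCACHE_CL").unwrap_or_else(|_| "cl.exe".to_string());
    let args: Vec<String> = env::args().skip(1).collect();
    // A real wrapper would spawn this command and exit with its status;
    // here we only print the assembled command line.
    println!("{:?}", wrapper_cmdline(&cl, &args));
}
```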

So my conclusion for now is that I was unable to hook together our MSVC builds and sccache. With CMake I couldn't get the Visual Studio build generator to use sccache at all, with Ninja it uses unsupported arguments, and with NMake it also uses unsupported arguments. It turns out NMake doesn't support parallelism anyway, though!

My proposal for next steps would be:

  • See if we can get Ninja, sccache, and MSVC to all play nicely. Sounds like this'll start with -showIncludes, but @luser do you need any more info from me?
  • Let's get the @ bug fixed for MinGW, and we may then have an NMake fallback if need be (although undesirable due to lack of parallelism).

I think those steps will get us to at least a workable state on Windows to the point where I can start testing on AppVeyor.


Along the way, I was surprised by the handling of environment variables in sccache (which makes sense in retrospect). The server process dictates the environment variables for all compilations, which has big ramifications for MSVC because cl.exe relies so heavily on INCLUDE and LIB. In Rust we try to avoid needing to run inside a VS shell, because it's sometimes a pain and it's otherwise nice to be able to work without one.

The gcc-rs crate does all the probing logic to figure out the right values of INCLUDE, LIB, and so on for any particular compile, so we have the information on hand at least. With the current architecture of sccache, though, we can only select one mode of compiling: we have to make sure to start the sccache server in a context with the variables set (which I forgot about when debugging), and once it's started we can't change them.

Normally this should work just fine: we're only using sccache for LLVM, and we're only going to build LLVM once. In theory, however, we could compile LLVM for both 32-bit and 64-bit Windows in one build (e.g. by configuring two host targets). It looks like this wouldn't be supported by sccache, though? The second compile would fail because it would, in theory, use a compiler with the wrong env vars. @luser, do you have thoughts on fixing this? Perhaps sccache could ship env vars to the server and configure all processes to run with the same suite of env vars?

I originally thought this'd be a caching hazard, but in retrospect I think not. On MSVC, at least, you've got different compilers for each target (e.g. 32-bit and 64-bit have entirely different compiler binaries), so the compiler path should be a sufficient cache key for invalidation. Now I'd just be worried about getting it to work! To be clear, though, this isn't a blocker, just something I was thinking about. None of our AppVeyor builds do more than one host.
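The invalidation argument can be made concrete with a sketch of the idea (not sccache's actual hashing): fold the compiler path, the arguments, and the MSVC-relevant environment variables into the cache key, so the two host compilers can never collide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch of the idea only (not sccache's real hashing): distinct compiler
// binaries (32- vs 64-bit cl.exe) and relevant env vars like INCLUDE/LIB
// yield distinct cache keys.
fn cache_key(compiler_path: &str, args: &[&str], env: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    compiler_path.hash(&mut h);
    args.hash(&mut h);
    for (k, v) in env {
        // Only environment variables that affect compilation feed the key.
        if *k == "INCLUDE" || *k == "LIB" {
            (k, v).hash(&mut h);
        }
    }
    h.finish()
}

fn main() {
    let k32 = cache_key(r"Hostx86\cl.exe", &["-c", "a.cpp"], &[]);
    let k64 = cache_key(r"Hostx64\cl.exe", &["-c", "a.cpp"], &[]);
    println!("distinct keys: {}", k32 != k64);
}
```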

@luser
Contributor

luser commented Dec 13, 2016

Unfortunately, nothing was a cacheable command. A sample one looked like this. This looks to be because the commands all have -showIncludes which isn't supported by sccache.

Presumably this is because something is parsing the -showIncludes output to produce dependency files. I'm not actually sure why we refuse to cache compiles with this other than it being a little fiddly to get the output right. I filed mozilla/sccache#47 to support it.

Perhaps sccache could ship env vars to the server and configure all processes to run with the same suite of env vars?

There's no reason that sccache can't send the environment from each compiler invocation over to the server and use that for invoking the real compiler. We might have to include some relevant vars in the hash key (like INCLUDE for MSVC), but there's nothing hard about that. I filed mozilla/sccache#48 to support that.

@alexcrichton
Member Author

Oh oops, right, I forgot to file sccache issues, but both of those sound great to me. Thanks @luser!

bors added a commit that referenced this issue Dec 16, 2016
rustbuild: Add sccache support

This commit adds support for sccache, a ccache-like compiler which works on MSVC
and stores results into an S3 bucket. This also switches over all Travis and
AppVeyor automation to using sccache to ensure a shared and unified cache over
time which can be shared across builders.

The support for sccache manifests as a new `--enable-sccache` option which
instructs us to configure LLVM differently to use a 'sccache' binary instead of
a 'ccache' binary. All docker images for Travis builds are updated to download
Mozilla's tooltool builds of sccache onto various containers and systems.
Additionally a new `rust-lang-ci-sccache` bucket is configured to hold all of
our ccache goodies.

---

Note that this does not currently change Windows, due to the issues written up previously in #38119 (comment). Despite that, however, I was curious to get timings for the builds on Travis to see what ranges we're working with. As a result, this is a WIP PR I'm using to gauge build times and such.
@alexcrichton
Member Author

We've done this and this has landed, so closing!
