-
-
Notifications
You must be signed in to change notification settings - Fork 2k
runtime: avoid cleaning up extensions directory #4030
Conversation
👏 This also explain why bundler keep recompiling native gems. You just cut our build time in half @sirupsen |
@@ -169,7 +169,7 @@ def clean(dry_run = false) | |||
spec_gem_executables.flatten! | |||
|
|||
stale_gem_bins = gem_bins - spec_gem_executables | |||
stale_git_dirs = git_dirs - spec_git_paths | |||
stale_git_dirs = git_dirs - spec_git_paths - ["extensions"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make more sense to remove it on line 143:
git_dirs = Dir["#{Gem.dir}/bundler/gems/*"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
b08be3e
to
7d19520
Compare
7d19520
to
73fde33
Compare
runtime: avoid cleaning up extensions directory
Woah @andremedeiros, this should not have been merged without a test |
I've tested it myself. @sirupsen will push another PR with tests. |
Still, changes like this should be tested before they hit master, and merged via |
@segiddins agree, this wasn't critical to get merged. |
@sirupsen great catch, also awesome debugging story! I'm very excited to learn why git extensions always recompile, since I've wondered about it for a long time. @andremedeiros Thanks for reviewing this! In the future, write a comment that includes @segiddins it's totally my fault for not making sure @andremedeiros knew about @homu, sorry. 😕 I don't think we've ever officially stated this before, but as part of the workflow improvements that I've been working on (like homu), from now on I'd like to only accept PRs that include tests that are red before the PR and green after. Bundler is so complex nowadays that regressions seem to happen every time something we care about doesn't have a test. 😝 Thanks for the good work, everyone! 🏆 |
clean: add test for not removing git extensions artifacts Unfortunately #4030 was merged before I had a chance to write a test. I confirmed that the test did not pass before my patch. Any ETA of 1.10.7 which includes #4030? ❤️ @indirect @segiddins @andremedeiros @skottler
clean: add test for not removing git extensions artifacts Unfortunately #4030 was merged before I had a chance to write a test. I confirmed that the test did not pass before my patch. Any ETA of 1.10.7 which includes #4030? ❤️ @indirect @segiddins @andremedeiros @skottler
clean: add test for not removing git extensions artifacts Unfortunately #4030 was merged before I had a chance to write a test. I confirmed that the test did not pass before my patch. Any ETA of 1.10.7 which includes #4030? ❤️ @indirect @segiddins @andremedeiros @skottler
When bundling gems via Git with native extensions with the
--clean
flag theextensions/
directory should be left around.Lacks a test, I'll follow up with that.
(optional debugging fairy tale for the record)
It all started an innocent Friday morning with a production Rails console taking a long time to boot (~2m). I started investigating why.
ps auxf
quickly surfaced that something fishy was going on:For some reason,
nokogiri
was compiling every time we booted. But only when booting the production console.rails runner
, starting a Unicorn worker or firing up Resque all boot normally without doing any ridiculous compilation.I
gdb(1)
'ed into the process and ran my dear colleague @csfrancis' MRI call stack script to obtain a Ruby backtrace to figure out where that compilation was happening, giving me:This confirms that it does indeed happen as part of starting the REPL, and thus is only a problem with the production console. This problem did not manifest in development either.
Looking into the
irb
source guided by the backtrace it's clear it attempts to pick up the fileirb/encoding_aliases.rb
. Eventually it makes it's way down toIRB::Locale#search_file
to try and findirb/encoding_aliases.rb
:The interesting code here is
Gem.try_activate(..)
. If Rubygems is loaded, it'll attempt to callGem#try_activate
in an effort to expand the load path even further to try and findirb/encoding_aliases.rb
. Eventually we end up inGem::Specification#build_extensions
as witnessed by the backtrace:What seems to be happening is that
nokogiri
is not quitting this check, but simply proceeding down to build the extension once again.I added the following line to get an overview of a) whether any other extensions were getting built and I just happened to only catch
nokogiri
, b) figure out if any of those values were wrong:There are three interesting cases here I wanted to inspect the output of:
An example of 1) is
Shopify/semian
:This looks completely reasonable. For
nokogiri
as an example of 2):What was even more strange is that all other examples of 2) actually didn't compile.
nokogiri
is the only gem that actually compiles in the 2) group. The others look like thisWhat's interesting about those two different examples of 2) is:
nokogiri
has a properinstalled_by_version
, butjson
has it at the default of0
json
andnokogiri
'sgem.build_complete
file doesn't existnokogiri
compiles, butjson
does notgem.build_complete
is an empty filerubygems
leaves around after compiling the extension.The reason we're compiling
nokogiri
and notjson
is thatjson
hasinstalled_by_version
at the default value of0
, which means the check above:Makes
#build_extensions
return early, avoiding the compilation. However, this is just a coincidence (and probably a bug inrubygems
). The check that should make it terminate is:So why is
installed_by_version
set fornokogiri
and notjson
? Because our fork ofnokogiri
hasinstalled_by_version
set in its gemspec. It's the only example of a Gem with a git source that has this for Shopify:(I haven't dug deeper into is why the input says
2.2.2
when the output above says2.2.3
)So we now know why
nokogiri
is the only C-extension compiling: It doesn't return early on the version check. This starts to uncover the actual bug:gem_build_complete_path
is not present for git-sourced gems and is not compiling by accident due to the version default.Sure enough, running
find vendor/bundler -name "*gem.build_complete"
shows thatgem.build_complete
exists for all gems that are downloaded forrubygems
. However, there are none for the gems with a Git source.When running the same
find(1)
command in development "*gem.build_complete" are clearly present invendor/bundler/bundler/extensions
. This directory doesn't exist at all in production. However, whennokogiri
compiles it creates theextensions/
directory with thegem.build_complete
file in this directory as well as the shared object for the extension.What's different between development and production? I suspected our Docker image building process
would be the cause, since it does a lot of liberal cleanups to save space:
Unfortunately
bundler/extensions
is not there. Something else must be removing theextensions/
directory and all its contents. I looked at our container build logs to try and see if anything suspicious was being run afterbundle install
:The second last line looked extremely suspicious. But where was it coming from? The removal goes back to the
--clean
flag tobundler(1)
which ends up callingBundler::Runtime#clean
. This method is responsible for cleaning up unused gems. However, it doesn't consider theextensions/
directory inside the directory that stores all the gems with a Git source:extensions/
would be part ofgit_dirs
, but notspec_git_paths
and thus would end up being part of the resulting array forstale_git_dirs
.If the
extensions/
directory is left around, Rubygems would be able to pick up thegem.build_complete
file and not compile the extension for the right reasons which resolves my bug.Some things to consider at the tail of this:
2.2.3
onnokogiri
even though the spec says2.2.2
?irb
callingGem.try_activate
on every gem? Removing that would break backwards compatibility unfortunately.@ibawt @csfrancis @camilo @byroot @arthurnn @sgrif @EiNSTeiN- @burke
@hone