Skip to content
This repository has been archived by the owner on Apr 14, 2021. It is now read-only.

runtime: avoid cleaning up extensions directory #4030

Merged
merged 1 commit into from
Oct 3, 2015

Conversation

sirupsen
Copy link
Contributor

@sirupsen sirupsen commented Oct 3, 2015

When bundling gems via Git with native extensions with the --clean flag the extensions/ directory should be left around.

Lacks a test, I'll follow up with that.


(optional debugging fairy tale for the record)

It all started an innocent Friday morning with a production Rails console taking a long time to boot (~2m). I started investigating why. ps auxf quickly surfaced that something fishy was going on:

8192     12831  0.3  0.0 196096 10696 pts/4    Ssl+ 15:48   0:00  \_ borginit:shopify/console/1
8192     12927  0.0  0.0   8316   668 pts/4    S+   15:48   0:00      \_ /sbin/ppidshim /bin/bash -c exec bin/rails console
8192     12929 48.0  0.3 853924 475960 pts/4   Sl+  15:48   0:10          \_ ruby /app/bin/rails console
8192     16762  0.0  0.0   8568   816 pts/4    S+   15:49   0:00              \_ sh -c /usr/lib/shopify-ruby/2.1.6-shopify2/bin/ruby extconf.rb 2>&1
8192     16764  7.7  0.1 290372 167536 pts/4   Sl+  15:49   0:00                  \_ /usr/lib/shopify-ruby/2.1.6-shopify2/bin/ruby extconf.rb
8192     23646  0.0  0.0   8568   820 pts/4    S+   15:49   0:00                      \_ sh -c make >/artifacts/ruby/2.1.0/bundler/gems/nokogiri-e2821bee7e21/ext/nokogiri/tmp/x86_64-unknown-linux-
8192     23647  0.0  0.0  10580  2492 pts/4    S+   15:49   0:00                          \_ make
8192     23652  0.0  0.0  10580  2776 pts/4    S+   15:49   0:00                              \_ make all-recursive
8192     23653  0.0  0.0  21696  3428 pts/4    S+   15:49   0:00                                  \_ /bin/bash -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_ru
8192     23668  0.0  0.0  21700  2328 pts/4    S+   15:49   0:00                                      \_ /bin/bash -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__mak
8192     23669  0.0  0.0  10580  2584 pts/4    S+   15:49   0:00                                          \_ make all-am
8192     24300  0.0  0.0  21660  3288 pts/4    S+   15:49   0:00                                              \_ /bin/bash -c echo "  CC      " tree.lo;/bin/bash ./libtool --silent --tag=CC   --mo
8192     24303  0.0  0.0  22420  4164 pts/4    S+   15:49   0:00                                                  \_ /bin/bash ./libtool --silent --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I
8192     24309  0.0  0.0  10852  2456 pts/4    S+   15:49   0:00                                                      \_ gcc -DHAVE_CONFIG_H -I. -I./include -I./include -D_REENTRANT -fPIC -pedanti
8192     24310  0.0  0.0  46200 16276 pts/4    R+   15:49   0:00                                                          \_ /usr/lib/gcc/x86_64-linux-gnu/4.8/cc1 -quiet -I . -I ./include -I ./inc

For some reason, nokogiri was compiling every time we booted. But only when booting the production console. rails runner, starting a Unicorn worker or firing up Resque all boot normally without doing any ridiculous compilation.

I gdb(1)'ed into the process and ran my dear colleague @csfrancis' MRI call stack script to obtain a Ruby backtrace to figure out where that compilation was happening, giving me:

/artifacts/ruby/2.1.0/gems/activesupport-4.2.4/lib/active_support/core_ext/kernel/agnostics.rb:7:in ``'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/builder.rb:73:in `run'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/ext_conf_builder.rb:38:in `block in build'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/tempfile.rb:324:in `open'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/ext_conf_builder.rb:17:in `build'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/builder.rb:161:in `block (2 levels) in build_extension'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/builder.rb:160:in `block in build_extension'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/builder.rb:159:in `build_extension'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/builder.rb:198:in `block in build_extensions'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/ext/builder.rb:195:in `build_extensions'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/specification.rb:1436:in `block in build_extensions'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/user_interaction.rb:45:in `use_ui'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/specification.rb:1434:in `build_extensions'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/basic_specification.rb:56:in `contains_requirable_file?'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/specification.rb:915:in `block in find_by_path'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/specification.rb:883:in `block in each'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/forwardable.rb:183:in `each'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/specification.rb:882:in `each'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems/specification.rb:914:in `find_by_path'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/rubygems.rb:188:in `try_activate'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:150:in `block in search_file'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:158:in `block in each_localized_path'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:167:in `each_sublocale'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:157:in `each_localized_path'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:145:in `search_file'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:124:in `find'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:108:in `load'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/locale.rb:32:in `initialize'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/init.rb:114:in `init_config'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb/init.rb:16:in `setup'
/usr/lib/shopify-ruby/2.1.6-shopify2/lib/ruby/2.1.0/irb.rb:380:in `start'
/artifacts/ruby/2.1.0/gems/railties-4.2.4/lib/rails/commands/console.rb:110:in `start'
/artifacts/ruby/2.1.0/gems/railties-4.2.4/lib/rails/commands/console.rb:9:in `start'
/artifacts/ruby/2.1.0/gems/railties-4.2.4/lib/rails/commands/commands_tasks.rb:68:in `console'
/artifacts/ruby/2.1.0/gems/railties-4.2.4/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
/artifacts/ruby/2.1.0/gems/railties-4.2.4/lib/rails/commands.rb:17:in `<top (required)>'
/artifacts/ruby/2.1.0/gems/bootscale-0.5.1/lib/bootscale/core_ext.rb:4:in `require'
bin/rails:5:in `<main>’

This confirms that it does indeed happen as part of starting the REPL, and thus is only a problem with the production console. This problem did not manifest in development either.

Looking into the irb source guided by the backtrace it's clear it attempts to pick up the file irb/encoding_aliases.rb. Eventually it makes it's way down to IRB::Locale#search_file to try and find irb/encoding_aliases.rb:

    def search_file(lib_paths, dir, file)
      each_localized_path(dir, file) do |lc_path|
        lib_paths.each do |libpath|
          full_path = File.join(libpath, lc_path)
          return full_path if File.readable?(full_path)
        end
        redo if defined?(Gem) and Gem.try_activate(lc_path)
      end
      nil
    end

The interesting code here is Gem.try_activate(..). If Rubygems is loaded, it'll attempt to call Gem#try_activate in an effort to expand the load path even further to try and find irb/encoding_aliases.rb. Eventually we end up in Gem::Specification#build_extensions as witnessed by the backtrace:

  def build_extensions # :nodoc:
    return if default_gem?
    return if extensions.empty?
    return if installed_by_version < Gem::Version.new('2.2.0.preview.2')
    return if File.exist? gem_build_complete_path
    return if !File.writable?(base_dir)
    return if !File.exist?(File.join(base_dir, 'extensions'))
    # code to build
  end

What seems to be happening is that nokogiri is not quitting this check, but simply proceeding down to build the extension once again.

I added the following line to get an overview of a) whether any other extensions were getting built and I just happened to only catch nokogiri, b) figure out if any of those values were wrong:

puts "name=#{name} installed_by_version=#{installed_by_version.inspect} gem_build_complete_path=#{gem_build_complete_path} build_complete=#{File.exist?(gem_build_complete_path)} whatever=#{ !File.writable?(base_dir) && !File.exist?(File.join(base_dir, 'extensions'))}"

There are three interesting cases here I wanted to inspect the output of:

  1. Gem with rubygems as the source with extensions
  2. Gem with git as the source with extensions

An example of 1) is Shopify/semian:

name=semian installed_by_version=#<Gem::Version "2.2.3"> gem_build_complete_path=/artifacts/ruby/2.1.0/extensions/x86_64-linux/2.1.0-static/semian-0.3.0/gem.build_complete build_complete=true whatever=false

This looks completely reasonable. For nokogiri as an example of 2):

name=nokogiri installed_by_version=#<Gem::Version "2.2.3"> gem_build_complete_path=/artifacts/ruby/2.1.0/bundler/gems/extensions/x86_64-linux/2.1.0-static/nokogiri-e2821bee7e21/gem.build_complete build_complete=false whatever=false

What was even more strange is that all other examples of 2) actually didn't compile. nokogiri is the only gem that actually compiles in the 2) group. The others look like this

name=json installed_by_version=#<Gem::Version "0"> gem_build_complete_path=/artifacts/ruby/2.1.0/bundler/gems/extensions/x86_64-linux/2.1.0-static/json-a41ce74ba367/gem.build_complete build_complete=false whatever=false

What's interesting about those two different examples of 2) is:

  1. nokogiri has a proper installed_by_version, but json has it at the default of 0
  2. Both json and nokogiri's gem.build_complete file doesn't exist
  3. nokogiri compiles, but json does not

gem.build_complete is an empty file rubygems leaves around after compiling the extension.

The reason we're compiling nokogiri and not json is that json has installed_by_version at the default value of 0, which means the check above:

    return if installed_by_version < Gem::Version.new('2.2.0.preview.2')

Makes #build_extensions return early, avoiding the compilation. However, this is just a coincidence (and probably a bug in rubygems). The check that should make it terminate is:

    return if File.exist? gem_build_complete_path

So why is installed_by_version set for nokogiri and not json? Because our fork of nokogiri has installed_by_version set in its gemspec. It's the only example of a Gem with a git source that has this for Shopify:

s.installed_by_version = "2.2.2" if s.respond_to? :installed_by_version

(I haven't dug deeper into is why the input says 2.2.2 when the output above says 2.2.3)

So we now know why nokogiri is the only C-extension compiling: It doesn't return early on the version check. This starts to uncover the actual bug: gem_build_complete_path is not present for git-sourced gems and is not compiling by accident due to the version default.

Sure enough, running find vendor/bundler -name "*gem.build_complete" shows that gem.build_complete exists for all gems that are downloaded for rubygems. However, there are none for the gems with a Git source.

When running the same find(1) command in development "*gem.build_complete" are clearly present in vendor/bundler/bundler/extensions. This directory doesn't exist at all in production. However, when nokogiri compiles it creates the extensions/ directory with the gem.build_complete file in this directory as well as the shared object for the extension.

What's different between development and production? I suspected our Docker image building process
would be the cause, since it does a lot of liberal cleanups to save space:

prune_unwanted_files() {
  set +x
  # Sprockets cache; no value at runtime
  rm -rf ${artifacts_path}/cache/assets
  # Another cache; no value once all gems are installed
  rm -rf ${artifacts_path}/ruby/*/cache
  # git dirs of installed gems are useless once we no longer have to update them
  rm -rf ${artifacts_path}/ruby/*/bundler/gems/*/.git
  # final output is named rlibmemcached*; lib* are intermediate artifacts
  rm -rf ${artifacts_path}/ruby/*/bundler/gems/memcached-*/ext/lib*
  # Again, git dirs useless once we're done installing
  rm -rf ${artifacts_path}/ruby/*/gems/*/.git
  # Large source files, intermediate artifacts
  rm -rf ${artifacts_path}/ruby/*/bundler/gems/nokogiri-*/{ports,ext/nokogiri/tmp}
  # *.o, *.c, *.h, and *.a are intermediate object or source files not loadable
  # at runtime.
  find "${artifacts_path}/ruby" -name '*.{o,c,h,a}' -exec rm -f {} \;
  find "${artifacts_path}/ruby" -name 'gem_make.out' -exec rm -f {} \;
  set -x
}

Unfortunately bundler/extensions is not there. Something else must be removing the extensions/ directory and all its contents. I looked at our container build logs to try and see if anything suspicious was being run after bundle install:

Oct  2 15:15:22.014:    Bundle complete! 213 Gemfile dependencies, 292 gems now installed.
Oct  2 15:15:22.015:    Gems in the groups development, test and debug were not installed.
Oct  2 15:15:22.015:    Bundled gems are installed into /artifacts.
Oct  2 15:15:22.079:    Removing  (extensions)
Oct  2 15:15:22.083:    Removing billing (7570b0472fbe)

The second last line looked extremely suspicious. But where was it coming from? The removal goes back to the --clean flag to bundler(1) which ends up calling Bundler::Runtime#clean. This method is responsible for cleaning up unused gems. However, it doesn't consider the extensions/ directory inside the directory that stores all the gems with a Git source:

stale_git_dirs  = git_dirs - spec_git_paths

extensions/ would be part of git_dirs, but not spec_git_paths and thus would end up being part of the resulting array for stale_git_dirs.

If the extensions/ directory is left around, Rubygems would be able to pick up the gem.build_complete file and not compile the extension for the right reasons which resolves my bug.

Some things to consider at the tail of this:

  • Why are versions not set correctly for Git sourced gems in Rubygems?
  • Why was the version set to 2.2.3 on nokogiri even though the spec says 2.2.2?
  • Why is irb calling Gem.try_activate on every gem? Removing that would break backwards compatibility unfortunately.

@ibawt @csfrancis @camilo @byroot @arthurnn @sgrif @EiNSTeiN- @burke

@hone

@byroot
Copy link
Contributor

byroot commented Oct 3, 2015

👏 This also explain why bundler keep recompiling native gems. You just cut our build time in half @sirupsen

@@ -169,7 +169,7 @@ def clean(dry_run = false)
spec_gem_executables.flatten!

stale_gem_bins = gem_bins - spec_gem_executables
stale_git_dirs = git_dirs - spec_git_paths
stale_git_dirs = git_dirs - spec_git_paths - ["extensions"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense to remove it on line 143:

git_dirs             = Dir["#{Gem.dir}/bundler/gems/*"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

andremedeiros pushed a commit that referenced this pull request Oct 3, 2015
runtime: avoid cleaning up extensions directory
@andremedeiros andremedeiros merged commit a72b9a5 into rubygems:master Oct 3, 2015
@segiddins
Copy link
Member

Woah @andremedeiros, this should not have been merged without a test

@andremedeiros
Copy link
Member

I've tested it myself. @sirupsen will push another PR with tests.

@segiddins
Copy link
Member

Still, changes like this should be tested before they hit master, and merged via @homu

@sirupsen
Copy link
Contributor Author

sirupsen commented Oct 3, 2015

@segiddins agree, this wasn't critical to get merged.

@indirect
Copy link
Member

indirect commented Oct 3, 2015

@sirupsen great catch, also awesome debugging story! I'm very excited to learn why git extensions always recompile, since I've wondered about it for a long time.

@andremedeiros Thanks for reviewing this! In the future, write a comment that includes @homu r+ after you approve. Then the bot will take care of creating the merge, running the tests on the merge, and update master if the merged commit passes.

@segiddins it's totally my fault for not making sure @andremedeiros knew about @homu, sorry. 😕

I don't think we've ever officially stated this before, but as part of the workflow improvements that I've been working on (like homu), from now on I'd like to only accept PRs that include tests that are red before the PR and green after. Bundler is so complex nowadays that regressions seem to happen every time something we care about doesn't have a test. 😝

Thanks for the good work, everyone! 🏆

homu added a commit that referenced this pull request Oct 6, 2015
clean: add test for not removing git extensions artifacts

Unfortunately #4030 was merged before I had a chance to write a test. I confirmed that the test did not pass before my patch.

Any ETA of 1.10.7 which includes #4030? ❤️ 

@indirect @segiddins @andremedeiros @skottler
homu added a commit that referenced this pull request Oct 7, 2015
clean: add test for not removing git extensions artifacts

Unfortunately #4030 was merged before I had a chance to write a test. I confirmed that the test did not pass before my patch.

Any ETA of 1.10.7 which includes #4030? ❤️ 

@indirect @segiddins @andremedeiros @skottler
homu added a commit that referenced this pull request Oct 7, 2015
clean: add test for not removing git extensions artifacts

Unfortunately #4030 was merged before I had a chance to write a test. I confirmed that the test did not pass before my patch.

Any ETA of 1.10.7 which includes #4030? ❤️ 

@indirect @segiddins @andremedeiros @skottler
@coilysiren coilysiren modified the milestone: Release Archive Oct 9, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants