Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cucumber fails to terminate from subshell #1372

Closed
sleekweasel opened this issue Oct 29, 2019 · 7 comments
Closed

Cucumber fails to terminate from subshell #1372

sleekweasel opened this issue Oct 29, 2019 · 7 comments
Labels
🐛 bug Defect / Bug ⌛ stale Will soon be closed by stalebot unless there is activity

Comments

@sleekweasel
Copy link

Summary

When cucumber is started from a subshell (run by https://github.com/badoo/parallel_cucumber) and ^C is hit, it sometimes ends up in a state where all threads are dead, but somehow the process doesn't exit and eats an entire CPU.

I attached gdb to try to explore, per:

gdb /usr/share/rvm/rubies/ruby-2.6.3/bin/ruby 7796
(gdb) define ruby_eval
call(rb_p(rb_eval_string_protect($arg0,(int*)0)))
end

redirected stdout (and stderr)

(gdb) call rb_eval_string("$_old_stdout, $stdout = $stdout, File.open('/tmp/ruby-debug.' + Process.pid.to_s, 'a'); $_old_stderr, $stdout = $stderr, $stdout; $stdout.sync = true")

I tried to get a stacktrace:

(gdb) ruby_eval("Kernel.caller")
["/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/formatter/interceptor.rb:17:in write'", "/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/formatter/interceptor.rb:17:in block in write'", "/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/formatter/interceptor.rb:15:in synchronize'", "/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/formatter/interceptor.rb:15:in write'"]

then looked at Thread.list

(gdb) ruby_eval("Thread.list")
[]

(gdb) ruby_eval("Thread.current")
#<Thread:0x0000556f6bde72d8 dead>

(gdb) call (void) rb_backtrace()
Program received signal SIGPIPE, Broken pipe.
0x00007fe3f6f5a187 in __GI___libc_write (fd=2, buf=0x7fffb72e1510, nbytes=110) at ../sysdeps/unix/sysv/linux/write.c:27
27 ../sysdeps/unix/sysv/linux/write.c: No such file or directory.
... (stuff about receiving SIGPIPE)

(gdb) ruby_eval("$stderr")
somehow resulted in the following (with truncation)

#<Cucumber::Formatter::Interceptor::Pipe:0x0000556f70958aa0 @pipe=#<IO:>, @buffer=["/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/cli/main.rb:58:in write'", "/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/cli/main.rb:58:in write'",
... some 308716000 bytes later ...
"/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/cli/main.rb:58:in write'", "/home/tim/.rvm/gems/ruby-2.6.3/gems/cucumber-2.99.0/lib/cucumber/cli/main.rb:58:in write'"], @wrapped=true, @lock=#Thread::Mutex:0x0000556f700aed00>

I observe that cucumber/cli/main.rb:58 doesn't mention 'write' directly:

57: rescue Exception => e
58: @err.puts("#{e.message} (#{e.class})")
59: @err.puts(e.backtrace.join("\n"))

Expected Behavior

Cucumber would quit.

Current Behavior

It hangs the process after all theads have terminated and eats a whole CPU

Suggested Solution

I'm guessing that - presumably in main.rb:58, it needs to avoid writing to stderr if it's broken.

Steps to Reproduce (for bugs)

It's tricky: the process tree shows that it's just a subprocess of shell:

3578 ? S 0:00 sh -c "/usr/share/rvm/rubies/ruby-2.6.3/bin/ruby" -S cucumber --format ParallelCucumber::Helper::Cucumber::JsonStatusFormatter --out /tmp/w-1571919064-1/test_state.json --expand --format Shared::Cucumber::Formatter::CustomJunit --out /tmp/w-1571919064-1/junit_1571919064-1/ --format Shared::Cucumber::Formatter::CustomPretty --no-color APP=badoo -t ~@hon -t ~@not_badoo PLATFORM=android --strict -r support/ -r require_dependencies.rb --exclude features/functional/TODOS/ --format Shared::Cucumber::Formatter::CustomPretty -t @android -t ~@devel -t ~@liveshots -t ~@two_persona -t ~@c_livestream -t ~@ui_perf_dump -t ~@android_real_device -t ~@c_billing MAPPING_PATH=/home/tim/git/bma-calabash/artifacts/badoo-internal-QA-dev-39685.mapping.txt -t ~@android_real_device badoo/features/functional/connections/connections_filters.feature:261 APP_PATH="/home/tim/git/bma-calabash/artifacts/badoo-internal-QA-dev-39685.apks.unzipped/splits/base-master.apk" TEST_APP_PATH="test_servers/98368556ed073bb93e9145b6e3982eae_0.9.12.uia2.apk"

3579 ? R 7059:54 _ /usr/share/rvm/rubies/ruby-2.6.3/bin/ruby -S cucumber --format ParallelCucumber::Helper::Cucumber::JsonStatusFormatter --out /tmp/w-1571919064-1/test_state.json --expand --format Shared::Cucumber::Formatter::CustomJunit --out /tmp/w-1571919064-1/junit_1571919064-1/ --format Shared::Cucumber::Formatter::CustomPretty --no-color APP=badoo -t ~@hon -t ~@not_badoo PLATFORM=android --strict -r support/ -r require_dependencies.rb --exclude features/functional/TODOS/ --format Shared::Cucumber::Formatter::CustomPretty -t @android -t ~@devel -t ~@liveshots -t ~@two_persona -t ~@c_livestream -t ~@ui_perf_dump -t ~@android_real_device -t ~@c_billing MAPPING_PATH=/home/tim/git/bma-calabash/artifacts/badoo-internal-QA-dev-39685.mapping.txt -t ~@android_real_device badoo/features/functional/connections/connections_filters.feature:261 APP_PATH=/home/tim/git/bma-calabash/artifacts/badoo-internal-QA-dev-39685.apks.unzipped/splits/base-master.apk TEST_APP_PATH=test_servers/98368556ed073bb93e9145b6e3982eae_0.9.12.uia2.apk

I know the foregoing isn't remotely a reproducible case.

But essentially, I'm running rake, which runs parallel_cucumber, which runs cucumber. When I ^C, someone catches it and terminates most of the processes so I get my prompt back, but these cucumbers are left in what seems to be a SIGPIPE based loop. Eventually I notice and kill -9 them.

Context & Motivation

Primarily, I don't want my CPUs eaten. I can add some process-level tidy-up to parallel_cucumber, but this looks like a cucumber or Ruby problem.

I'm going to try adding a SIGPIPE handler that calls Kernel.exit! if Threads.list is null or empty, but I already tried that with a gdb eval, and this looks like something right at the very end of exiting, so maybe Ruby isn't even available at that point.

Your Environment

  • Version used: ruby-2.6.3 cucumber-2.99.0
  • Operating System and version: Ubuntu Bionic, but happens on Mac too.
  • Link to your project:
@vincent-psarga vincent-psarga added the 🐛 bug Defect / Bug label Oct 30, 2019
@vincent-psarga
Copy link
Contributor

vincent-psarga commented Oct 30, 2019

Thank you for this really complete report. By any chance, have you tried reproducing this bug with a more recent version of Cucumber ? (if possible of course)
(I don't see anything in the Changelog that would relate to your particular issue, but it may be worth checking).

Edit: I'm trying out some bash scripts to ease reproducibility of that, not yet using parallel Cucumber.

#!/usr/bin/env bash

echo "Running cucumber suite"
bundle exec cucumber &
PID=$!
sleep 2
echo "Killing cucumber suite ($PID)"
kill $PID
echo "---------------------------------------------------"
ps -efa | grep cucumber

@sleekweasel
Copy link
Author

sleekweasel commented Oct 31, 2019

I've wrapped the main.rb exception handler:

  rescue Exception => e
    if Thread.current.alive?
      @err.puts("#{e.message} (#{e.class})")
      @err.puts(e.backtrace.join("\n"))
    end
    exit_unable_to_finish
  end

which seems to be helping so far. I suppose instead of simply dropping the messages, it could try opening a /tmp/$0.$$.panic file and send the data there?

Are there real situations when Thread.current should be dead?(!)

@sleekweasel
Copy link
Author

Oh, and with regard to trying with the latest cucumber... it's a bit tricky with all our dependencies, but I'll see what I can manage.

@sleekweasel
Copy link
Author

The 'alive' check didn't work reliably. I thought a counting guard would be better, but this still ends up with an infinite stacktrace on the @err.puts("#{message} @{class}" line, which looks really weird because... how is it calling repeatedly from that line, when that line is guarded by the exit_attempt global?

I tried moving to cucumber 3, but we have dependencies on rb_dsl and my timebox ran out. I'll try again when I have another one.

  rescue Exception => e
    #if Thread.current.alive? # TIMB-try to fix hanging on ^C
    $exit_attempt ||= 1
    if $exit_attempt == 1
      $exit_attempt -= 1
      @err = STDERR # Try avoiding it completely
      @err.puts("#{e.message} (#{e.class})")
      @err.puts(e.backtrace.join("\n"))
      @err = STDERR
      exit_unable_to_finish
    else
      STDERR.puts("NOTE: Recursive exception abort: #{__FILE__}:#{__LINE__}")
      STDERR.puts("#{e.message} (#{e.class})")
      STDERR.puts(e.backtrace.join("\n"))
      exit_unable_to_finish!
    end

hopefully STDERR really is a constant, and not yet another thing someone can overwrite in some random library because they think they know better.

@luke-hill
Copy link
Contributor

We made a "slight" modification to the exit logic (Due to an outstanding long issue), but it was only about the timing (So it probably isn't relevant).

If you want to check it out, see if it is relevant. It was merged into the master tree quite recently. More info here: #1353

@stale
Copy link

stale bot commented Jan 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

@stale stale bot added the ⌛ stale Will soon be closed by stalebot unless there is activity label Jan 31, 2020
@stale
Copy link

stale bot commented Feb 7, 2020

This issue has been automatically closed because of inactivity. You can support the Cucumber core team on opencollective.

@stale stale bot closed this as completed Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Defect / Bug ⌛ stale Will soon be closed by stalebot unless there is activity
Projects
None yet
Development

No branches or pull requests

3 participants