
Rerun Failing Tests #882

Closed
mattwynne opened this issue Jul 2, 2015 · 33 comments
Labels
good first issue (Good for newcomers) · 🚦 needs tests (Needs tests) · ⚡ enhancement (Request for new functionality)

Comments

@mattwynne
Member

e.g. http://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html

mattwynne added the 🚦 needs tests, ⚡ enhancement and good first issue labels on Jul 2, 2015
mattwynne added this to the 3.0 milestone on Jul 2, 2015
@danascheider
Contributor

Anybody working on this?

@richarda
Contributor

richarda commented Sep 5, 2015

I started, but unfortunately I haven't been able to make time for it
recently. I'd be happy if someone else did, but if not, I'll check in next
week.


@danascheider
Contributor

I think I can work on this. I'll take a look at your PR, @richarda, and go from there.

@danascheider
Contributor

BTW, if anybody has any ideas on how to simulate a flaky test in the features, I'd be glad to hear them. It's trivial to make the feature under test fail or pass at random, but I'm still working out how to make it fail, then pass, as needed for the test. @richarda, I'm using pretty much your same feature file at this point. Any thoughts?

@mattwynne
Member Author

You could use a global variable in the step def, a bit like we do here: https://github.com/cucumber/cucumber-ruby/blob/master/features/docs/cli/randomize.feature
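A minimal sketch of that global-variable approach, for illustration (the step text, counter name and failure message are invented for this example, not taken from the linked feature):

# A step that keeps its state in a global counter, so it fails on its first
# invocation and passes on every later one - enough to simulate a flaky scenario.
Given(/^a step that fails once, then passes$/) do
  $attempts ||= 0
  $attempts += 1
  raise 'deliberate first-run failure' if $attempts == 1
end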

@danascheider
Contributor

Thanks, @mattwynne, I'll go ahead and do it that way.

@danascheider
Contributor

Any objections to my implementing this as a formatter?

@brasmusson
Contributor

I think the event API should be used (which @mattwynne changed the fail-fast formatter to use in a296821, #903). It is intended both for internal clients and for external clients like user-defined filters. Today filters generally live in Cucumber::Filters; maybe we should create Cucumber::Listeners as a home for event listeners?

@mattwynne
Member Author

A simplistic solution would be to make this both a filter and an event listener - listen for failed test cases, then pump them back through the filter API for re-evaluation.

Where this falls down is that, presumably, we'll want to be able to report a test-case as passed if it's passed within the specified number of re-runs. To do that we'd need to reach deeper into things. I think this might need changes within the core to do it properly.
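Very roughly, that shape might look like the sketch below (RetryFilter, the pipeline callable and the test_case_finished signature are all illustrative assumptions, not the actual cucumber-core API):

# Sketch only: a component acting both as a filter (passing test cases into
# the pipeline) and as an event listener (re-submitting failures until the
# retry limit is reached). The interfaces used here are assumed, not real.
class RetryFilter
  def initialize(retry_limit, pipeline)
    @retry_limit = retry_limit
    @pipeline    = pipeline        # anything responding to #call(test_case)
    @attempts    = Hash.new(0)
  end

  # filter side: pass every test case straight through
  def test_case(test_case)
    @pipeline.call(test_case)
  end

  # listener side: on failure, pump the test case back in for re-evaluation
  def test_case_finished(test_case, result)
    return unless result.failed?
    return if @attempts[test_case] >= @retry_limit
    @attempts[test_case] += 1
    @pipeline.call(test_case)
  end
end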

@mattwynne
Member Author

@danascheider I think the place to start is with a failing feature anyway.

@mattwynne
Member Author

On 28 Sep 2015, at 07:47, Björn Rasmusson wrote:
maybe we should create Cucumber::Listeners as a home for event listeners?

Honestly, I don’t particularly like this kind of “layering by category” as a way to organise things. It’s too solution-domain focused and can break cohesion. I can imagine lots of different uses for event listeners, and things that listen for events may have other responsibilities too. Let’s not pre-judge yet.

You’re now making me think I want to get rid of the Filters namespace!

@danascheider
Contributor

I have failing features on my fork already if you want me to make a PR. I went ahead and wrote them with the same idea that @richarda used, which was that test cases that passed within the specified number of retries were considered passing, but the output noted that they were flakes by saying "1 passing, 1 flake". The disadvantage of that is that it's a little unclear whether it's talking about two test cases or one, so one alternative would be to consider flakes separately from either passes or failures, but still exit with status 0. I'm open to suggestions.


@danascheider
Contributor

Incidentally, after working on this more, I also came to the conclusion that a formatter is not the best way to do this, and was also tending more toward a filter or event listener. I should have commented to that effect, sorry! And @mattwynne, it is starting to look to me as well that implementing this "right" would involve some changes to core, although probably not very dramatic ones. In any case, if you think of a better implementation that hasn't been discussed here, let me know and I'll be happy to incorporate it.

@brasmusson
Contributor

One option for the summary output is to use the same principle as the --wip option. For instance, if only one scenario was executed, first failing and then passing when rerun, the summary output would be something like:

2 scenarios (1 passed, 1 failed)
<step and time report>

All failing scenarios did pass when rerun (rerun was not attempted more than x times)

where "x" is the parameter for the maximum number of rerun attempts.

@mattwynne
Member Author

Deciding on the output is tricky. I think that the implementation will be enough to focus on for now, so I suggest that for this first iteration, we just stick with the current formatter output.

For example:

Background:
  Given a scenario "Flakey" that fails once, then passes
  And a scenario "Shakey" that fails twice, then passes
  And a scenario "Solid" that passes

Scenario:
  When I run `cucumber --retry 2`
  Then it should pass with:
    """
    3 scenarios (2 passed, 1 failed)
    """

@brasmusson
Contributor

I also think that focusing on the implementation is enough for now. I think that (the core of) the runner should be ignorant of the fact that the same test case is executed again (and again); it should just get the next test case to be executed and run it. Some other part (maybe in cucumber-ruby-core, maybe in cucumber-ruby) should be responsible for detecting failures and triggering reruns.

Relating this to the thoughts about the future of reporting (cucumber-attic/gherkin#12 (comment)), from which I take the idea of a primary (json) report from which other reports are created, I think that the first level of reporting should be very much WYSIWWE ("What You See Is What Was Executed"). In the case of the example above, I would expect these events/formatter calls:

before_test_case name="Flakey"
after_test_case name="Flakey" result="Failed"
before_test_case name="Flakey"
after_test_case name="Flakey" result="Passed"
before_test_case name="Shakey"
after_test_case name="Shakey" result="Failed"
before_test_case name="Shakey"
after_test_case name="Shakey" result="Failed"
before_test_case name="Shakey"
after_test_case name="Shakey" result="Passed"
before_test_case name="Solid"
after_test_case name="Solid" result="Passed"

Maybe I interpret --retry 2 differently than in the example above - as two more tries after the first run.

@danascheider
Contributor

@brasmusson - I was interpreting --retry 2 the same way you do, as two extra tries, not counting the original run.

@mattwynne
Member Author

You're right! I wasn't thinking straight. So it should be:

Background:
  Given a scenario "Flakey" that fails once, then passes
  And a scenario "Shakey" that fails twice, then passes
  And a scenario "Solid" that passes

Scenario:
  When I run `cucumber --retry 1`
  Then it should fail with:
    """
    3 scenarios (2 passed, 1 failed)
    """

Scenario:
  When I run `cucumber --retry 2`
  Then it should pass with:
    """
    3 scenarios (3 passed)
    """

I think you can do this with a filter that listens for events, and pumps failed test cases back into the pipeline within the retry limit.

Make sense @danascheider?

@mattwynne
Member Author

Interesting UI here: https://github.com/NoRedInk/rspec-retry

They use what are effectively tags to mark specific scenarios for being rerun. That seems like it might be more useful than a one-size-fits-all CLI option. WDYT?
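For reference, rspec-retry marks individual examples via RSpec metadata, roughly as in the sketch below (written from memory of its README, so check the project for the exact options); the Cucumber equivalent would presumably be a tag on the scenario.

# Roughly how rspec-retry marks a single flaky example via metadata
# (sketch from memory; flaky_remote_call is a stand-in, not a real helper).
require 'rspec/retry'

RSpec.describe 'a flaky integration point' do
  it 'eventually succeeds', retry: 3 do
    expect(flaky_remote_call).to be_truthy
  end
end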

@danascheider
Contributor

The problem I see there is that it would require the user to figure out, or at least guess, which features are likely to be flaky to begin with. I use cucumber-ruby to run integration tests against my Backbone.js app, so thinking of my use case there, literally any of the features are potentially flaky in the event the remote test API hiccups or there's a problem with the connection. Having to put a tag on all of them could be cumbersome. That said, I do think a tag is a good idea in cases like the wire protocol features here. Is there a reason we couldn't implement both?

@mattwynne
Member Author

Is there a reason we couldn't implement both?

👍

@jnpoyser

@mattwynne has pointed me in this direction from the cukes forums after making a similar request. I'd suggested the following, using an around filter:

Around do |scenario, block|
  begin
    $slowdown = 0
    block.call
  rescue
    $slowdown = 1
    block.call
  end
end

The idea here was to catch the exception of a failing test and to rerun it - in this case, once. In my particular flaky tests, setting a slowdown global helps them to run. If it were implemented using hooks, it would give the dev control over how to handle the failing test. In the Maven use case, it could inspect an environment variable from your CI platform and retry that number of times.

This is all well and good, but there's no exception to catch :-/

I don't know enough about the internals of cukes to make an informed suggestion. When I realised there's no exception to catch, I stopped. Would it be possible to pass another variable into the block, against which the success or failure of the scenario can be set?

@diabolo

diabolo commented Feb 10, 2016

@jnpoyser Doesn't your failing expectation create the exception?

@jnpoyser

@diabolo No. Here's an example

Feature: tests
  Scenario: tests
    Given a flipflop test

# step definition
Given(/^a flipflop test$/) do
  expect($dunit).to be(true)
end

# Around hook
Around do |scenario, block|
  $dunit = false
  begin
    puts "First run"
    block.call
  rescue
    $dunit = true
    puts "Second run"
    block.call
  end
end

The rescue block is not executed.

@mattwynne
Member Author

@jnpoyser @diabolo no, when those Around hook blocks run, any exceptions from the scenario are caught by Cucumber and reported to the formatters. Around hooks are a bit weird.

Let's look forward and imagine how you might do this using the new --retry feature.

We could add a BeforeRetry event that fires just before a failed scenario is retried - then you could set your $slowdown variable inside that. WDYT?
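To make the proposal concrete, usage might look something like this (BeforeRetry is purely hypothetical here - no such hook exists yet):

# Hypothetical: register a handler that runs just before a failed
# scenario is retried, so state can be adjusted between attempts.
BeforeRetry do |test_case|
  $slowdown = 1   # slow the next attempt down, as in the example above
end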

@jnpoyser

@mattwynne retry sounds great, and straightforward - if the BeforeRetry hook were present, it would solve this problem. It would be nice to be able to list the retried tests at the end of a run, but that's not a deal breaker - especially as CI servers wouldn't know what to do with that.

@brasmusson
Contributor

@mattwynne I get really uncomfortable with the notion of the receiver of an "event" being able to influence the "event" sender by setting a variable or something. Then you are talking about something other than events. If it weren't for the fact that "hook" in Cucumber means something that becomes a step in a test case, it would be a good name for what you are describing here ("register a BeforeRetry hook that is called just before a failed scenario is retried").

I would also like to relate this to the discussion of Parallel scenario execution and the vision of a Cucumber IO using a set of Cucumber runners distributed across machines. The events you are referring to are all generated inside the runners, and in my mind they will be propagated to the Cucumber IO, so that formatters and other event recipients will not see any difference compared to non-parallel Cucumber execution. In that context, an event recipient on the Cucumber IO side certainly cannot "set a variable inside" an event and expect that it will influence a runner on a different machine.

@Archenoth

Archenoth commented Apr 15, 2016

I don't really have too much to add except for another possible use case that a --retry parameter and BeforeRetry hook would allow.

I stumbled on this issue when looking for ways that I could interactively retry individual scenarios from pry on my local box before pushing fixes.

Basically, what I have right now is an After hook that calls binding.pry (which drops me into a debugger) when a test fails, so I can inspect the failed test state manually:

After do |scenario|
  binding.pry if scenario.failed?
end

This works beautifully. I can play around with the thing being tested in pry at the exact moment it failed. I can also hotswap in possible fixes or new features (in the case of TDD) and play around in pry to see if that yields good results.

However, when I try to make a particular test pass, I often just want to run the same test again and again until it works while making changes to the thing I am testing. A BeforeRetry hook could be a very opportune place to put this binding.pry call. It would allow me to put a gigantic number in the --retry parameter, start my tests from the beginning, and just run any failing tests over until I make them work.

Basically I am trying to build bridges as I am crossing them.

@mattwynne
Member Author

@Archenoth that's an interesting idea. Could we discuss that in a separate ticket?

@mattwynne
Member Author

I'm closing this ticket. IMO --retry is now working. The summary output could be better, but that's something we need to figure out separately I think.

@mattwynne
Member Author

Thanks for all your hard work on this one @danascheider @brasmusson and everyone who gave us feedback!

@westlakem

@mattwynne Is there any way to expose the --retry value as an environment variable? That way I can do something like `if $total_retries == ENV['MAX_RETRIES']`?
