This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Disables flaky test test_operator_gpu.test_deconvolution #14146



@perdasilva perdasilva commented Feb 13, 2019

Description

Disables flaky test
Relates to #10973

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, the API doc string has been updated.
      • For new C++ functions in header files, their functionalities and arguments are documented.
      • For new examples, a README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Disables test_operator_gpu.test_deconvolution test
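
The diff itself is not reproduced on this page. As a rough illustration only, one common way to disable a test in Python is the standard library's `unittest.skip` decorator with a reason that points at the tracking issue. The class name `OperatorTests` and the test body below are hypothetical stand-ins, not the actual MXNet code:

```python
import unittest


class OperatorTests(unittest.TestCase):
    """Hypothetical stand-in for the real operator test module."""

    # The skip reason references the tracking issue mentioned in the description.
    @unittest.skip("Flaky test, tracked in #10973")
    def test_deconvolution(self):
        # The body is never executed while the decorator is in place.
        raise AssertionError("should not run while skipped")


# Run the suite programmatically and confirm the test is reported as skipped.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OperatorTests)
result = unittest.TestResult()
suite.run(result)
print("skipped:", len(result.skipped), "successful:", result.wasSuccessful())
```

A skipped test still shows up in the test report, so the disabled test stays visible until the underlying issue is resolved.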

Comments

======================================================================
FAIL: test_operator_gpu.test_deconvolution
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\test_operator.py", line 1413, in test_deconvolution
    pad                 = (1,1)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\test_operator.py", line 1305, in check_deconvolution_forward_backward
    assert_almost_equal(out + args_grad_addto_npy[0], args_grad_addto[0].asnumpy(), rtol=1e-3, atol=1e-3)
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 495, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 1.290297 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(31, 2, 7, 11), a=0.181059, b=0.182585

 a: array([[[[  71.53303956,   27.22900263,  -81.94053338, ...,  131.83555384,
             0.14714987, -125.67693774],
         [  71.65034077, -127.16477248,   97.06501721, ...,  224.75550074,...

 b: array([[[[  71.53299713,   27.22902107,  -81.94055176, ...,  131.83551025,
             0.14712699, -125.6769104 ],
         [  71.65033722, -127.1647644 ,   97.06500244, ...,  224.75546265,...

-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1614629591 to reproduce.
--------------------- >> end captured logging << ---------------------

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-14144/6/pipeline
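
To make the reported number easier to read: the sketch below assumes the "Error" value is the absolute difference divided by the allowed tolerance `atol + rtol * |b|`, so anything above 1 fails. That formula is an assumption, but it is consistent with the printed values. The helper name `violation_ratio` is made up for illustration and is not part of `mxnet.test_utils`:

```python
def violation_ratio(a, b, rtol, atol):
    """Absolute difference relative to the allowed tolerance (assumed formula).

    A result greater than 1 means the pair (a, b) is outside tolerance.
    """
    return abs(a - b) / (atol + rtol * abs(b))


# Values copied from the "Location of maximum error" line in the log above.
a, b = 0.181059, 0.182585
ratio = violation_ratio(a, b, rtol=1e-3, atol=1e-3)
print("abs diff: %.6f, ratio: %.4f" % (abs(a - b), ratio))
```

With the rounded values from the log this gives a ratio of roughly 1.29, in line with the reported 1.290297. The absolute difference at that element is about 0.0015, against an allowed tolerance of about 0.0012.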

@perdasilva perdasilva changed the title from "disable test test_operator_gpu.test_deconvolution" to "Disables flaky test test_operator_gpu.test_deconvolution" Feb 13, 2019
@perdasilva
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review (PR is waiting for code review) label Feb 13, 2019
@perdasilva
Contributor Author

@mxnet-label-bot add [flaky, test]

@perdasilva perdasilva force-pushed the disable_flaky_gpu_test_deconvolution branch from 0dd80b5 to 6b90641 Compare February 13, 2019 20:43
@perdasilva perdasilva force-pushed the disable_flaky_gpu_test_deconvolution branch from 6b90641 to b27a6b1 Compare February 14, 2019 12:31
@apeforest
Contributor

Have you run the tests with master branch?

@perdasilva
Contributor Author

Not specifically. But it was detected in a PR with unrelated changes. My understanding is that the PR jobs take the current master, apply the changes and run validation against that. So, in a way, they were run against master - at the time of that PR.

Contributor

@apeforest apeforest left a comment

I think this is not a very strict way to identify a flaky test. A flaky test should be one that fails on the master branch. Any PR, no matter how seemingly unrelated, could have a potential impact on the existing code. That's why we have CI in place. Reject this PR unless the test failure can be reproduced on master.
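
For reference, the captured log in the description states that the failing run can be reproduced with `MXNET_TEST_SEED=1614629591`. A minimal sketch of how such a reproduction on a master checkout might be set up follows. The test path and the nose `file:function` selector are assumptions inferred from the traceback, and the command is only printed here, not executed:

```python
import os

# Seed copied from the captured log in the PR description.
seed = "1614629591"

# Copy the current environment and add the seed the test harness reads.
env = dict(os.environ, MXNET_TEST_SEED=seed)

# Assumed location of the GPU test module and nose's "file:function" selector.
cmd = [
    "nosetests",
    "--verbose",
    "tests/python/gpu/test_operator_gpu.py:test_deconvolution",
]

print("MXNET_TEST_SEED=" + seed + " " + " ".join(cmd))
# To actually run it from the repository root on a GPU machine:
#   import subprocess
#   subprocess.run(cmd, env=env, check=False)
```

Running the same command without a fixed seed many times would be one way to estimate how often the test fails on master.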

@perdasilva
Contributor Author

perdasilva commented Feb 18, 2019

I think as a general statement, what you are saying makes sense. But in this case (#14144), which I would suggest you look at, the changes were to the test script for the installation documentation. I find it difficult to believe that a bash script run by CI would trigger the (flaky) test_operator_gpu.test_deconvolution failure, especially since a) the PR was merged after the job was run a second time (on the same code) and ended up green, and b) the error report from the Jenkins job (below) has all the hallmarks of a flaky test, namely minuscule floating point variations.

Items are not equal:

Error 1.290297 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(31, 2, 7, 11), a=0.181059, b=0.182585

 a: array([[[[  71.53303956,   27.22900263,  -81.94053338, ...,  131.83555384,
             0.14714987, -125.67693774],
         [  71.65034077, -127.16477248,   97.06501721, ...,  224.75550074,...

 b: array([[[[  71.53299713,   27.22902107,  -81.94055176, ...,  131.83551025,
             0.14712699, -125.6769104 ],
         [  71.65033722, -127.1647644 ,   97.06500244, ...,  224.75546265,...

I would argue that the CI is in place to catch potential errors from merging your code into the code base. If it catches an error on one run, but not on a re-run of the exact same code, then the unreliably failing test is flaky by definition and CI isn't doing its job properly.

Given all the data above, I don't feel the need to spend time reproducing this in master.

@apeforest
Contributor

apeforest commented Feb 21, 2019

@perdasilva Thanks for your detailed explanation. I am not disagreeing with your analysis. However, such case-by-case analysis based on an individual's expertise is, IMHO, not a scalable mechanism for a software release and deployment process. On the other hand, even if this PR passes and fails CI tests at random times, we cannot exclude the possibility that the flakiness was introduced by this particular PR. The most reliable way to monitor flaky tests is, and should only be, through the master branch over a period of time.

CI is our last guard to protect the quality of our software. I think it's better to be more conservative than optimistic.

@perdasilva
Contributor Author

@apeforest sure, no problems. Let's kick the ball down the road.

@perdasilva perdasilva closed this Feb 22, 2019
@perdasilva perdasilva deleted the disable_flaky_gpu_test_deconvolution branch February 22, 2019 13:55
Labels
Flaky, pr-awaiting-review (PR is waiting for code review), Test
3 participants