Deployment E2E tests failing in upgrade test jobs #42449

Closed
skriss opened this issue Mar 2, 2017 · 17 comments
Labels: area/workload-api/deployment, kind/failing-test, kind/flake, sig/apps


@skriss
Contributor

skriss commented Mar 2, 2017

@pwittrock you are listed as the test owner in test_owners.csv so directing this at you for now.

Most of the Deployment E2E tests are failing in the upgrade test jobs. See, for example:

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gke-container_vm-1.4-container_vm-latest-upgrade-cluster/479

Common error message:

Expected an error to have occurred. Got:
    <nil>: nil

Note that the 1.6 release upgrade test jobs are not set up yet, but I'm trying to get ahead of issues by looking at the jobs running off master. Thanks!

@0xmichalis
Contributor

@kubernetes/sig-apps-test-failures

@0xmichalis
Contributor

Why are we expecting an error to occur on every test? :)

@0xmichalis 0xmichalis self-assigned this Mar 3, 2017
@0xmichalis
Contributor

Is this test running 1.6 tests against a 1.4 cluster? This should explain the other deployment failure regarding the overlapping test.

@0xmichalis 0xmichalis added the area/workload-api/deployment and sig/apps labels Mar 3, 2017
@skriss
Contributor Author

skriss commented Mar 3, 2017

It should actually be the opposite, i.e. running 1.4 tests against a 1.6 cluster (one that was upgraded from a 1.4 cluster).

@0xmichalis
Contributor

The deployment overlap behavior has changed in 1.6 and is about to be dropped entirely, assuming #42175 gets merged. It was just a bandage over a user error, and that error should no longer be possible in 1.6 with owner references. This means that the test is obsolete. I don't know what happens in this case. cc: @janetkuo

@janetkuo
Member

janetkuo commented Mar 4, 2017

Looking at the test, it expects getting the deployment to fail right after calling the deployment reaper, to make sure the deployment is deleted: https://github.com/kubernetes/kubernetes/blob/b440e9a9dbb30b/test/e2e/deployment.go#L206
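
For illustration, the shape of that check can be sketched as follows. This is not the code from deployment.go; `fakeStore`, `errNotFound`, and `expectDeleted` are hypothetical names, and an in-memory map stands in for the API server. The point is only that a nil error from the get call means the deployment is still there, which is what produces "Expected an error to have occurred".

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API server's "not found" error.
var errNotFound = errors.New("deployment not found")

// fakeStore is a hypothetical in-memory stand-in for the API server.
type fakeStore struct {
	deployments map[string]bool
}

// get returns errNotFound when the named deployment no longer exists.
func (s *fakeStore) get(name string) error {
	if !s.deployments[name] {
		return errNotFound
	}
	return nil
}

// expectDeleted mirrors the test's assertion: the get must fail. A nil error
// means the deployment is still present, so the check itself reports a failure.
func expectDeleted(s *fakeStore, name string) error {
	if err := s.get(name); err == nil {
		return fmt.Errorf("expected an error to have occurred for %q, got nil", name)
	}
	return nil
}

func main() {
	s := &fakeStore{deployments: map[string]bool{"nginx": true}}
	fmt.Println("check before delete:", expectDeleted(s, "nginx"))
	delete(s.deployments, "nginx")
	fmt.Println("check after delete:", expectDeleted(s, "nginx"))
}
```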

@janetkuo
Member

janetkuo commented Mar 4, 2017

Something to do with garbage collector?

@janetkuo janetkuo added this to the v1.6 milestone Mar 4, 2017
@ethernetdan ethernetdan added the kind/failing-test label Mar 9, 2017
@skriss
Contributor Author

skriss commented Mar 9, 2017

@Kargakis @janetkuo can you clarify the status? To be clear, for 1.5->1.6 upgrades, there are 2 failing Deployment tests in the jobs that perform an upgrade then run 1.5 tests to check for compatibility. Are these issues that need to be addressed for the 1.6 release?

[k8s.io] Deployment deployment reaping should cascade to its replica sets and pods
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:62
Expected an error to have occurred. Got:
    <nil>: nil
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:192

[k8s.io] Deployment overlapping deployment should not fight with each other
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:98
Failed to update the second deployment's overlapping annotation
Expected error:
    <*errors.errorString | 0xc420388ca0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/deployment.go:1223

Note that both of these pass in the jobs that upgrade 1.5->1.6 and then run the 1.6 tests.

@0xmichalis
Contributor

upgrade 1.5->1.6, run 1.5 tests: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gke-container_vm-1.5-container_vm-1.6-upgrade-cluster/21

The overlapping test is obsolete for 1.6 as the functionality has been totally rewritten.
@krmayankk can you have a look at the cascading deletion test, as you introduced server-side deletion in Deployments?

upgrade 1.5->1.6, run 1.6 tests: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gke-container_vm-1.5-container_vm-1.6-upgrade-cluster-new/22

This is something that @enisoc is currently looking into. It seems the problem is that the upgrade suite is compiled with 1.6 when it runs the 1.5 part of the upgrade.

@skriss
Contributor Author

skriss commented Mar 13, 2017

@krmayankk please update by EOD today on whether the cascading deletion failure is a blocking issue for 1.6 or not; if it is, we need to get a PR in by tomorrow for the next beta/RC.

@0xmichalis
Contributor

@caesarxuchao can you also have a look? It seems related to the kubectl reaper not deleting the deployment, likely because it goes through the GC in the end?

@caesarxuchao
Member

caesarxuchao commented Mar 13, 2017

The "[k8s.io] Deployment deployment reaping should cascade to its replica sets and pods" failure is because the deployment is not deleted yet when the 1.5 reaper.Stop() returns, because of the orphan finalizer. See #35676 (comment). 1.6 kubectl won't have this problem because of #40576 (thanks to @nikhiljindal).
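
A rough model of that timing, as a sketch under stated assumptions: the types and functions below (`object`, `server`, `deleteObject`, `removeFinalizer`) are hypothetical and only simulate the behavior in memory. The assumption being modeled is that a delete request against an object that still carries a finalizer only marks it for deletion, and the object disappears once the finalizer is removed.

```go
package main

import "fmt"

// object is a hypothetical, heavily simplified API object.
type object struct {
	finalizers []string
	deleting   bool // stands in for a set deletion timestamp
}

// server is a hypothetical in-memory stand-in for the API server.
type server struct {
	objects map[string]*object
}

// deleteObject removes the object immediately only when it has no finalizers.
// Otherwise it marks the object as deleting and returns, so a get issued
// right afterwards still succeeds.
func (s *server) deleteObject(name string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	if len(obj.finalizers) > 0 {
		obj.deleting = true
		return
	}
	delete(s.objects, name)
}

// removeFinalizer models the garbage collector finishing its work. Once the
// last finalizer is gone from an object marked as deleting, the object goes away.
func (s *server) removeFinalizer(name, finalizer string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	kept := obj.finalizers[:0]
	for _, f := range obj.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.finalizers = kept
	if obj.deleting && len(kept) == 0 {
		delete(s.objects, name)
	}
}

// exists reports whether a get for the object would still succeed.
func (s *server) exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

func main() {
	s := &server{objects: map[string]*object{
		"nginx": {finalizers: []string{"orphan"}},
	}}
	s.deleteObject("nginx")
	fmt.Println("exists right after delete:", s.exists("nginx"))
	s.removeFinalizer("nginx", "orphan")
	fmt.Println("exists after finalizer removed:", s.exists("nginx"))
}
```

In this model, a test that asserts absence immediately after the delete call fails, while one that waits for the finalizer to be processed passes.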

I expect the upgrade test will fail all the deletion tests for other controllers as well, after @enisoc's PRs that add controllerRef get merged.

I suggest that we:

  1. Communicate this behavior change of old kubectl via the CHANGELOG.
  2. Cherry-pick #40576 ("Updating kubectl to send delete requests with orphanDependents=false if --cascade is true") to 1.5.
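
As a hedged sketch of what such a delete request body could look like: the struct below is a hand-written subset of the DeleteOptions fields, not the real client type, and `cascadeDeleteBody` is a hypothetical helper. How kubectl behaves when --cascade is false is not covered in this thread; setting the field to the negation of `cascade` is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions is a hand-written subset of the DeleteOptions request body,
// used here only for illustration.
type deleteOptions struct {
	Kind             string `json:"kind"`
	APIVersion       string `json:"apiVersion"`
	OrphanDependents *bool  `json:"orphanDependents,omitempty"`
}

// cascadeDeleteBody builds the JSON body for a delete request. With
// cascade=true it sets orphanDependents=false explicitly, so the server is
// asked not to orphan the dependents.
// Assumption: cascade=false maps to orphanDependents=true.
func cascadeDeleteBody(cascade bool) (string, error) {
	orphan := !cascade
	opts := deleteOptions{
		Kind:             "DeleteOptions",
		APIVersion:       "v1",
		OrphanDependents: &orphan, // a non-nil pointer is serialized even when false
	}
	b, err := json.Marshal(opts)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := cascadeDeleteBody(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The pointer field matters here: a plain bool with omitempty would drop a false value from the JSON, and the request would fall back to the server's default.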

@ethernetdan ethernetdan added the kind/flake label Mar 14, 2017
@janetkuo
Member

janetkuo commented Mar 14, 2017

Filed #43041 to fix this test failure by updating 1.5 reapers (#40576 has changes we don't want, so filed a separate PR).

@skriss this doesn't block 1.6. The fix targets 1.5.

k8s-github-robot pushed a commit that referenced this issue Mar 14, 2017
Automatic merge from submit-queue

Updating reapers to set OrphanDependents=false

For #42449, updating 1.5 reapers 

@caesarxuchao @kubernetes/sig-cli-pr-reviews
@ethernetdan ethernetdan modified the milestones: v1.5, v1.6 Mar 14, 2017
@0xmichalis
Contributor

@skriss can you give an update on this issue?

@skriss
Contributor Author

skriss commented Mar 20, 2017

mintzhao pushed a commit to mintzhao/kubernetes that referenced this issue Jun 1, 2017
…orphan


5 participants