fix(daemon): create more expections when skipping pods #74856
Hi @draveness. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @k82cn |
Can you help add an e2e test for this case? |
46f21a1 to 1dad9c4
97ec9d0 to c45fd23
Hi @k82cn, I used a unit test to verify the pod creation expectations; it works the same way but is simpler. Is that OK? |
e9e2cd4 to 7e1d314
Hi @k82cn, it's been a while. Could you give me some advice on this? Many thanks. |
/ok-to-test |
80607e2 to 8e76760
/test pull-kubernetes-bazel-test
|
ea35b7a to d8e45d3
The expectations for the controller work just like the remainingCreations that @lavalamp mentioned above. When pod creation succeeds, the creation expectation is observed when the daemon set controller receives the pod-creation notification from the informer:

func (dsc *DaemonSetsController) addPod(obj interface{}) {
	pod := obj.(*v1.Pod)
	// ...
	if controllerRef := metav1.GetControllerOf(pod); controllerRef != nil {
		ds := dsc.resolveControllerRef(pod.Namespace, controllerRef)
		dsKey, err := controller.KeyFunc(ds)
		if err != nil {
			return
		}
		dsc.expectations.CreationObserved(dsKey)
		dsc.enqueueDaemonSet(ds)
		return
	}
	// ...
}

But FakePodControl does not actually create pods, so we observe the pod creation expectations inside it, which means the daemon set's expectations are already satisfied later in the same sync:

func (dsc *DaemonSetsController) syncDaemonSet(key string) error {
	if !dsc.expectations.SatisfiedExpectations(dsKey) {
		return dsc.updateDaemonSetStatus(ds, hash, false) // originally returned from here.
	}
	err = dsc.manage(ds, hash)
	if err != nil {
		return err
	}
	if dsc.expectations.SatisfiedExpectations(dsKey) {
		switch ds.Spec.UpdateStrategy.Type {
		case apps.OnDeleteDaemonSetStrategyType:
		case apps.RollingUpdateDaemonSetStrategyType:
			err = dsc.rollingUpdate(ds, hash) // !! with rolling update strategy, update daemonset status and send an event.
		}
		if err != nil {
			return err
		}
	}
	err = dsc.cleanupHistory(ds, old)
	if err != nil {
		return fmt.Errorf("failed to clean up revisions of DaemonSet: %v", err)
	}
	return dsc.updateDaemonSetStatus(ds, hash, true) // send an event.
}

So I changed the two test cases:

func TestSufficientCapacityWithTerminatedPodsDaemonLaunchesPod(t *testing.T) {
	defer utilfeaturetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.ScheduleDaemonSetPods, false)()
	{
		strategy := newOnDeleteStrategy()
		podSpec := resourcePodSpec("too-much-mem", "75M", "75m")
		ds := newDaemonSet("foo")
		ds.Spec.UpdateStrategy = *strategy
		ds.Spec.Template.Spec = podSpec
		manager, podControl, _, err := newTestController(ds)
		if err != nil {
			t.Fatalf("error creating DaemonSets controller: %v", err)
		}
		node := newNode("too-much-mem", nil)
		node.Status.Allocatable = allocatableResources("100M", "200m")
		manager.nodeStore.Add(node)
		manager.podStore.Add(&v1.Pod{
			Spec:   podSpec,
			Status: v1.PodStatus{Phase: v1.PodSucceeded},
		})
		manager.dsStore.Add(ds)
		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 1)
	}
	{
		strategy := newRollbackStrategy()
		podSpec := resourcePodSpec("too-much-mem", "75M", "75m")
		ds := newDaemonSet("foo")
		ds.Spec.UpdateStrategy = *strategy
		ds.Spec.Template.Spec = podSpec
		manager, podControl, _, err := newTestController(ds)
		if err != nil {
			t.Fatalf("error creating DaemonSets controller: %v", err)
		}
		node := newNode("too-much-mem", nil)
		node.Status.Allocatable = allocatableResources("100M", "200m")
		manager.nodeStore.Add(node)
		manager.podStore.Add(&v1.Pod{
			Spec:   podSpec,
			Status: v1.PodStatus{Phase: v1.PodSucceeded},
		})
		manager.dsStore.Add(ds)
		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 2)
	}
}

I worked on this for quite a while to get it done, and I have to say it's really hard to test and verify. I'm very happy to change it if someone gives me more context and advice. 🤣 |
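For readers unfamiliar with the expectations mechanism discussed in the comment above, here is a minimal sketch of the idea. It is not the Kubernetes implementation: the `expectations` type and `newExpectations` helper are hypothetical, and the real cache also handles deletions and expiry, which are left out here. Only the method names `CreationObserved` and `SatisfiedExpectations` are taken from the excerpts above; `ExpectCreations` is assumed for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// expectations is a hypothetical, simplified stand-in for the controller's
// expectations cache: it tracks how many pod creations each key still awaits.
type expectations struct {
	mu      sync.Mutex
	pending map[string]int
}

func newExpectations() *expectations {
	return &expectations{pending: make(map[string]int)}
}

// ExpectCreations records that n pod creations are outstanding for key.
func (e *expectations) ExpectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] = n
}

// CreationObserved lowers the outstanding creation count for key by one.
func (e *expectations) CreationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key]--
}

// SatisfiedExpectations reports whether no creations are outstanding for key.
func (e *expectations) SatisfiedExpectations(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] <= 0
}

func main() {
	exp := newExpectations()
	exp.ExpectCreations("default/foo", 2)
	fmt.Println(exp.SatisfiedExpectations("default/foo")) // false

	// In the real controller the informer's add handler does this;
	// in the tests discussed above the fake pod control does it directly.
	exp.CreationObserved("default/foo")
	exp.CreationObserved("default/foo")
	fmt.Println(exp.SatisfiedExpectations("default/foo")) // true
}
```

Because the fake pod control observes creations synchronously, the second `SatisfiedExpectations` check in a single sync can already return true, which is consistent with the extra event seen under the rolling update strategy.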
/hold cancel |
e339259 to 7a02eb3
/test pull-kubernetes-kubemark-e2e-gce-big |
Kindly ping @k82cn @janetkuo @mikedanese for review and approval. |
Agree that this is way too hard to understand, but the fix looks correct. I have one question about why we are seeing two events in those tests now when we weren't before. |
I think I explained the two events in the previous comment - when the stateful set update strategy is
When the expectations are satisfied, and the update strategy is rolling update, the logic will go into
|
Can you address my comments? |
7a02eb3 to 5f8dfdc
Hi @mikedanese, just resolved the comments, PTAL. |
/lgtm I see SlowStart got copied and pasted a few times. I filed #77436 to clean this up. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: draveness, mikedanese The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
DaemonSetsController creates more expectations when errors happen in the 2nd batch.
https://github.com/draveness/kubernetes/blame/master/pkg/controller/daemon/daemon_controller.go#L1027-L1084
skippedPods := createDiff - batchSize
This expression doesn't account for pods that were created in previous batches. However, job_controller decreases the diff after each batch's creation.
https://github.com/draveness/kubernetes/blame/master/pkg/controller/job/job_controller.go#L765-L809
Does this PR introduce a user-facing change?: