
test: wait for pods to get IPs #16001

Merged
merged 1 commit into kubernetes:master on Mar 22, 2023

Conversation

prezha
Contributor

@prezha prezha commented Mar 8, 2023

fixes #15870
related to Mirantis/cri-dockerd#163

I could replicate the original issue with --driver=docker and --container-runtime=docker on both Linux and macOS, and observed that the busybox pod on the control-plane node gets an IP after ~5sec, while the busybox pod on the second node takes significantly longer but eventually (after ~40sec) gets an IP.

With this PR we retry until both pods get IPs (max 120sec), so that the multinode tests don't fail because of this delay.
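To illustrate the shape of that retry (a minimal sketch only, not the exact change in this PR; it assumes kubectl is on the PATH and the kubeconfig points at the test cluster):

```go
// Minimal sketch of the retry idea (not the exact test code in this PR):
// poll until every pod in the default namespace reports an IP, or give up
// after two minutes.
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

func waitForPodIPs(timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		// names of all pods
		names, err1 := exec.Command("kubectl", "get", "pods",
			"-o", "jsonpath={.items[*].metadata.name}").Output()
		// IPs of all pods (pods without an IP contribute nothing here)
		ips, err2 := exec.Command("kubectl", "get", "pods",
			"-o", "jsonpath={.items[*].status.podIP}").Output()
		if err1 == nil && err2 == nil {
			nPods := len(strings.Fields(string(names)))
			nIPs := len(strings.Fields(string(ips)))
			if nPods > 0 && nIPs == nPods {
				return nil // every pod has an IP
			}
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("timed out after %v waiting for all pods to get IPs", timeout)
}

func main() {
	if err := waitForPodIPs(120 * time.Second); err != nil {
		fmt.Println(err)
	}
}
```

The actual test keeps its existing assertions; the sketch only shows the poll-until-deadline structure with the 120-second cap.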

The screenshot below from Linux (similar on macOS) shows that there's a delay between the two pods getting an IP - marked with a red rectangle.

On a side note, but I think still interesting: both pods were marked as having the Ready condition while only one had an IP, which probably should not happen (based on the Pod Lifecycle), but that's potentially a separate topic - marked with an amber rectangle on the screenshot. This is also the reason why we cannot simply wait for the pods to become e.g. Ready, but retry instead.
[screenshot: linux]

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 8, 2023
@k8s-ci-robot k8s-ci-robot requested a review from afbjorklund March 8, 2023 21:05
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: prezha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from spowelljr March 8, 2023 21:05
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 8, 2023
@prezha
Contributor Author

prezha commented Mar 8, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Mar 8, 2023
@prezha
Contributor Author

prezha commented Mar 8, 2023

OK, so following the "side note" above, I think I've just confirmed that the issue is not with the cri-dockerd upgrade but probably an issue/bug introduced in Kubernetes 1.26.1+: with 1.26.0 we didn't and don't have that delay issue - i.e., both pods get IPs within a few seconds and the multinode tests complete without issues (and without this PR's changes) - as shown in the screenshot below:
[screenshot: Screenshot 2023-03-08 at 22 04 17]
k8s-1.26.0-pass.log
k8s-1.26.1-fail.log
k8s-1.26.2-fail.log

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16001) |
+----------------+----------+---------------------+
| minikube start | 53.7s    | 54.0s               |
| enable ingress | 25.2s    | 25.5s               |
+----------------+----------+---------------------+

Times for minikube start: 52.6s 53.4s 55.8s 53.2s 53.8s
Times for minikube (PR 16001) start: 56.2s 52.3s 54.6s 53.1s 53.7s

Times for minikube ingress: 24.7s 26.2s 25.7s 25.2s 24.2s
Times for minikube (PR 16001) ingress: 27.7s 24.8s 25.3s 24.2s 25.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16001) |
+----------------+----------+---------------------+
| minikube start | 26.8s    | 26.3s               |
| enable ingress | 20.8s    | 21.8s               |
+----------------+----------+---------------------+

Times for minikube start: 26.1s 27.1s 26.9s 25.7s 28.1s
Times for minikube (PR 16001) start: 26.8s 26.7s 25.4s 26.2s 26.5s

Times for minikube ingress: 21.0s 21.0s 19.6s 20.5s 21.6s
Times for minikube (PR 16001) ingress: 25.1s 20.6s 23.1s 20.1s 20.1s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 16001) |
+----------------+----------+---------------------+
| minikube start | 22.7s    | 22.5s               |
| enable ingress | 32.0s    | 32.6s               |
+----------------+----------+---------------------+

Times for minikube start: 21.9s 22.8s 22.1s 23.4s 23.2s
Times for minikube (PR 16001) start: 22.5s 22.4s 22.4s 22.6s 22.6s

Times for minikube ingress: 32.6s 31.6s 31.6s 32.6s 31.6s
Times for minikube (PR 16001) ingress: 31.5s 33.1s 33.0s 31.6s 33.6s

@minikube-pr-bot

These are the flake rates of all failed tests.

Environment Failed Tests Flake Rate (%)
KVM_Linux_containerd TestStoppedBinaryUpgrade/Upgrade (gopogh) 1.79 (chart)
Hyper-V_Windows TestStartStop/group/no-preload/serial/AddonExistsAfterStop (gopogh) 2.60 (chart)
Hyper-V_Windows TestStartStop/group/no-preload/serial/SecondStart (gopogh) 2.60 (chart)
Hyper-V_Windows TestStartStop/group/no-preload/serial/UserAppExistsAfterStop (gopogh) 2.60 (chart)
Hyper-V_Windows TestStartStop/group/no-preload/serial/VerifyKubernetesImages (gopogh) 2.60 (chart)
KVM_Linux TestStartStop/group/embed-certs/serial/AddonExistsAfterStop (gopogh) 4.76 (chart)
KVM_Linux TestStartStop/group/embed-certs/serial/Pause (gopogh) 4.76 (chart)
KVM_Linux TestStartStop/group/embed-certs/serial/SecondStart (gopogh) 4.76 (chart)
KVM_Linux TestStartStop/group/embed-certs/serial/UserAppExistsAfterStop (gopogh) 4.76 (chart)
KVM_Linux TestStartStop/group/embed-certs/serial/VerifyKubernetesImages (gopogh) 4.76 (chart)
Hyper-V_Windows TestStartStop/group/no-preload/serial/Pause (gopogh) 5.19 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh) 52.56 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh) 52.56 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/ImageReloadDaemon (gopogh) 52.56 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/ImageSaveDaemon (gopogh) 52.56 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/ImageSaveToFile (gopogh) 52.56 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/ImageTagAndLoadDaemon (gopogh) 52.56 (chart)
Hyper-V_Windows TestFunctional/parallel/ImageCommands/Setup (gopogh) 52.56 (chart)
Hyper-V_Windows TestRunningBinaryUpgrade (gopogh) 70.51 (chart)
Hyper-V_Windows TestStoppedBinaryUpgrade/Upgrade (gopogh) 84.62 (chart)
Hyper-V_Windows TestNoKubernetes/serial/StartWithStopK8s (gopogh) 94.44 (chart)
Docker_macOS TestIngressAddonLegacy/serial/ValidateIngressAddonActivation (gopogh) 100.00 (chart)
Docker_macOS TestIngressAddonLegacy/serial/ValidateIngressAddons (gopogh) 100.00 (chart)
Docker_macOS TestIngressAddonLegacy/serial/ValidateIngressDNSAddonActivation (gopogh) 100.00 (chart)
Docker_macOS TestIngressAddonLegacy/StartLegacyK8sCluster (gopogh) 100.00 (chart)
Docker_macOS TestKubernetesUpgrade (gopogh) 100.00 (chart)
Docker_macOS TestMissingContainerUpgrade (gopogh) 100.00 (chart)
Docker_macOS TestRunningBinaryUpgrade (gopogh) 100.00 (chart)
Docker_macOS TestStartStop/group/old-k8s-version/serial/AddonExistsAfterStop (gopogh) 100.00 (chart)
Docker_macOS TestStartStop/group/old-k8s-version/serial/DeployApp (gopogh) 100.00 (chart)
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

Member

@medyagh medyagh left a comment


Thanks for this PR that fixes the failed test. Is this something we can add to the minikube code itself as additional verification? Like, if we do minikube start --wait=all, would we need to add a new Wait component to make sure it waits for all?

@medyagh
Member

medyagh commented Mar 9, 2023

@prezha what is your guess as to why these tests started to fail around the same time as the cri-dockerd change?

Mirantis/cri-dockerd#163

@medyagh
Member

medyagh commented Mar 9, 2023

> OK, so following the "side note" above, I think I've just confirmed that the issue is not with the cri-dockerd upgrade but probably an issue/bug introduced in Kubernetes 1.26.1+: with 1.26.0 we didn't and don't have that delay issue - i.e., both pods get IPs within a few seconds and the multinode tests complete without issues (and without this PR's changes) - as shown in the screenshot below:
> [screenshot: Screenshot 2023-03-08 at 22 04 17]
> k8s-1.26.0-pass.log
> k8s-1.26.1-fail.log
> k8s-1.26.2-fail.log

Oh that's interesting. Should we create an issue on the k8s repo?

@prezha
Contributor Author

prezha commented Mar 9, 2023

@medyagh there could be a couple of things interfering in a bad way - not sure which one exactly at the moment, but yes: the same current minikube head (i.e., containing the updated cri-dockerd v0.3.1) does work with k8s v1.26.0 (set via the --kubernetes-version flag) and does not work with k8s v1.26.1+ (i.e., it has that delay in assigning IP addresses to pods deployed to non-control-plane nodes; k8s v1.26.2, the current default, is "not working" in the same way). That could be due to the changes introduced in k8s v1.26.1, a combination of cri-dockerd v0.3.1 and the changes introduced in k8s v1.26.1, and/or one or both of these in combination with the CNI (kindnet in this case).
I'll try to run some more tests to see if we can rule out at least some of these factors and narrow the problem down a bit.

You are most probably right in saying that the problem surfaced with the cri-dockerd update to v0.3.1, but at that time the default k8s version was already set to v1.26.1 - based on https://storage.googleapis.com/minikube-builds/logs/15752/27907/Docker_Linux.html#fail_TestMultiNode%2fserial%2fDeployApp2Nodes, we have:

I0215 18:24:40.207579 1415225 preload.go:132] Checking if preload exists for k8s version v1.26.1 and runtime docker

So it could be due to a combination of cri-dockerd v0.3.1 and k8s v1.26.1 (as k8s v1.26.0 works with cri-dockerd v0.3.1).

@medyagh
Member

medyagh commented Mar 10, 2023

How about we try a different CNI and see if that gets past the problem of pods getting IPs late?

@prezha
Contributor Author

prezha commented Mar 20, 2023

@medyagh I've run a number of TestMultiNode tests with docker/docker on different OSs (openSUSE Tumbleweed, Ubuntu 20.04/22.04, macOS M1), with different images (busybox, agnhost) and different CNIs (flannel, calico and also older kindnet images), and they all had a similar issue (the only difference is how big the delay is before all pods eventually get an IP; the test, as it is now, would fail most of the time - at times it just gets lucky and passes).

tl;dr: I think this PR is valid and could help us mitigate this test's flakiness, as it explicitly waits for all the test pods to get IPs (which will eventually happen after some delay).

details:

As mentioned earlier, the test pods (i.e., busybox) are both marked with the "Ready" status even though the pod on the worker node did not get an IP - example snapshot:
[screenshot: TestMultinode]
This is the reason why our waiting ends prematurely (i.e., kubectl rollout status has "Watch the status of the rollout until it's done" set to true by default), and then the test fails when the pods' IPs are checked right after.
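To see the gap directly, a hypothetical spot check (not part of the test; it assumes kubectl access to the same cluster) that prints each pod's Ready condition next to its podIP would show Ready=True pods with an empty IP column during the delay described above:

```go
// Hypothetical check (not part of the test change): print each pod's Ready
// condition next to its podIP; during the delay described above, a pod can
// show Ready=True while its IP column is still empty.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	out, err := exec.Command("kubectl", "get", "pods",
		"-o", `custom-columns=NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,IP:.status.podIP`).CombinedOutput()
	if err != nil {
		fmt.Println(err)
	}
	fmt.Print(string(out))
}
```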

Now, based on the aforementioned Pod Lifecycle / Pod conditions:

Ready: the Pod is able to serve requests and should be added to the load balancing pools of all matching Services.

which I interpret as: if a pod is Ready, that would probably also mean it has to have an IP to be able to communicate over the network and serve requests.

If I'm not wrong, Kubernetes gets pod/container status (amongst other things) via the kubelet's (currently generic) PLEG module, which talks to the container runtime (cri-dockerd, in this case).

Now, looking at the code, I think that cri-dockerd separates the pod's status from whether it has an IP or not, e.g. in the PodSandboxStatus func and the toRuntimeAPISandboxState func (whereas libdocker.StatusRunningPrefix relies purely on an "Up*" value being present in the response).
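To make that separation concrete, here is a deliberately simplified sketch (not the actual cri-dockerd source; names and shapes are illustrative) of how a sandbox state derived only from docker's "Up..." status string can come back ready even when no IP has been found yet:

```go
// Simplified illustration (not cri-dockerd source): the sandbox state is
// derived purely from docker's status string, while the IP is looked up
// separately and may legitimately still be empty at that point.
package main

import (
	"fmt"
	"strings"
)

type sandboxStatus struct {
	State string // "SANDBOX_READY" or "SANDBOX_NOTREADY"
	IP    string // may be "" if the CNI has not finished yet
}

func toSandboxState(dockerStatus string) string {
	// mirrors the "Up*" prefix check described above
	if strings.HasPrefix(dockerStatus, "Up") {
		return "SANDBOX_READY"
	}
	return "SANDBOX_NOTREADY"
}

func podSandboxStatus(dockerStatus, ip string) sandboxStatus {
	// note: the state does not depend on whether ip is set
	return sandboxStatus{State: toSandboxState(dockerStatus), IP: ip}
}

func main() {
	s := podSandboxStatus("Up 3 seconds", "") // running container, no IP yet
	fmt.Printf("state=%s ip=%q\n", s.State, s.IP)
	// prints: state=SANDBOX_READY ip=""
}
```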

As seen in the snapshot above, docker can report that a container/pod is up/running even though it does not have an IP address.

It then also seems that the kubelet/PLEG accepts the status response from the runtime "as-is" and does not check whether the Ready pod also has an IP assigned - e.g., the updateCache func calls the runtime's GetPodStatus, then gets the pod's IP(s) (if any, but does not check!), and then just sets the status as received in the runtime's response. From the increased-verbosity kubelet log for the snapshot above (level 7, attached as "kubelet-m02.log" below in full), line 2884:

Mar 19 20:53:16 multinode-877589-m02 kubelet[1337]: I0319 20:53:16.289327 1337 generic.go:453] "PLEG: Write status" pod="default/busybox-6b86dd6d48-ttnwd" podStatus=&{ID:6fe19c31-5d2c-444e-8562-69f4c31186f4 Name:busybox-6b86dd6d48-ttnwd Namespace:default IPs:[] ContainerStatuses:[] SandboxStatuses:[&PodSandboxStatus{Id:a926fc57856373944f563b0d8ebc1bd3befda7fe42e00932dc6ae0e8b97d91e7,Metadata:&PodSandboxMetadata{Name:busybox-6b86dd6d48-ttnwd,Uid:6fe19c31-5d2c-444e-8562-69f4c31186f4,Namespace:default,Attempt:0,},State:SANDBOX_READY,CreatedAt:1679259195437451087,Network:&PodSandboxNetworkStatus{Ip:,AdditionalIps:[]*PodIP{},},Linux:&LinuxPodSandboxStatus{Namespaces:&Namespace{Options:&NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,TargetId:,UsernsOptions:nil,},},},Labels:map[string]string{app: busybox,io.kubernetes.pod.name: busybox-6b86dd6d48-ttnwd,io.kubernetes.pod.namespace: default,io.kubernetes.pod.uid: 6fe19c31-5d2c-444e-8562-69f4c31186f4,pod-template-hash: 6b86dd6d48,},Annotations:map[string]string{kubernetes.io/config.seen: 2023-03-19T20:53:15.093241411Z,kubernetes.io/config.source: api,},RuntimeHandler:,}] TimeStamp:0001-01-01 00:00:00 +0000 UTC}

Now, I'm not sure whether Ready should also mean being able to communicate (I think it should), and if so, whose "responsibility" it is to check whether the pod has an IP before stating it's finally Ready.

Based on the CRI runtime API:

RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure the sandbox is in the ready state on success.

Then, cri-dockerd implements:

RunPodSandbox creates and starts a pod-level sandbox. Runtimes should ensure the sandbox is in ready state.
For docker, PodSandbox is implemented by a container holding the network namespace for the pod.

and also this comment:

Step 5: Setup networking for the sandbox.
All pod networking is setup by a CNI plugin discovered at startup time.
This plugin assigns the pod ip, sets up routes inside the sandbox, creates interfaces etc. In theory, its jurisdiction ends with pod sandbox networking, but it might insert iptables rules or open ports on the host as well, to satisfy parts of the pod spec that aren't recognized by the CNI standard yet.

What we see in the cri-dockerd logs (attached below) is the following info message:

Mar 19 20:53:15 multinode-877589-m02 cri-dockerd[1059]: time="2023-03-19T20:53:15Z" level=info msg="Failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "busybox-6b86dd6d48-ttnwd_default": unexpected command output Device "eth0" does not exist.\n with error: exit status 1"

That comes from the getIPs func, which is in turn called by the PodSandboxStatus func highlighted above (notice the one-second difference!) - there's a comment:

// If all else fails, warn but don't return an error, as pod status
// should generally not return anything except fatal errors
logrus.Infof("Failed to read pod IP from plugin/docker: %v", err)

which, I think, then allows PodSandboxStatus to return a Ready status to the kubelet/PLEG regardless of whether the container's/pod's networking has been correctly initialised, as detailed above.
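A hedged sketch of that pattern (illustrative only, not the real getIPs implementation): the IP-lookup error is logged at info level and swallowed, so the surrounding status call still succeeds with an empty IP list:

```go
// Illustrative sketch of the behaviour described above (not the real
// cri-dockerd getIPs): an error reading the pod IP is only logged, so the
// surrounding status call can still report the sandbox as ready with no IP.
package main

import (
	"errors"
	"log"
)

func getIPsSketch(readIP func() (string, error)) []string {
	ip, err := readIP()
	if err != nil {
		// "warn but don't return an error, as pod status should generally
		// not return anything except fatal errors"
		log.Printf("Failed to read pod IP from plugin/docker: %v", err)
		return nil // empty IP list, but no error for the caller
	}
	return []string{ip}
}

func main() {
	ips := getIPsSketch(func() (string, error) {
		return "", errors.New(`Device "eth0" does not exist`)
	})
	log.Printf("IPs returned to the status call: %v", ips)
}
```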

Finally, from the cri-dockerd upgrade changes from v0.3.0 to v0.3.1, it's not obvious (amongst thousands of lines changed, including a bunch of external dependencies) whether something was directly changed in the logic that worked before, so it also might be something in one of the many libraries that were updated in this upgrade.

Attached are the kubelet (-v=7) and cri-dockerd logs related to the snapshot above:

Bottom line: I might be wrong in the above analysis, but until this is clarified (and fixed somewhere upstream, if needed), this PR should help us reduce the TestMultiNode tests' flakiness (by waiting for the test pods to get IPs).

@medyagh
Member

medyagh commented Mar 22, 2023

> Finally, from the cri-dockerd upgrade changes from v0.3.0 to v0.3.1, it's not obvious (amongst thousands of lines changed, including a bunch of external dependencies) whether something was directly changed in the logic that worked before, so it also might be something in one of the many libraries that were updated in this upgrade.
>
> Attached are the kubelet (-v=7) and cri-dockerd logs related to the snapshot above:
>
> Bottom line: I might be wrong in the above analysis, but until this is clarified (and fixed somewhere upstream, if needed), this PR should help us reduce the TestMultiNode tests' flakiness (by waiting for the test pods to get IPs).

Thank you for the analysis and report on this. Would you mind sharing this investigation in this issue for cri-dockerd, in a format that they could possibly benefit from?

Mirantis/cri-dockerd#163

@medyagh medyagh merged commit c025dfc into kubernetes:master Mar 22, 2023
@medyagh medyagh changed the title wait for pods to get IPs test: wait for pods to get IPs Mar 22, 2023
@prezha
Contributor Author

prezha commented Mar 23, 2023

> Thank you for the analysis and report on this. Would you mind sharing this investigation in this issue for cri-dockerd, in a format that they could possibly benefit from?
>
> Mirantis/cri-dockerd#163

Sure, @medyagh, I've tried to provide some details in the comment there.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Multi-node tests failing after updating cri-dockerd
4 participants