
azure-cni may leak IP allocations after failing to ADD them to a pod #214

Closed

PatrickLang opened this issue Aug 2, 2018 · 3 comments

@PatrickLang
Contributor

PatrickLang commented Aug 2, 2018

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Issue


Which release version?: master + cherry-pick of #212


Which component (CNI/IPAM/CNM/CNS): CNI


Which Operating System (Linux/Windows): Windows Server version 1803


Which Orchestrator and version (e.g. Kubernetes, Docker): Kubernetes


What happened:

After scaling up a replica set, some containers failed to start. When this happened, their IPs were not freed. Here's an example of the end state after scaling back down: only one pod IP should be in use on the node, but three addresses are marked as in use in the IPAM file.

kubectl get pod -o wide
NAME                           READY     STATUS    RESTARTS   AGE       IP             NODE
psh-5d98ff98b5-qpbjv           1/1       Running   0          18h       10.240.0.141   k8s-linuxpool-13955535-1
whoami-1803-78fd64846f-lq9m7   1/1       Running   0          18h       10.240.0.99    13955k8s9001



# Run on 13955k8s9001
(get-content c:\k\azure-vnet-ipam.json | convertfrom-json).IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses


10.240.0.100 : @{ID=; Addr=10.240.0.100; InUse=False}
10.240.0.101 : @{ID=; Addr=10.240.0.101; InUse=False}
10.240.0.102 : @{ID=; Addr=10.240.0.102; InUse=False}
10.240.0.103 : @{ID=; Addr=10.240.0.103; InUse=False}
10.240.0.104 : @{ID=; Addr=10.240.0.104; InUse=False}
10.240.0.105 : @{ID=; Addr=10.240.0.105; InUse=False}
10.240.0.106 : @{ID=; Addr=10.240.0.106; InUse=False}
10.240.0.107 : @{ID=; Addr=10.240.0.107; InUse=False}
10.240.0.108 : @{ID=; Addr=10.240.0.108; InUse=False}
10.240.0.109 : @{ID=; Addr=10.240.0.109; InUse=False}
10.240.0.110 : @{ID=; Addr=10.240.0.110; InUse=False}
10.240.0.111 : @{ID=; Addr=10.240.0.111; InUse=False}
10.240.0.112 : @{ID=; Addr=10.240.0.112; InUse=False}
10.240.0.113 : @{ID=; Addr=10.240.0.113; InUse=True}
10.240.0.114 : @{ID=; Addr=10.240.0.114; InUse=False}
10.240.0.115 : @{ID=; Addr=10.240.0.115; InUse=False}
10.240.0.116 : @{ID=; Addr=10.240.0.116; InUse=False}
10.240.0.117 : @{ID=; Addr=10.240.0.117; InUse=False}
10.240.0.118 : @{ID=; Addr=10.240.0.118; InUse=False}
10.240.0.119 : @{ID=; Addr=10.240.0.119; InUse=False}
10.240.0.120 : @{ID=; Addr=10.240.0.120; InUse=False}
10.240.0.121 : @{ID=; Addr=10.240.0.121; InUse=False}
10.240.0.122 : @{ID=; Addr=10.240.0.122; InUse=False}
10.240.0.123 : @{ID=; Addr=10.240.0.123; InUse=False}
10.240.0.124 : @{ID=; Addr=10.240.0.124; InUse=False}
10.240.0.125 : @{ID=; Addr=10.240.0.125; InUse=True}
10.240.0.126 : @{ID=; Addr=10.240.0.126; InUse=False}
10.240.0.97  : @{ID=; Addr=10.240.0.97; InUse=False}
10.240.0.98  : @{ID=; Addr=10.240.0.98; InUse=False}
10.240.0.99  : @{ID=; Addr=10.240.0.99; InUse=True}
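For quick triage, the same file can be filtered down to just the entries still marked in use. A minimal sketch in PowerShell, assuming the file path and pool key shown above (the pool key may differ on other nodes):

# List only the addresses the IPAM store still considers allocated
$ipam = get-content c:\k\azure-vnet-ipam.json | convertfrom-json
$addresses = $ipam.IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses
$addresses.PSObject.Properties |
    Where-Object { $_.Value.InUse } |
    ForEach-Object { $_.Value.Addr }

Comparing that list against kubectl get pod -o wide for the node makes the leaked entries obvious: here 10.240.0.113 and 10.240.0.125, since only 10.240.0.99 belongs to a running pod.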

What you expected to happen:

No leaks


How to reproduce it (as minimally and precisely as possible):

# cordon all Windows nodes except 1
kubectl apply -f https://raw.githubusercontent.com/PatrickLang/Windows-K8s-Samples/master/HyperVExamples/whoami-1803.yaml
kubectl scale deploy whoami-1803 --replicas=6
# wait some time, not all 6 will start successfully
kubectl scale deploy whoami-1803 --replicas=1

Anything else we need to know:

Found this while testing fix for #195

@PatrickLang
Contributor Author

Just a note - we're still trying to determine whether this is an azure-cni issue, which would affect both Windows and Linux, or actually a Windows-only kubelet issue.

@tamilmani1989
Member

@PatrickLang This is not related to Linux at all.

@tamilmani1989
Member

After debugging yesterday, we found this is really a Windows Kubernetes issue: kubelet never makes a DEL call to the CNI plugin when the user removes a pod, so the IPAM allocations are never released. Please open an issue with the Windows Kubernetes GitHub repo.
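In the normal flow, the runtime invokes the CNI plugin with CNI_COMMAND=DEL when a pod sandbox is torn down, and azure-cni then releases the address back to the pool; because that DEL never arrives here, the InUse flags shown above never clear. Until the Kubernetes-side fix lands, manually clearing the leaked entries is one possible workaround. This is only a sketch: it assumes the addresses in $leaked have been confirmed against kubectl get pod -o wide as not belonging to any running pod, and that no CNI operations run while the file is rewritten.

# Hypothetical manual cleanup: clear InUse on entries leaked by the missing DEL call
$path   = 'c:\k\azure-vnet-ipam.json'
$leaked = @('10.240.0.113', '10.240.0.125')   # example values taken from the output above

$ipam = get-content $path | convertfrom-json
$addresses = $ipam.IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses
foreach ($ip in $leaked) {
    $entry = $addresses.$ip
    if ($entry) { $entry.InUse = $false }   # PSCustomObject is a reference, so the edit sticks
}
$ipam | ConvertTo-Json -Depth 10 | Set-Content $path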
