
azure-cni may leak IP allocations after failing to ADD them to a pod #214

Closed

PatrickLang opened this issue Aug 2, 2018 · 3 comments

@PatrickLang
Contributor

PatrickLang commented Aug 2, 2018

Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Issue


Which release version?: master + cherry-pick of #212


Which component (CNI/IPAM/CNM/CNS): CNI


Which Operating System (Linux/Windows): Windows Server version 1803


Which Orchestrator and version (e.g. Kubernetes, Docker): Kubernetes


What happened:

After scaling up a replica set, some containers failed to start. When this happened, their IPs were not freed. Here's an example of the end state after scaling back down: only one pod IP should be in use on the node, but three addresses are marked as in use in the IPAM file.

kubectl get pod -o wide
NAME                           READY     STATUS    RESTARTS   AGE       IP             NODE
psh-5d98ff98b5-qpbjv           1/1       Running   0          18h       10.240.0.141   k8s-linuxpool-13955535-1
whoami-1803-78fd64846f-lq9m7   1/1       Running   0          18h       10.240.0.99    13955k8s9001



# Run on 13955k8s9001
(get-content c:\k\azure-vnet-ipam.json | convertfrom-json).IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses


10.240.0.100 : @{ID=; Addr=10.240.0.100; InUse=False}
10.240.0.101 : @{ID=; Addr=10.240.0.101; InUse=False}
10.240.0.102 : @{ID=; Addr=10.240.0.102; InUse=False}
10.240.0.103 : @{ID=; Addr=10.240.0.103; InUse=False}
10.240.0.104 : @{ID=; Addr=10.240.0.104; InUse=False}
10.240.0.105 : @{ID=; Addr=10.240.0.105; InUse=False}
10.240.0.106 : @{ID=; Addr=10.240.0.106; InUse=False}
10.240.0.107 : @{ID=; Addr=10.240.0.107; InUse=False}
10.240.0.108 : @{ID=; Addr=10.240.0.108; InUse=False}
10.240.0.109 : @{ID=; Addr=10.240.0.109; InUse=False}
10.240.0.110 : @{ID=; Addr=10.240.0.110; InUse=False}
10.240.0.111 : @{ID=; Addr=10.240.0.111; InUse=False}
10.240.0.112 : @{ID=; Addr=10.240.0.112; InUse=False}
10.240.0.113 : @{ID=; Addr=10.240.0.113; InUse=True}
10.240.0.114 : @{ID=; Addr=10.240.0.114; InUse=False}
10.240.0.115 : @{ID=; Addr=10.240.0.115; InUse=False}
10.240.0.116 : @{ID=; Addr=10.240.0.116; InUse=False}
10.240.0.117 : @{ID=; Addr=10.240.0.117; InUse=False}
10.240.0.118 : @{ID=; Addr=10.240.0.118; InUse=False}
10.240.0.119 : @{ID=; Addr=10.240.0.119; InUse=False}
10.240.0.120 : @{ID=; Addr=10.240.0.120; InUse=False}
10.240.0.121 : @{ID=; Addr=10.240.0.121; InUse=False}
10.240.0.122 : @{ID=; Addr=10.240.0.122; InUse=False}
10.240.0.123 : @{ID=; Addr=10.240.0.123; InUse=False}
10.240.0.124 : @{ID=; Addr=10.240.0.124; InUse=False}
10.240.0.125 : @{ID=; Addr=10.240.0.125; InUse=True}
10.240.0.126 : @{ID=; Addr=10.240.0.126; InUse=False}
10.240.0.97  : @{ID=; Addr=10.240.0.97; InUse=False}
10.240.0.98  : @{ID=; Addr=10.240.0.98; InUse=False}
10.240.0.99  : @{ID=; Addr=10.240.0.99; InUse=True}
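For quick triage, the same file can be filtered down to just the entries still marked in use. A minimal sketch in PowerShell, assuming the file path and pool key shown above (the pool key may differ on other nodes):

# List only the addresses the IPAM store still considers allocated
$ipam = get-content c:\k\azure-vnet-ipam.json | convertfrom-json
$addresses = $ipam.IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses
$addresses.PSObject.Properties |
    Where-Object { $_.Value.InUse } |
    ForEach-Object { $_.Value.Addr }

Comparing that list against kubectl get pod -o wide for the node makes the leaked entries obvious: here 10.240.0.113 and 10.240.0.125, since only 10.240.0.99 belongs to a running pod.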

What you expected to happen:

No leaks


How to reproduce it (as minimally and precisely as possible):

# cordon all Windows nodes except 1
kubectl apply -f https://raw.githubusercontent.com/PatrickLang/Windows-K8s-Samples/master/HyperVExamples/whoami-1803.yaml
kubectl scale deploy whoami-1803 --replicas=6
# wait some time, not all 6 will start successfully
kubectl scale deploy whoami-1803 --replicas=1

Anything else we need to know:

Found this while testing fix for #195

@PatrickLang
Contributor Author

Just a note - we're still trying to determine whether this is an azure-cni issue, which would affect both Windows and Linux, or actually a Windows-only kubelet issue.

@tamilmani1989
Member

@PatrickLang This is not related to Linux at all.

@tamilmani1989
Member

After debugging yesterday, we found this is really a Windows Kubernetes issue: kubelet never makes a DEL call to the CNI plugin when the user removes a pod, so the IPAM allocations are never released. Please open an issue with the Windows Kubernetes GitHub repo.
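In the normal flow, the runtime invokes the CNI plugin with CNI_COMMAND=DEL when a pod sandbox is torn down, and azure-cni then releases the address back to the pool; because that DEL never arrives here, the InUse flags shown above never clear. Until the Kubernetes-side fix lands, manually clearing the leaked entries is one possible workaround. This is only a sketch: it assumes the addresses in $leaked have been confirmed against kubectl get pod -o wide as not belonging to any running pod, and that no CNI operations run while the file is rewritten.

# Hypothetical manual cleanup: clear InUse on entries leaked by the missing DEL call
$path   = 'c:\k\azure-vnet-ipam.json'
$leaked = @('10.240.0.113', '10.240.0.125')   # example values taken from the output above

$ipam = get-content $path | convertfrom-json
$addresses = $ipam.IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses
foreach ($ip in $leaked) {
    $entry = $addresses.$ip
    if ($entry) { $entry.InUse = $false }   # PSCustomObject is a reference, so the edit sticks
}
$ipam | ConvertTo-Json -Depth 10 | Set-Content $path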
