leader election occasionally fails to reconnect to api server #66

msau42 · 2020-11-03T17:41:34Z

The exact root cause is still uncertain, but when the apiserver is having problems, the CSI sidecars fail to acquire the leader election lease with this error:

"error retrieving resource lock kube-system/external-attacher-leader-my-driver: Get https://localhost:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/external-attacher-leader-my-driver: write tcp [::1]:53540->[::1]:443: write: broken pipe"

Even after the apiserver comes back up, the error persists and the sidecar never recovers. This is apparently intended behavior, and the fix is to enable the leader election watchdog (a health check) so that the kubelet can restart the container: https://github.com/kubernetes/client-go/blob/master/tools/leaderelection/healthzadaptor.go#L25

In-tree controllers like kube-controller-manager already set this.

msau42 · 2020-11-03T17:41:43Z

/assign @verult

k8s-ci-robot assigned verult Nov 3, 2020

msau42 changed the title ~~leader election fails to reconnect to api server~~ leader election occasionally fails to reconnect to api server Nov 3, 2020

verult mentioned this issue Nov 12, 2020

Add health checker to leader election library #70

Merged

k8s-ci-robot closed this as completed in #70 Nov 18, 2020

maxime1907 mentioned this issue Jan 4, 2023

feat(chart): support probes for cert-manager and cainjector cert-manager/cert-manager#5670

Closed
