⚠️ Refactor manager to avoid race conditions and provide clean shutdown #1695
Conversation
Force-pushed from 7c59ac6 to e98ac23
Force-pushed from 48a59d3 to 21128d1
Force-pushed from 7a08e72 to 66670ea
Testing on Cluster API AWS, this did seem to work on the first try.
Just to also mention it here: I ran full CAPI/CAPD e2e tests with and without leader election and they were all successful.
Force-pushed from 77f47df to 51c8abd
Looks pretty good to me. Just a few questions and comments.
fakeCluster.informer.mu.Lock()
defer fakeCluster.informer.mu.Unlock()
return fakeCluster.informer.wasStarted && fakeCluster.informer.wasSynced
Does this need Eventually() now? Just wondering because before, it looks like we expect these values to be true the first time we check them.
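For reference, a minimal sketch (assuming Gomega's dot-imported matchers, as in the surrounding tests) of what wrapping this check in Eventually() could look like now that readiness is set from a background goroutine:

```go
// Hypothetical sketch only: poll the mutex-protected flags until both become
// true, instead of asserting them on the first read.
Eventually(func() bool {
	fakeCluster.informer.mu.Lock()
	defer fakeCluster.informer.mu.Unlock()
	return fakeCluster.informer.wasStarted && fakeCluster.informer.wasSynced
}).Should(BeTrue())
```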
pkg/manager/runnable_group.go
Outdated
default:
	return r.LeaderElection.Add(fn, ready)
Just double checking. Unknown runnable types are started after leader election? I (maybe naively) expected unknown types to be added with r.Others.Add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1695 (comment)
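For readers following along, a rough sketch of the dispatch being discussed (type and parameter names such as hasCache and readyCheck are assumptions, not necessarily the exact controller-runtime code): only runnables that explicitly opt out via NeedLeaderElection() land in Others, so unknown types fall through to the LeaderElection group.

```go
// Sketch: route runnables into groups by type; anything unrecognized is
// treated as requiring leader election.
func (r *runnables) Add(fn Runnable, ready readyCheck) error {
	switch runnable := fn.(type) {
	case hasCache:
		return r.Caches.Add(fn, ready)
	case *webhook.Server:
		return r.Webhooks.Add(fn, ready)
	case LeaderElectionRunnable:
		if !runnable.NeedLeaderElection() {
			return r.Others.Add(fn, ready)
		}
		return r.LeaderElection.Add(fn, ready)
	default:
		// Unknown runnable types start after leader election by default.
		return r.LeaderElection.Add(fn, ready)
	}
}
```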
pkg/manager/runnable_group.go
Outdated
go func() {
	if rn.Check(r.ctx) {
		r.group.Store(rn, true)
	}
}()
I'm not sure I'm fully grokking this goroutine. We're running a Check() in the background, and meanwhile we Start() the runnable. Check() is supposed to block until the runnable is ready, and then returns true, at which point we set the runnable as being ready in the group store? If Check() returns false, the runnable is never set as ready?
> at which point we set the runnable as being ready in the group store?

Once Check() returns true, the runnable is marked as ready; if it never becomes ready we block execution and eventually time out the waits.

> If Check() returns false, the runnable is never set as ready?

Correct, the runnable should then exit and return an error on its own, which is propagated to the errChan and exits the manager.
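To illustrate the flow described above, a hedged sketch (field names follow the diff fragment; the surrounding types are assumptions) of how the background check and the runnable's Start interact:

```go
// Sketch of the pattern under discussion, not the exact implementation.
go func() {
	// Check blocks until the runnable reports ready (e.g. a cache that has
	// finished its initial sync). Only a true result marks it ready.
	if rn.Check(r.ctx) {
		r.group.Store(rn, true)
	}
	// If Check returns false, the entry is never marked ready; the runnable
	// is expected to fail on its own and report the error below.
}()

// Meanwhile the runnable itself is started; any error is propagated to the
// manager's error channel, which shuts the manager down.
if err := rn.Start(r.ctx); err != nil {
	r.errChan <- err
}
```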
pkg/manager/runnable_group.go
Outdated
// or returned an error to the channel.
//
// We should always decrement the WaitGroup and
// mark the runnable as ready.
Expand on why the runnable always needs to be marked as ready when we return?
Done!
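The reasoning, roughly, is that the start goroutine can exit on several paths (clean return, error, context cancellation), and each of them must unblock anyone waiting on the group. A hedged sketch of the defer-based version (names are assumptions):

```go
// Sketch: whatever path the goroutine takes out, decrement the WaitGroup and
// mark the runnable ready, otherwise a caller blocked waiting on the group
// would hang forever on a runnable that already returned or errored out.
go func() {
	defer r.wg.Done()
	defer r.group.Store(rn, true)

	if err := rn.Start(r.ctx); err != nil {
		r.errChan <- err
	}
}()
```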
pkg/manager/runnable_group.go
Outdated
// WaitReady polls until the group is ready or until the context is cancelled.
func (r *runnableGroup) WaitReady(ctx context.Context) error {
	return wait.PollImmediateInfiniteWithContext(ctx,
		100*time.Millisecond,
Is this wait interval too long? It seems like most runnables (other than hasCache runnables) will be ready within a few milliseconds. Perhaps we could have a backoff here as well, where the first few polls happen pretty quickly (maybe starting at 1ms)?
With 4-5 runnable groups, we could take up to half a second in this phase of the operator startup, which is fairly fast, but it seems like it could be faster.
100ms seemed like a good balance between polling too quickly and not quickly enough. I can reduce it if you prefer, although the delay is mostly on manager startup.
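For context, a sketch of the poll-based WaitReady being discussed (later replaced by a channel-based approach further down in this thread); the readiness state is assumed to live in a sync.Map keyed by runnable:

```go
// Sketch of the polling loop: every 100ms, check whether every runnable in
// the group has been marked ready.
func (r *runnableGroup) WaitReady(ctx context.Context) error {
	return wait.PollImmediateInfiniteWithContext(ctx,
		100*time.Millisecond,
		func(_ context.Context) (bool, error) {
			ready := true
			r.group.Range(func(key, value interface{}) bool {
				if ok, _ := value.(bool); !ok {
					ready = false
					return false // stop iterating as soon as one isn't ready
				}
				return true
			})
			return ready, nil
		},
	)
}
```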
Expect(found.Runnable).To(BeAssignableToTypeOf(runnable))
Expect(found.Runnable.Start(context.Background())).To(MatchError(err))
})
})
Add tests for runnables that implement LeaderElectionRunnable and return both true and false from NeedLeaderElection()?
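Something along these lines could cover both cases (the fake type below is illustrative, not from the PR):

```go
// Sketch of a test double that can exercise both branches of
// NeedLeaderElection(): an instance constructed with needElection=true should
// end up in the LeaderElection group, one with false in Others.
type fakeLeaderElectionRunnable struct {
	needElection bool
}

func (f *fakeLeaderElectionRunnable) Start(ctx context.Context) error { return nil }
func (f *fakeLeaderElectionRunnable) NeedLeaderElection() bool        { return f.needElection }
```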
Force-pushed from 51c8abd to adeca0f
Needs linter fixup.
Force-pushed from adeca0f to 38f1b90
@alvaroaleman @joelanford ptal
/retitle
pkg/manager/runnable_group.go
Outdated
}

// WaitReady polls until the group is ready or until the context is cancelled.
func (r *runnableGroup) WaitReady(ctx context.Context) error {
It's conceptually slightly weird that we internally have this notion of a ready runnable but don't make that part of our API in any way. Is it still correct that we only need this for the cache? If so, can we document it like that?
I also have to say I am not a huge fan of the poll-based architecture we end up with here and would prefer it if we found a way to push this instead. Maybe make reconcile first loop over an initial channel and then an additional one for runnables that are added after the manager is started, and close a ready chan in between?
> It's conceptually slightly weird that we internally have this notion of a ready runnable but don't make that part of our API in any way. Is it still correct that we only need this for the cache? If so, can we document it like that?

For now, yes, although it could be expanded later to have more built-in checks (for the webhook server, for example), or to give runnables the capability to define their own.

> I also have to say I am not a huge fan of the poll-based architecture we end up with here and would prefer it if we found a way to push this instead. Maybe make reconcile first loop over an initial channel and then an additional one for runnables that are added after the manager is started, and close a ready chan in between?

Agreed, I pushed an update to use channels instead and to only wait for the initial (before-start) runnables.
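A rough sketch of what the push-based variant can look like, assuming the group keeps a queue of runnables added before Start (field names like startQueue and errChan are assumptions): each pre-start runnable signals on a channel once its check passes, and the group blocks on those signals instead of polling.

```go
// Sketch only: wait for the runnables queued before Start by having each one
// signal readiness on a channel, rather than polling the group state.
func (r *runnableGroup) Start(ctx context.Context) error {
	ready := make(chan struct{}, len(r.startQueue))

	for _, rn := range r.startQueue {
		rn := rn
		go func() {
			// Signal once the readiness check passes; if it never does, the
			// runnable is expected to error out and hit errChan instead.
			if rn.Check(ctx) {
				ready <- struct{}{}
			}
		}()
		go func() {
			if err := rn.Start(ctx); err != nil {
				r.errChan <- err
			}
		}()
	}

	// Block until every pre-start runnable has signalled, or the context is
	// cancelled; runnables added after Start are handled separately.
	for range r.startQueue {
		select {
		case <-ready:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```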
Force-pushed from 33b61fa to 423ec74
@vincepri full CAPI e2e tests with and without leader election are successful with the latest commit.
Modulo a squash, this looks good to go from my end 🚢
Force-pushed from 423ec74 to 612e9b2
@alvaroaleman Thanks! The modulo has been satisfied ✔️ 😊
thanks!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alvaroaleman, vincepri
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: Vince Prignano [email protected]
This changeset provides a series of improvements and refactors how the manager starts and stops. During testing with Cluster API (a user of controller-runtime), folks noticed that the manager, which runs a series of components, can deadlock itself when using conversion webhooks or health checks, or won't cleanly shut down and clean up all the running controllers, runnables, caches, webhooks, and HTTP servers.
In particular:
- The Manager's internal mutex didn't actually lock operations while the manager was in the process of starting up. The manager's internal Start() implementation started off a series of goroutines and then waited. Concurrent operations on the manager, like AddHealthzCheck, AddReadyzCheck, or AddMetricsExtraHandler, modified the internal maps while or after their respective servers were being configured, causing potential races or being ineffective.
- Unclear ordering in the manager caused deadlocks when the caches started up. Upon startup, conversion webhooks are required while waiting for the cache's initial List() call, which warms the internal caches. If a webhook server or a healthz/readyz probe didn't start in time, the cache's List() call failed because the webhooks would be unavailable.
- The manager would report Elected() (note: this is used regardless of whether leader election is enabled) without waiting for all the caches to warm up, which could result in failed client calls.
- The stop procedure cancelled everything at once regardless of ordering. Previously, the context cancelled all the runnables regardless of ordering, which can also cause dependency issues. With these changes, if graceful shutdown is set, we try to cancel and wait for runnable groups to be done in a strict order before proceeding to exit the program.
- The stop procedure cancelled leader election only if graceful shutdown was set. This was probably an oversight; we now cancel leader election regardless of whether a graceful timeout is set.
- The http.Server used throughout the codebase now properly sets idle and read header timeouts to match the api-server.
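On the last point, a sketch of what configuring those timeouts on an http.Server looks like (the concrete durations below are illustrative; the PR aligns them with the api-server's defaults):

```go
package httpserver

import (
	"net/http"
	"time"
)

// newServer returns an *http.Server with idle and read-header timeouts set,
// so slow or idle clients can't hold connections open indefinitely.
// The durations here are placeholders, not necessarily the PR's exact values.
func newServer(handler http.Handler) *http.Server {
	return &http.Server{
		Handler: handler,
		// Bound how long a client may take to send the request headers.
		ReadHeaderTimeout: 32 * time.Second,
		// Close keep-alive connections that stay idle too long.
		IdleTimeout: 90 * time.Second,
	}
}
```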