-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow failed discovery on initial quota controller start #67433
allow failed discovery on initial quota controller start #67433
Conversation
@@ -161,10 +161,11 @@ func NewResourceQuotaController(options *ResourceQuotaControllerOptions) (*Resou | |||
|
|||
rq.quotaMonitor = qm | |||
|
|||
// do initial quota monitor setup | |||
// do initial quota monitor setup. If we have a discovery failure here, it's ok. We'll discover more resources when | |||
// a later sync happens. | |||
resources, err := GetQuotableResources(options.DiscoveryFunc) | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this only tolerate IsGroupDiscoveryFailedError
errors?
@@ -421,7 +422,9 @@ func (rq *ResourceQuotaController) Sync(discoveryFunc NamespacedResourcesFunc, p | |||
// Something has changed, so track the new state and perform a sync. | |||
oldResources := make(map[schema.GroupVersionResource]struct{}) | |||
wait.Until(func() { | |||
// Get the current resource list from discovery. | |||
// Get the current resource list from discovery. A failure here will (should?) prevent updates to the sync list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is tolerating partial discovery on start but not on resync what we want? I expected them to match. ErrGroupDiscoveryFailed
errors expose which group/versions failed, if we want to use that info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even without inspecting the details of the error, we could do something like this as a simple start. letting the quota controller continue running without using available resource info from discovery doesn't seem great.
// Get the current resource list from discovery.
newResources, err := GetQuotableResources(discoveryFunc)
if err != nil {
utilruntime.HandleError(err)
if discovery.IsGroupDiscoveryFailedError(err) && len(newResources) > 0 {
// In partial discovery cases, don't remove any existing informers, just add new ones
for k, v := range oldResources {
newResources[k] = v
}
} else {
// short circuit in non-discovery error cases or if discovery returned zero resources
return
}
}
/assign @yliaog |
dbbfb7d
to
4c8e9de
Compare
@liggitt comments addressed. |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, yliaog The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
or @deads2k: cherry-pick to |
…2-upstream-release-1.10 Automatic merge from submit-queue. Automated cherry pick of #66932 #67433: Include unavailable API services in discovery response, allow failed discovery on initial quota controller start Cherry pick of #66932 on release-1.10. #66932: Include unavailable API services in discovery response #67433: allow failed discovery on initial quota controller start ```release-note kube-apiserver now includes all registered API groups in discovery, including registered extension API group/versions for unavailable extension API servers. kube-controller-manager can now start the quota controller when discovery results can only be partially determined. ```
…2-upstream-release-1.11 Automatic merge from submit-queue. Automated cherry pick of #66932 #67433: Include unavailable API services in discovery response, allow failed discovery on initial quota controller start Cherry pick of #66932 #67433 on release-1.11. #66932: Include unavailable API services in discovery response #67433: allow failed discovery on initial quota controller start ```release-note kube-apiserver now includes all registered API groups in discovery, including registered extension API group/versions for unavailable extension API servers. kube-controller-manager can now start the quota controller when discovery results can only be partially determined. ```
…33-upstream-release-1.9 Automatic merge from submit-queue. allow failed discovery on initial quota controller start Cherry pick of #67433 on release-1.9. #67433: allow failed discovery on initial quota controller start kube-controller-manager can now start the quota controller when discovery results can only be partially determined. ```release-note NONE ```
Fixes #65005
Aggregated API servers now correctly provide 503s on discovery endpoints for groups that cannot be reached. This means that the kube-controller-manager process is now sensitive to discovery failures in the quota controller. This change allows discovery failures in the initial quota replenishment controller resource discovery.
@liggitt suspects similar races exist to those he found GC last release, but this pull doesn't make that better or worse.
@kubernetes/sig-api-machinery-bugs