Node status updater now deletes the node entry in attach updates... #45923
Hi @verult. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
            nodeName,
            err)
        nsu.actualStateOfWorld.SetNodeStatusUpdateNeeded(nodeName)
        nsu.actualStateOfWorld.RemoveNodeFromAttachUpdates(nodeName)
I think you need to take different actions for nodeObj == nil and err != nil.
When nodeObj == nil, the API server no longer has this object, so it should be safe to remove the node. But err != nil indicates that something went wrong when retrieving the node object, and the node status updater should try again.
nodeLister.Get() returns a nil object only when there's an error, and returns an error only when the object is nil. However, I can add the separate check as a safeguard against future changes.
The error is different if the node does not exist: in that case it is errors.NewNotFound(v1.Resource("node"), name).
You can check the error type to determine whether the failure is because the node does not exist.
8bc9fee to f41c953
@k8s-bot ok to test
@verult: you can't request testing unless you are a kubernetes member. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@k8s-bot ok to test
@k8s-bot ok to test
/approve
e5f5944 to 5c28946
A couple of comments.
@@ -64,14 +66,19 @@ func (nsu *nodeStatusUpdater) UpdateNodeStatuses() error {
    nodesToUpdate := nsu.actualStateOfWorld.GetVolumesToReportAttached()
    for nodeName, attachedVolumes := range nodesToUpdate {
        nodeObj, err := nsu.nodeLister.Get(string(nodeName))
        if nodeObj == nil || err != nil {
        statusErr, isStatusError := err.(*errors.StatusError)
        if isStatusError && statusErr.Status().Reason == metav1.StatusReasonNotFound {
            // If node does not exist, its status cannot be updated, log error and
            // reset flag statusUpdateNeeded back to true to indicate this node status
            // needs to be updated again
nit: Update this comment.
    // Removes the given node from the record of attach updates. The node's entire
    // volumesToReportAsAttached list is removed.
    RemoveNodeFromAttachUpdates(nodeName types.NodeName) error
Since this operation applies only to the attachdetach actual_state_of_the_world and not to the volumemanager one, just add it to the ASW interface in controller/volume/attachdetach/cache/actual_state_of_world.go instead. Then you don't need to put a no-op version of the method in volumemanager/cache/actual_state_of_world.go.
Make sure to keep the 1.5 version in sync with any changes in this PR as well.
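For readers unfamiliar with the method under discussion, the following is a rough sketch of what removing a node's whole entry could look like. The `attachUpdates` type, its field layout, and the choice to return an error for an unknown node are all assumptions made for illustration; only the method name and the volumesToReportAsAttached wording come from the diff above.

```go
package main

import (
	"fmt"
	"sync"
)

type nodeName string
type volumeName string

// attachUpdates is a hypothetical, simplified stand-in for the attach/detach
// controller's actual state of world. The real type holds much more state.
type attachUpdates struct {
	mu sync.Mutex
	// volumesToReportAsAttached maps each node to the volumes that should be
	// reported as attached in that node's status.
	volumesToReportAsAttached map[nodeName][]volumeName
}

// RemoveNodeFromAttachUpdates deletes the node's entire entry, so the status
// updater no longer sees the node when it lists pending updates.
// Returning an error for an unknown node is an assumption of this sketch.
func (a *attachUpdates) RemoveNodeFromAttachUpdates(n nodeName) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	if _, ok := a.volumesToReportAsAttached[n]; !ok {
		return fmt.Errorf("node %q has no entry in attach updates", n)
	}
	delete(a.volumesToReportAsAttached, n)
	return nil
}

func main() {
	a := &attachUpdates{
		volumesToReportAsAttached: map[nodeName][]volumeName{
			"node-1": {"vol-a", "vol-b"},
		},
	}
	fmt.Println(a.RemoveNodeFromAttachUpdates("node-1")) // entry existed
	fmt.Println(a.RemoveNodeFromAttachUpdates("node-1")) // entry already gone
}
```

Because the method only touches state owned by the attach/detach cache, keeping it on that package's interface avoids a no-op implementation elsewhere, which is the reviewer's point.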
@@ -64,14 +66,19 @@ func (nsu *nodeStatusUpdater) UpdateNodeStatuses() error {
    nodesToUpdate := nsu.actualStateOfWorld.GetVolumesToReportAttached()
    for nodeName, attachedVolumes := range nodesToUpdate {
        nodeObj, err := nsu.nodeLister.Get(string(nodeName))
        if nodeObj == nil || err != nil {
        statusErr, isStatusError := err.(*errors.StatusError)
        if isStatusError && statusErr.Status().Reason == metav1.StatusReasonNotFound {
I think you can use errors.IsNotFound() to check this.
5c28946 to 455dba2
/lgtm
Make sure to fix the release note in your first comment on this page or change the release note label to no release note.
455dba2 to 0a0d758
/release-note-none
@k8s-bot pull-kubernetes-kubemark-e2e-gce test this
… node is missing in NodeInformer cache. Fixes kubernetes#42438. - Added RemoveNodeFromAttachUpdates as part of node status updater operations.
0a0d758 to f9dc2d5
@verult: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jingxu97, saad-ali, verult
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
Automatic merge from submit-queue (batch tested with PRs 46383, 45645, 45923, 44884, 46294)
Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
Automatic merge from submit-queue

Node status updater now deletes the node entry in attach updates when node is missing in NodeInformer cache.

- Added RemoveNodeFromAttachUpdates as part of node status updater operations.

**What this PR does / why we need it**: Fixes issue of unnecessary node status updates when node is deleted.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #42438

**Special notes for your reviewer**: v1.5 version of the fix addressed by PR #45923. This is necessary because NodeLister did not exist prior to 1.6, thus node status updater requires a slightly different node existence check.

**Release note**:
```release-note
NONE
```
… when node is missing in NodeInformer cache.
What this PR does / why we need it: Fixes issue of unnecessary node status updates when node is deleted.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #42438
Special notes for your reviewer: A unit test was added, but a more comprehensive test involving the attach/detach controller requires testing functionality that is currently absent and will take a larger effort. It will be added at a later time.
There is an edge case caused by the following steps:
This would leave the pod stuck in the ContainerCreating state. This is low priority since it's a specific edge case that can be avoided.
Release note: