RFC: Handle affinity assistant deadlock for node maintenance #6584
@skaegi @pritidesai hoping this could address your concerns? This feels a bit hacky but would like to hear your thoughts. @jlpettersson you might have thoughts as well?
This sounds good to me. It is an improvement from the current situation. 👍
Closing in favor of @pritidesai's alternate approach #6596, thanks Priti!
Prior to this commit, cordoning a node for maintenance can deadlock the affinity assistant. If the placeholder pod is scheduled to a node that is then marked unschedulable, new TaskRun pods can neither be scheduled nor trigger a scale-up by the cluster autoscaler, because they have inter-pod affinity for the placeholder pod on the unschedulable node.
This commit adds a new controller that watches nodes. When a node becomes unschedulable, the controller deletes any affinity assistant pods running on it (but leaves any TaskRun pods). The affinity assistant StatefulSet then recreates the placeholder pod, which is scheduled to an available node (or triggers a scale-up if there is a volume node affinity conflict). Pending TaskRun pods cannot be scheduled until the placeholder pod has been rescheduled.
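The selection logic described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `Node` and `Pod` are simplified stand-ins for the Kubernetes objects, `podsToDelete` is a hypothetical helper name, and the label key and value used to recognise affinity assistant pods are assumptions made for the example.

```go
package main

import "fmt"

// Node is a simplified stand-in for a Kubernetes node (illustration only).
type Node struct {
	Name          string
	Unschedulable bool // true once the node has been cordoned
}

// Pod is a simplified stand-in for a Kubernetes pod (illustration only).
type Pod struct {
	Name     string
	NodeName string
	Labels   map[string]string
}

// Assumed label used to recognise affinity assistant placeholder pods.
const (
	componentLabel             = "app.kubernetes.io/component"
	affinityAssistantComponent = "affinity-assistant"
)

// isAffinityAssistant reports whether the pod carries the assumed label.
func isAffinityAssistant(p Pod) bool {
	return p.Labels[componentLabel] == affinityAssistantComponent
}

// podsToDelete (hypothetical helper) returns the names of the affinity
// assistant pods running on the given node, but only if that node is
// unschedulable. TaskRun pods are never selected.
func podsToDelete(node Node, pods []Pod) []string {
	if !node.Unschedulable {
		return nil
	}
	var names []string
	for _, p := range pods {
		if p.NodeName == node.Name && isAffinityAssistant(p) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	cordoned := Node{Name: "node-a", Unschedulable: true}
	pods := []Pod{
		{
			Name:     "affinity-assistant-abc-0",
			NodeName: "node-a",
			Labels:   map[string]string{componentLabel: affinityAssistantComponent},
		},
		{Name: "my-taskrun-pod", NodeName: "node-a", Labels: map[string]string{}},
		{
			Name:     "affinity-assistant-xyz-0",
			NodeName: "node-b",
			Labels:   map[string]string{componentLabel: affinityAssistantComponent},
		},
	}
	// Only the placeholder pod on the cordoned node is selected.
	fmt.Println(podsToDelete(cordoned, pods))
}
```

In a real controller the deletion itself would go through the Kubernetes API, and the StatefulSet would then recreate the placeholder pod; the sketch only covers deciding which pods to delete.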
Prerequisite for #6543.
This commit only handles situations where nodes are unschedulable. It doesn't handle situations where nodes run out of resources or reach their cap on the number of pods (e.g. #4699).
Tested locally, it appears to work.
Closes #6586.
/kind bug