RFC: Handle affinity assistant deadlock for node maintenance #6584

lbernick · 2023-04-26T16:16:50Z

Prior to this commit, cordoning a node for maintenance can deadlock the affinity assistant. If the placeholder pod is scheduled to a node that is then marked unschedulable, new TaskRun pods can neither be scheduled nor trigger a cluster autoscaler scale-up, because they have inter-pod affinity for the placeholder pod on the unschedulable node.

This commit adds a new controller that watches nodes. When a node becomes unschedulable, the controller deletes any affinity assistant pods running on it (but leaves any TaskRun pods). The affinity assistant StatefulSet then recreates the placeholder pod, which is scheduled to an available node (or triggers scale-up if there's a volume node affinity conflict). Existing TaskRuns cannot be scheduled until the placeholder pod is re-scheduled.
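The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Node` and `Pod` are simplified, hypothetical stand-ins for the Kubernetes API types, `podsToDelete` is a made-up helper name, and the label key and value used to recognize placeholder pods are assumptions.

```go
package main

import "fmt"

// Node and Pod are hypothetical, trimmed-down stand-ins for the Kubernetes
// Node and Pod objects; only the fields this sketch needs are modeled.
type Node struct {
	Name          string
	Unschedulable bool
}

type Pod struct {
	Name     string
	NodeName string
	Labels   map[string]string
}

// Assumed label used to recognize affinity assistant placeholder pods.
const (
	componentLabel    = "app.kubernetes.io/component"
	affinityAssistant = "affinity-assistant"
)

// podsToDelete returns the affinity assistant pods running on the given node
// if that node is unschedulable. TaskRun pods and pods on other nodes are
// never selected, so they are left untouched.
func podsToDelete(node Node, pods []Pod) []Pod {
	if !node.Unschedulable {
		return nil
	}
	var out []Pod
	for _, p := range pods {
		if p.NodeName == node.Name && p.Labels[componentLabel] == affinityAssistant {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cordoned := Node{Name: "node-a", Unschedulable: true}
	pods := []Pod{
		{Name: "placeholder-0", NodeName: "node-a", Labels: map[string]string{componentLabel: affinityAssistant}},
		{Name: "taskrun-pod", NodeName: "node-a", Labels: map[string]string{}},
		{Name: "placeholder-1", NodeName: "node-b", Labels: map[string]string{componentLabel: affinityAssistant}},
	}
	for _, p := range podsToDelete(cordoned, pods) {
		fmt.Println("would delete:", p.Name)
	}
}
```

In a real controller the deletion would go through the Kubernetes API and the StatefulSet would recreate the pod; the sketch only shows which pods are selected.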

Prerequisite for #6543.

This commit only handles situations where nodes are unschedulable. It doesn't handle situations where nodes run out of resources or reach their cap on the number of pods (e.g. #4699).

Tested locally, it appears to work.
Closes #6586.

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
Has Tests included if any functionality added or changed
Follows the commit message standard
Meets the Tekton contributor standards (including functionality, content, code)
Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE


tekton-robot · 2023-04-26T16:16:53Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

tekton-robot · 2023-04-26T16:16:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from lbernick after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lbernick · 2023-04-26T16:17:43Z

@skaegi @pritidesai hoping this could address your concerns? This feels a bit hacky but would like to hear your thoughts.

@jlpettersson you might have thoughts as well?

jlpettersson · 2023-04-26T16:23:30Z

This sounds good to me. It is an improvement from the current situation. 👍

pkg/reconciler/node/node.go

lbernick · 2023-05-09T13:07:22Z

Closing in favor of @pritidesai's alternate approach #6596, thanks Priti!

tekton-robot added the release-note-none (denotes a PR that doesn't merit a release note), do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), and kind/bug (categorizes issue or PR as related to a bug) labels Apr 26, 2023

tekton-robot added the size/M (denotes a PR that changes 30-99 lines, ignoring generated files) label Apr 26, 2023

tekton-robot requested review from pritidesai and vdemeester April 26, 2023 16:17

lbernick assigned pritidesai and skaegi Apr 26, 2023

pritidesai reviewed Apr 26, 2023

pkg/reconciler/node/node.go (resolved)

afrittoli self-assigned this Apr 26, 2023

lbernick mentioned this pull request Apr 26, 2023

TEP-0135: Per-PipelineRun (instead of per-workspace) affinity assistant #6543

Closed

dibyom self-assigned this Apr 27, 2023

lbernick mentioned this pull request May 8, 2023

update affinity assistant creation implementation #6596

Merged

7 tasks

lbernick closed this May 9, 2023
