Affinity assistant deadlock during node maintenance #6586

Closed
lbernick opened this issue Apr 26, 2023 · 1 comment · Fixed by #6596
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lbernick
Member

lbernick commented Apr 26, 2023

Expected Behavior

If a node is cordoned (marked as unschedulable) for maintenance, any PipelineRuns with TaskRuns running on that node should run to completion.

(Out of scope: nodes going down or pods being evicted.)

Actual Behavior

This situation can result in deadlock when the affinity assistant is enabled. Subsequent TaskRun pods have affinity for the placeholder pod, which is on an unschedulable node. These pods cannot be scheduled and do not trigger cluster autoscaler scale-up, so they remain pending until the TaskRuns time out. (Reported by @skaegi and @pritidesai.)

Related: #4699
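
For context, the deadlock comes from the required pod affinity that the affinity assistant injects into each TaskRun pod, pointing at its placeholder pod. The injected term looks roughly like the sketch below (written from memory, so the exact label keys and values may differ); because the placeholder sits on the cordoned node, no schedulable node can satisfy it:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/component: affinity-assistant
          app.kubernetes.io/instance: affinity-assistant-6d8794b076
      topologyKey: kubernetes.io/hostname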

With the affinity assistant disabled, this is not a problem: you can cordon a node and wait for the existing TaskRuns to finish before deleting any pods, and the cluster autoscaler will then trigger a scale-up, creating a new node that matches the node affinity terms of the original PV. Subsequent TaskRun pods are scheduled on the new node and the PipelineRun completes successfully.
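
Roughly, the maintenance flow that works in that case looks like this (node name is illustrative):

$ kubectl cordon <node-name>
$ kubectl get pods --field-selector spec.nodeName=<node-name>   # wait until the TaskRun pods on this node have completed
# subsequent TaskRun pods are then scheduled onto another (possibly newly scaled-up) node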

Steps to Reproduce the Problem

  1. Enable the affinity assistant (a sketch of the relevant feature flag follows):
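If I have the flag name right, the affinity assistant is governed by the disable-affinity-assistant entry in the feature-flags ConfigMap and is enabled when that value is "false"; something like the following should turn it on, assuming the default tekton-pipelines namespace:
$ kubectl patch configmap feature-flags -n tekton-pipelines \
    -p '{"data":{"disable-affinity-assistant":"false"}}'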
  2. Create the following PipelineRun (sequential tasks sharing the same PVC workspace):
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: good-morning-run-
spec:
  workspaces:
  - name: source
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Mi
  pipelineSpec:
    workspaces:
    - name: source
    tasks:
    - name: first
      taskSpec:
        workspaces:
        - name: source
        steps:
        - image: busybox
          script: |
            echo $(workspaces.source.path)
            sleep 60
      workspaces:
      - name: source
    - name: last
      taskSpec:
        workspaces:
        - name: source
        steps:
        - image: busybox
          script: |
            echo $(workspaces.source.path)
            sleep 60
      runAfter: ["first"]
      workspaces:
      - name: source
  3. Determine what node the affinity assistant pod is running on:
$ kubectl get pods -l app.kubernetes.io/component=affinity-assistant -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
NAME                              NODE
affinity-assistant-6d8794b076-0   gke-test-cluster-default-pool-2b351b27-vrl5
  4. Cordon the node:
$ kubectl cordon gke-test-cluster-default-pool-2b351b27-vrl5
node/gke-test-cluster-default-pool-2b351b27-vrl5 cordoned
  5. When the second TaskRun is created, its pod is stuck in Pending status:
$ kubectl get po
NAME                               READY   STATUS      RESTARTS   AGE
affinity-assistant-6d8794b076-0    1/1     Running     0          117s
good-morning-run-kcsr4-first-pod   0/1     Completed   0          117s
good-morning-run-kcsr4-last-pod    0/1     Pending     0          26s

$ kubectl get events -n default --field-selector involvedObject.name=good-morning-run-kcsr4-last-pod
LAST SEEN   TYPE      REASON              OBJECT                                MESSAGE
60s         Warning   FailedScheduling    pod/good-morning-run-kcsr4-last-pod   0/4 nodes are available: 1 node(s) didn't match pod affinity rules, 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
60s         Normal    NotTriggerScaleUp   pod/good-morning-run-kcsr4-last-pod   pod didn't trigger scale-up: 2 node(s) had volume node affinity conflict, 1 node(s) didn't match pod affinity rules
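
(For completeness: in this reproduction the pending pod can schedule again once the node is uncordoned, assuming the node is still present:)

$ kubectl uncordon gke-test-cluster-default-pool-2b351b27-vrl5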

Additional Info

  • Kubernetes version:
Client Version: v1.25.4
Kustomize Version: v4.5.7
Server Version: v1.24.10-gke.2300
  • Tekton Pipeline version: main

@lbernick lbernick added the kind/bug Categorizes issue or PR as related to a bug. label Apr 26, 2023
@pritidesai
Member

Thanks a bunch @lbernick for creating this issue, appreciate it 🙏
