
TaskRun that uses multiple PVC-backed workspaces will fail to schedule when using Affinity Assistant #3545

Closed
bioball opened this issue Nov 19, 2020 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@bioball
Contributor

bioball commented Nov 19, 2020

Expected Behavior

Pipelines that use multiple workspaces in a TaskRun should run when affinity assistants are enabled

Actual Behavior

The underlying pods fail to schedule due to multi-attach errors

Steps to Reproduce the Problem

Here's a sample pipeline that reproduces this issue:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: multi-workspace
spec:
  tasks:
  - name: my-task
    taskRef:
      name: my-task
    workspaces:
    - name: workspacea
      workspace: workspacea
    - name: workspaceb
      workspace: workspaceb
  workspaces:
  - name: workspacea
  - name: workspaceb
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: multi-workspace-run
spec:
  pipelineRef:
    name: multi-workspace
  workspaces:
  - name: workspacea
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 100M
        accessModes:
        - ReadWriteOnce
  - name: workspaceb
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 100M
        accessModes:
        - ReadWriteOnce
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: my-task
spec:
  steps:
  - image: busybox
    name: hello
    script: echo "Hello"
  workspaces:
  - name: workspacea
  - name: workspaceb

If applied to a multi-node cluster, I see this error within the pod's events:

Multi-Attach error for volume "us-west-3a-7d69ae3197354075a037" Volume is already used by pod(s) affinity-assistant-workspacea-multi-workspace-run-0

It seems like the root cause is that the affinity assistants were scheduled onto different nodes. These are the pods that I'm seeing:

NAME                                                  STATUS      NODE
affinity-assistant-workspacea-multi-workspace-run-0   Running     knode0750.********
affinity-assistant-workspaceb-multi-workspace-run-0   Running     knode0688.********

Additional Info

Kubernetes version:

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T22:35:52Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9", GitCommit:"94f372e501c973a7fa9eb40ec9ebd2fe7ca69848", GitTreeState:"archive", BuildDate:"2020-10-02T05:55:28Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"linux/amd64"}

Tekton Pipeline version:

v0.13.0-apple.1
@bioball bioball added the kind/bug Categorizes issue or PR as related to a bug. label Nov 19, 2020
@bioball bioball changed the title Tekton fails to schedule a TaskRun that uses multiple workspaces when using Affinity Assistant TaskRun that uses multiple workspaces will deadlock when using Affinity Assistant Nov 19, 2020
@bioball bioball changed the title TaskRun that uses multiple workspaces will deadlock when using Affinity Assistant TaskRun that uses multiple workspaces will fail to schedule when using Affinity Assistant Nov 19, 2020
@bioball bioball changed the title TaskRun that uses multiple workspaces will fail to schedule when using Affinity Assistant TaskRun that uses multiple PVC-backed workspaces will fail to schedule when using Affinity Assistant Nov 19, 2020
@vdemeester
Member

This is by design, although I can't find anything explicit in the documentation saying that it cannot work with two workspaces backed by different PVCs 🤔
cc @jlpettersson

@skaegi
Contributor

skaegi commented Dec 16, 2020

This seems strange. I thought we would only generate one affinity assistant per PipelineRun, to effectively co-locate all pods that use the workspace(s) on one node. Perhaps the relationship is one affinity assistant per PVC? @jlpettersson, can you help out here?

@jlpettersson
Member

jlpettersson commented Dec 16, 2020

The current implementation creates an affinity assistant per PVC in the PipelineRun.

But an alternative is to create one affinity assistant per PipelineRun. The two approaches have different pros and cons; I elaborated on this in the new design doc Task parallelism when using workspace.

The placeholder pod also mounts the PVC and "repels" other placeholder pods; as a result, two PVCs cannot be mounted by the same TaskRun pod. If this were changed to one affinity assistant per PipelineRun it would be slightly different, but then two different PipelineRuns trying to mount the same PVC could not execute at the same time, which I think is the main reason to use more than one PVC?
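To make the "repelling" concrete, the constraint involved looks roughly like the following sketch. This is illustrative only; the label keys and whether the rule is preferred or required are assumptions, not necessarily what Tekton actually sets:

```yaml
# Sketch: each placeholder (affinity assistant) pod mounts its
# workspace's PVC and carries anti-affinity against the other
# placeholder pods, so each tends to land on a different node.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app.kubernetes.io/component: affinity-assistant  # illustrative label
        topologyKey: kubernetes.io/hostname
```

The TaskRun pod is then attracted to each assistant whose workspace it uses, which becomes unsatisfiable once those assistants sit on different nodes.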

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 15, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jimmykarily

The current implementation creates an affinity assistant per PVC in the PipelineRun.

We are using v0.23.0 and what we are seeing is that we get one affinity assistant per "use" of a PVC.

See here and here (it's the same PVC mounted with 2 different subPaths). If it helps I can send the final rendered PipelineRun so you don't need to figure it out from the Golang code.

Is this expected? I see that at some point it didn't pass validation, but that was fixed. I think the correct behavior is to also check when creating the affinity assistant and only create one instance per PVC (not per "mount"). In our case we had to disable the affinity assistant: although we only had one PVC, the 2 assistants sometimes landed on different nodes and the pipeline failed with an error. See this commit for more: epinio/epinio@ef6d457
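For concreteness, a minimal sketch of the binding shape being described (workspace and claim names here are made up, not taken from our actual PipelineRun):

```yaml
# Hypothetical PipelineRun excerpt: one pre-existing PVC, referenced by
# two workspaces with different subPaths. Each binding currently gets
# its own affinity assistant, which may be pinned to different nodes.
workspaces:
- name: source
  persistentVolumeClaim:
    claimName: shared-pvc
  subPath: source
- name: cache
  persistentVolumeClaim:
    claimName: shared-pvc
  subPath: cache
```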

@jlpettersson
Member

@jimmykarily I think you are right.

The affinity assistant only looks at the combination of Workspace Name and PipelineRun Name, as per https://github.com/tektoncd/pipeline/blob/main/pkg/reconciler/pipelinerun/affinity_assistant.go#L56

So if you have two Workspaces that use the same PVC but set different subPaths in the WorkspaceBinding, it will not work, as you correctly point out. On the other hand, this would not allow two concurrent runs of the same pipeline, since they use the same PVC (possibly "pinned" to different nodes). But if the AA could tolerate different subPaths on the same PVC, as you suggest, it would allow parallel Task execution within the same PipelineRun.

You could open a feature request about this and it could be better documented. Thanks for the comment.

@jimmykarily

@jlpettersson you are right about the concurrent runs of the pipeline and we indeed avoid this situation in our code. In our case this makes sense because we use Tekton for application staging and we don't allow 2 staging processes to run for the same app at the same time.

That said, this limitation comes from the fact that we are using a pre-existing PVC, and having 2 affinity assistants doesn't help with that in any way. Actually, the 2 assistants prevent the pipeline from running when they land on different nodes.
In my opinion, one of two things should happen:

  • Create only one AA per PVC (even when it is mounted multiple times). This doesn't introduce any additional limitations; the limitations of using an existing PVC still apply (no concurrent runs of the pipeline). It avoids our situation, in which the pipeline doesn't even run, and having an affinity assistant still allows parallel tasks within the same pipeline.

or

  • Document that the AA should be disabled when a pre-existing PVC is mounted multiple times in the same pipeline, because it may prevent the pipeline from running. Given that the AA setting is global, one has to forgo the affinity assistant functionality for every pipeline they have, even when only one pipeline can't live with it. That's why I think the previous solution may be better.

I could open a PR that updates the documentation (option #2), but I'm not sure I can get the other proposed solution done quickly, so I would rather open a new issue for that.
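For reference, the global toggle mentioned above lives in Tekton's feature-flags ConfigMap; disabling the affinity assistant cluster-wide looks roughly like this (a sketch; the namespace depends on how Tekton was installed):

```yaml
# Disable the affinity assistant for all PipelineRuns in the cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"
```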
