[TEP-0135] Coschedule per (Isolated) PipelineRun e2e support #6927

QuanZhang-William · 2023-07-13T20:50:09Z

Changes

Part of #6740. TEP-0135 introduces a feature that allows a cluster operator to ensure that all of a PipelineRun's pods are scheduled to the same node.

This commit uses the functions added in #6819 to implement end-to-end support for the `coschedule: pipelineruns` coscheduling mode, in which all of a PipelineRun's pods are scheduled to the same node.

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
Has Tests included if any functionality added or changed
Follows the commit message standard
Meets the Tekton contributor standards (including functionality, content, code)
Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

[TEP-0135]: Support `coschedule: pipelineruns` and `coschedule: isolate-pipelinerun` coschedule modes.
Users can now opt in to this feature to schedule all of a PipelineRun's pods on the same node and, optionally, to allow only one PipelineRun to run on a node at a time.

tekton-robot · 2023-07-13T20:50:10Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

tekton-robot · 2023-07-13T20:55:59Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	97.2%	97.4%	0.2
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	91.4%	-0.4
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-13T20:56:55Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	97.2%	97.4%	0.2
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	91.4%	-0.4
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-13T21:52:29Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	97.2%	97.4%	0.2
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	91.4%	-0.4
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-14T14:54:07Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	97.2%	97.4%	0.2
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.3%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-14T14:55:57Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	97.2%	97.4%	0.2
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.3%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

QuanZhang-William · 2023-07-14T18:49:15Z

It could make more sense to merge #6929 first, so that this PR can have e2e support for both the coschedule: pipelineruns and coschedule: isolate-pipelinerun modes.

/cc @lbernick

lbernick · 2023-07-14T21:26:10Z

pkg/reconciler/pipelinerun/affinity_assistant.go

 			}
 		}
+	case aa.AffinityAssistantPerPipelineRun, aa.AffinityAssistantPerPipelineRunWithIsolation:
+		affinityAssistantName := GetAffinityAssistantName("", pr.Name)
+		if err := c.KubeClientSet.AppsV1().StatefulSets(pr.Namespace).Delete(ctx, affinityAssistantName, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {


I think we also need to delete the PVCs created by the statefulsets, and update integration tests to ensure those PVCs are actually deleted.

Thanks Lee! I have created a separate PR: #6940 to deal with the cleanup logic (as well as the PVC deletion behaviors) as we discussed earlier.

I think it makes more sense to merge #6940 first, and then I can add the integration test in this PR.

/hold
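A minimal sketch of the cleanup idea discussed in this thread: delete the StatefulSet, then its PVCs, and treat "not found" as success so the cleanup can safely run more than once. This is not the code from #6940. The `store` type, `errNotFound`, `cleanup`, and the resource names are hypothetical stand-ins for the Kubernetes clients and `apierrors.IsNotFound`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// store is a hypothetical in-memory stand-in for a namespaced resource client.
type store map[string]bool

// Delete removes name, or returns errNotFound when it is already gone.
func (s store) Delete(name string) error {
	if !s[name] {
		return errNotFound
	}
	delete(s, name)
	return nil
}

// cleanup deletes the StatefulSet and then each PVC, ignoring not-found
// errors so that running it twice is harmless.
func cleanup(statefulSets, pvcs store, ssName string, pvcNames []string) error {
	if err := statefulSets.Delete(ssName); err != nil && !errors.Is(err, errNotFound) {
		return fmt.Errorf("deleting statefulset %q: %w", ssName, err)
	}
	for _, name := range pvcNames {
		if err := pvcs.Delete(name); err != nil && !errors.Is(err, errNotFound) {
			return fmt.Errorf("deleting pvc %q: %w", name, err)
		}
	}
	return nil
}

func main() {
	statefulSets := store{"affinity-assistant-abc": true}
	pvcs := store{"pvc-abc-0": true}
	// The second call finds nothing left to delete and still succeeds.
	fmt.Println(cleanup(statefulSets, pvcs, "affinity-assistant-abc", []string{"pvc-abc-0"}))
	fmt.Println(cleanup(statefulSets, pvcs, "affinity-assistant-abc", []string{"pvc-abc-0"}))
	fmt.Println(len(statefulSets), len(pvcs))
}
```

An integration test along the lines requested above would then assert that both stores are empty after the run completes.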

lbernick · 2023-07-14T21:29:06Z

pkg/reconciler/pipelinerun/affinity_assistant_test.go

@@ -159,12 +161,16 @@ func TestCreateAndDeleteOfAffinityAssistantPerPipelineRun(t *testing.T) {
 	}, {
 		name:                  "other Workspace type",
 		pr:                    testPRWithEmptyDir,
-		expectStatefulSetSpec: nil,
+		expectStatefulSetSpec: &appsv1.StatefulSetSpec{},


nit: It doesn't really make sense to differentiate between nil and a pointer to an empty struct; I'd have tests treat them as equivalent rather than asserting a specific behavior.

Prior to this commit, we didn't create an AffinityAssistant for TaskRuns without a PVC-based workspace in coschedule per-PipelineRun mode, so the StatefulSet was NOT expected to be created (nil).

In this commit, we create an AffinityAssistant for all TaskRuns, so we do expect an AffinityAssistant to be created (but there is no PVC or volumeClaimTemplate in the StatefulSetSpec).

I'm a bit confused, why would we want to validate that a statefulset is created with an empty spec? how is that different from a nil pointer?

Sorry for the confusion. The confusing point is that we had a StatefulSetSpec filter here:

pipeline/pkg/reconciler/pipelinerun/affinity_assistant_test.go

Line 1218 in 53e71bf

if d := cmp.Diff(expectStatefulSetSpec, &aa.Spec, podSpecFilter, podTemplateSpecFilter); d != "" {

This filter previously ignored Replicas and Selector.

The spec in the test case is therefore not actually empty: it has Replicas and Selector set (we left them out before because the filter meant they would not be validated anyway). I think the best way to address this confusion is to remove this filter and validate the selector and replicas (and we get more validated coverage 😄).
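For the reviewer's nit about not distinguishing nil from a pointer to an empty struct, here is a minimal sketch of how a test helper could treat the two as equivalent. The `spec` type is a hypothetical stand-in for `appsv1.StatefulSetSpec`, and `normalize` and `equivalent` are illustrative names, not helpers from this PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// spec is a hypothetical stand-in for appsv1.StatefulSetSpec.
type spec struct {
	ServiceName string
	Replicas    *int32
}

// normalize maps a nil pointer to the zero value so both forms compare equal.
func normalize(s *spec) spec {
	if s == nil {
		return spec{}
	}
	return *s
}

// equivalent treats a nil pointer and a pointer to an empty struct as equal.
func equivalent(a, b *spec) bool {
	return reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	var missing *spec
	fmt.Println(equivalent(missing, &spec{}))
	fmt.Println(equivalent(missing, &spec{ServiceName: "affinity-assistant"}))
}
```

With a helper like this, a table-driven test can leave the expected spec unset for cases where nothing meaningful is created, without asserting which of the two representations the code happens to return.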

lbernick · 2023-07-14T21:42:08Z

test/affinity_assistant_test.go

+	}
+}
+
+func resetFeatureFlagAndCleanup(ctx context.Context, t *testing.T, c *clients, namespace string) {


This is interesting-- how do you prevent the integration test from interfering with the other integration tests? Do they just run sequentially?

Yeah, I'm following the discussion here: #6079 to set/revert the feature flags and run tests sequentially.
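A rough sketch of the set-and-revert pattern described in this reply, assuming the feature flags can be modeled as a plain string map. The `flags` type and `setFlags` function are hypothetical; the real test works against the cluster's feature-flag configuration. In a Go test, the returned function would typically be registered with `t.Cleanup`, and the test would not call `t.Parallel`, so it cannot overlap with tests that rely on the default flags.

```go
package main

import "fmt"

// flags is a hypothetical in-memory stand-in for the feature-flag data.
type flags map[string]string

// previous records a flag's earlier value and whether it was set at all.
type previous struct {
	value string
	set   bool
}

// setFlags applies overrides and returns a function that restores the
// earlier state, deleting keys that did not exist before.
func setFlags(f flags, overrides map[string]string) (restore func()) {
	saved := make(map[string]previous, len(overrides))
	for key, value := range overrides {
		old, ok := f[key]
		saved[key] = previous{value: old, set: ok}
		f[key] = value
	}
	return func() {
		for key, p := range saved {
			if p.set {
				f[key] = p.value
			} else {
				delete(f, key)
			}
		}
	}
}

func main() {
	f := flags{"other-flag": "x"}
	restore := setFlags(f, map[string]string{"coschedule": "pipelineruns"})
	fmt.Println(f["coschedule"])
	restore()
	_, present := f["coschedule"]
	fmt.Println(present)
}
```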

lbernick · 2023-07-14T21:44:38Z

Thanks @QuanZhang-William! One more note; I think the existing release note might be hard for users to understand, and they might not know why it might be a good idea to enable this new feature. Can you put the release notes in terms of the functionality the user is interested in, i.e. scheduling pods to the right nodes?

Also, are all the docs for this feature up to date? I wouldn't want to have a release note saying the feature is ready but have our docs say it's not functional yet.

QuanZhang-William · 2023-07-17T15:19:59Z

/hold until #6929 is merged so that this PR can have e2e support for both coschedule pipelineruns and coschedule isolate-pipelinerun modes

tekton-robot · 2023-07-21T18:56:57Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	94.4%	96.8%	2.4
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.4%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

QuanZhang-William · 2023-07-21T19:57:06Z

> Thanks @QuanZhang-William! One more note; I think the existing release note might be hard for users to understand, and they might not know why it might be a good idea to enable this new feature. Can you put the release notes in terms of the functionality the user is interested in, i.e. scheduling pods to the right nodes?

No problem, Lee. I have updated the release note.

> Also, are all the docs for this feature up to date? I wouldn't want to have a release note saying the feature is ready but have our docs say it's not functional yet.

We have this open PR #6892 for the documentation. I will remove all the WIP and Warnings in the documentation PR after merging this one.

lbernick · 2023-07-21T20:17:03Z

pkg/reconciler/pipelinerun/affinity_assistant_test.go

+				t.Errorf("expected err type mismatching, expecting %v but got: %v", ErrAffinityAssistantCreationFailed, err)
+			}
+		}
+		if d := cmp.Diff(tc.expectedErr.Error(), err.Error()); d != "" {


you can do this with cmpopts.EquateErrors I think? This will check that the error is the right type which is more important than validating the exact error message
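To illustrate the suggestion above: `cmpopts.EquateErrors` treats two errors as equal when `errors.Is` reports a match, so the comparison survives added context in the message. The sketch below uses only the standard library to show that underlying check. `ErrCreationFailed` is a hypothetical stand-in for `ErrAffinityAssistantCreationFailed`, and `ErrOther`, `createAssistant`, and `sameKind` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCreationFailed is a hypothetical sentinel standing in for
// ErrAffinityAssistantCreationFailed from the PR.
var ErrCreationFailed = errors.New("affinity assistant creation failed")

// ErrOther is a second hypothetical sentinel used only for contrast.
var ErrOther = errors.New("some other failure")

// createAssistant is a hypothetical function that wraps the sentinel
// with extra context, as reconciler code often does.
func createAssistant(name string) error {
	return fmt.Errorf("%w: statefulset %q rejected", ErrCreationFailed, name)
}

// sameKind reports whether got is, or wraps, want. This is the check
// that cmpopts.EquateErrors relies on.
func sameKind(got, want error) bool {
	return errors.Is(got, want)
}

func main() {
	err := createAssistant("aa-1")
	// Matches on identity through the wrap chain.
	fmt.Println(sameKind(err, ErrCreationFailed))
	// Comparing message strings breaks as soon as context is added.
	fmt.Println(err.Error() == ErrCreationFailed.Error())
}
```

In a test that uses go-cmp, the equivalent assertion would be `cmp.Diff(wantErr, gotErr, cmpopts.EquateErrors())`.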

tekton-robot · 2023-07-21T20:20:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [lbernick]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot · 2023-07-21T22:18:20Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	94.4%	96.8%	2.4
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.4%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-21T22:20:31Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	94.4%	96.8%	2.4
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.4%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

Yongxuanzhang · 2023-07-24T15:53:47Z

/assign

QuanZhang-William · 2023-07-24T15:56:52Z

/hold until v0.50 LTS is released

QuanZhang-William · 2023-07-24T16:00:23Z

/assign @JeromeJu

tekton-robot · 2023-07-24T16:07:49Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	94.4%	96.8%	2.4
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.4%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-24T16:08:12Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	94.4%	96.8%	2.4
pkg/reconciler/pipelinerun/pipelinerun.go	91.8%	92.4%	0.5
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

Part of [tektoncd#6740]. [TEP-0135][tep-0135] introduces a feature that allows a cluster operator to ensure that all of a PipelineRun's pods are scheduled to the same node.

This commit consumes the functions added in [tektoncd#6819] to implement end to end support of `Coschedule:PipelineRuns` where all the `PipelineRun pods` are scheduled to the same node, and the `Coschedule:isolate-pipelinerun` coschedule modes where only 1 PipelineRun is allowed to run in a node at the same time.

/kind feature

[tektoncd#6819]: tektoncd#6819
[tektoncd#6740]: tektoncd#6740
[tep-0135]: https://github.com/tektoncd/community/blob/main/teps/0135-coscheduling-pipelinerun-pods.md

QuanZhang-William · 2023-07-26T15:21:18Z

/hold cancel
v0.50 LTS is released

tekton-robot · 2023-07-26T15:26:27Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	96.0%	96.9%	0.9
pkg/reconciler/pipelinerun/pipelinerun.go	91.7%	92.4%	0.7
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

tekton-robot · 2023-07-26T15:28:15Z

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/pipelinerun/affinity_assistant.go	96.0%	96.9%	0.9
pkg/reconciler/pipelinerun/pipelinerun.go	91.7%	92.4%	0.7
pkg/reconciler/taskrun/taskrun.go	84.8%	85.2%	0.4

JeromeJu · 2023-07-26T20:40:43Z

/lgtm

tekton-robot requested review from pritidesai and wlynch July 13, 2023 20:50

QuanZhang-William changed the title ~~[TEP-0135] Coschedule per PipelineRun E2E support~~ [TEP-0135] Coschedule per PipelineRun e2e support Jul 13, 2023

QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from a4315b0 to 200b7b0 Compare July 13, 2023 21:44

lbernick self-assigned this Jul 14, 2023

QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from 200b7b0 to 255f1e5 Compare July 14, 2023 14:47

QuanZhang-William marked this pull request as ready for review July 14, 2023 14:47

tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2023

tekton-robot requested review from dibyom and imjasonh July 14, 2023 14:47

tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2023

tekton-robot requested a review from lbernick July 14, 2023 18:49

lbernick reviewed Jul 14, 2023

QuanZhang-William mentioned this pull request Jul 17, 2023

[TEP-0135] coschedule isolate pipelinerun #6929

Merged

7 tasks

tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2023

QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from 255f1e5 to 1318420 Compare July 18, 2023 18:35

tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2023

lbernick approved these changes Jul 21, 2023

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2023

QuanZhang-William mentioned this pull request Jul 21, 2023

[TEP-0135] Update Affinity Assistant documentation #6892

Merged

7 tasks

QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from e0d1168 to 53e71bf Compare July 21, 2023 22:12

QuanZhang-William changed the title ~~[TEP-0135] Coschedule per PipelineRun e2e support~~ [TEP-0135] Coschedule per (Isolated) PipelineRun e2e support Jul 24, 2023

tekton-robot assigned Yongxuanzhang Jul 24, 2023

tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2023

QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from 53e71bf to c8a8c7f Compare July 24, 2023 15:59

tekton-robot assigned JeromeJu Jul 24, 2023

QuanZhang-William mentioned this pull request Jul 24, 2023

[TEP-0135] Refactor CreatePVCsForWorkspaces #6921

Merged

7 tasks

QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from c8a8c7f to 224cba3 Compare July 26, 2023 15:20

tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 26, 2023

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 26, 2023

tekton-robot merged commit 5703d8f into tektoncd:main Jul 26, 2023

QuanZhang-William mentioned this pull request Jul 25, 2023

TEP-0135: Coscheduling PipelineRun pods Implementation #6740

Closed
