Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0135] Coschedule per (Isolated) PipelineRun e2e support #6927

Merged

Conversation

QuanZhang-William
Copy link
Member

@QuanZhang-William QuanZhang-William commented Jul 13, 2023

Changes

Part of #6740. TEP-0135 introduces a feature that allows a cluster operator to ensure that all of a PipelineRun's pods are scheduled to the same node.

This commit consumes the functions added in #6819 and implements end to end support of coschedule:pipelineruns coscheduling mode, where all the PipelineRun pods are scheduled to the same node.

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

[TEP-0135]: Support `coschedule: pipelineruns` and `coschedule: isolate-pipelinerun` coschedule modes.
Users can now opt in this new feature to schedule all the pods in the same node and to optionally enforce one running pipelinerun in a node at the same time.

@tekton-robot
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@tekton-robot tekton-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 13, 2023
@QuanZhang-William QuanZhang-William changed the title [TEP-0135] Coschedule per PipelineRun E2E support [TEP-0135] Coschedule per PipelineRun e2e support Jul 13, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 97.2% 97.4% 0.2
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 91.4% -0.4
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 97.2% 97.4% 0.2
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 91.4% -0.4
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@QuanZhang-William QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from a4315b0 to 200b7b0 Compare July 13, 2023 21:44
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 97.2% 97.4% 0.2
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 91.4% -0.4
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@lbernick lbernick self-assigned this Jul 14, 2023
@QuanZhang-William QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from 200b7b0 to 255f1e5 Compare July 14, 2023 14:47
@QuanZhang-William QuanZhang-William marked this pull request as ready for review July 14, 2023 14:47
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2023
@tekton-robot tekton-robot requested review from dibyom and imjasonh July 14, 2023 14:47
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 97.2% 97.4% 0.2
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.3% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 97.2% 97.4% 0.2
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.3% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2023
@QuanZhang-William
Copy link
Member Author

It could make more sense to merge #6929 first, so that this PR can have e2e supports for both coschedule: pipelineruns and coschedule: isolate-pipelinerun modes.

/cc @lbernick

@tekton-robot tekton-robot requested a review from lbernick July 14, 2023 18:49
pkg/reconciler/pipelinerun/affinity_assistant.go Outdated Show resolved Hide resolved
}
}
case aa.AffinityAssistantPerPipelineRun, aa.AffinityAssistantPerPipelineRunWithIsolation:
affinityAssistantName := GetAffinityAssistantName("", pr.Name)
if err := c.KubeClientSet.AppsV1().StatefulSets(pr.Namespace).Delete(ctx, affinityAssistantName, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to delete the PVCs created by the statefulsets, and update integration tests to ensure those PVCs are actually deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lee! I have created a separate PR: #6940 to deal with the cleanup logic (as well as the PVC deletion behaviors) as we discussed earlier.

I think it it makes more sense to merged #6940 first, and I can add integration test in this PR

/hold

@@ -159,12 +161,16 @@ func TestCreateAndDeleteOfAffinityAssistantPerPipelineRun(t *testing.T) {
}, {
name: "other Workspace type",
pr: testPRWithEmptyDir,
expectStatefulSetSpec: nil,
expectStatefulSetSpec: &appsv1.StatefulSetSpec{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It doesn't really make sense to differentiate between nil and a pointer to an empty struct; I'd have tests treat them as equivalent rather than asserting a specific behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this commit, we don't create AffinityAssistant for TaskRuns without a pvc based workspace in coschedule per pipelinerun mode, the statefulset is NOT expected to be created (nil).

In this commit, we create AffinityAssistant for all TaskRuns, so we do expect an AffinityAssitant is created (but there is no pvc or volumeclaimtemplate in the StatefulSetSpec)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused, why would we want to validate that a statefulset is created with an empty spec? how is that different from a nil pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. The confusing point is that we had a StatefulSetSpec filter here:

if d := cmp.Diff(expectStatefulSetSpec, &aa.Spec, podSpecFilter, podTemplateSpecFilter); d != "" {
, which ignores the Replica and Selector before.

In the test case, it is actually a spec with Replica and Selector but not an empty spec (we didn't have it before since they will not be validated anyways due to the filter). I think the best way address this confusion is to remote this filter and validate the selector and replicas (and we get more validated coverage 😄 )

pkg/reconciler/pipelinerun/affinity_assistant_test.go Outdated Show resolved Hide resolved
pkg/reconciler/taskrun/taskrun_test.go Outdated Show resolved Hide resolved
pkg/reconciler/taskrun/taskrun_test.go Outdated Show resolved Hide resolved
}
}

func resetFeatureFlagAndCleanup(ctx context.Context, t *testing.T, c *clients, namespace string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting-- how do you prevent the integration test from interfering with the other integration tests? Do they just run sequentially?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm following the discussion here: #6079 to set/revert the feature flags and run tests sequentially.

test/affinity_assistant_test.go Outdated Show resolved Hide resolved
@lbernick
Copy link
Member

Thanks @QuanZhang-William! One more note; I think the existing release note might be hard for users to understand, and they might not know why it might be a good idea to enable this new feature. Can you put the release notes in terms of the functionality the user is interested in, i.e. scheduling pods to the right nodes?

Also, are all the docs for this feature up to date? I wouldn't want to have a release note saying the feature is ready but have our docs say it's not functional yet.

@QuanZhang-William
Copy link
Member Author

/hold until #6929 is merged so that this PR can have e2e support for both coschedule pipelineruns and coschedule isolate-pipelinerun modes

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2023
@QuanZhang-William QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from 255f1e5 to 1318420 Compare July 18, 2023 18:35
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 94.4% 96.8% 2.4
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.4% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@QuanZhang-William
Copy link
Member Author

Thanks @QuanZhang-William! One more note; I think the existing release note might be hard for users to understand, and they might not know why it might be a good idea to enable this new feature. Can you put the release notes in terms of the functionality the user is interested in, i.e. scheduling pods to the right nodes?

No problem, Lee. I have updated the release not

Also, are all the docs for this feature up to date? I wouldn't want to have a release note saying the feature is ready but have our docs say it's not functional yet.

We have this open PR #6892 for the documentation. I will remove all the WIP and Warnings in the documentation PR after merging this one.

t.Errorf("expected err type mismatching, expecting %v but got: %v", ErrAffinityAssistantCreationFailed, err)
}
}
if d := cmp.Diff(tc.expectedErr.Error(), err.Error()); d != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do this with cmpopts.EquateErrors I think? This will check that the error is the right type which is more important than validating the exact error message

@@ -159,12 +161,16 @@ func TestCreateAndDeleteOfAffinityAssistantPerPipelineRun(t *testing.T) {
}, {
name: "other Workspace type",
pr: testPRWithEmptyDir,
expectStatefulSetSpec: nil,
expectStatefulSetSpec: &appsv1.StatefulSetSpec{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused, why would we want to validate that a statefulset is created with an empty spec? how is that different from a nil pointer?

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2023
@QuanZhang-William QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from e0d1168 to 53e71bf Compare July 21, 2023 22:12
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 94.4% 96.8% 2.4
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.4% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 94.4% 96.8% 2.4
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.4% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@QuanZhang-William QuanZhang-William changed the title [TEP-0135] Coschedule per PipelineRun e2e support [TEP-0135] Coschedule per (Isolated) PipelineRun e2e support Jul 24, 2023
@Yongxuanzhang
Copy link
Member

/assign

@QuanZhang-William
Copy link
Member Author

/hold until v0.50 LTS is released

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2023
@QuanZhang-William QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from 53e71bf to c8a8c7f Compare July 24, 2023 15:59
@QuanZhang-William
Copy link
Member Author

/assign @JeromeJu

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 94.4% 96.8% 2.4
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.4% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 94.4% 96.8% 2.4
pkg/reconciler/pipelinerun/pipelinerun.go 91.8% 92.4% 0.5
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

Part of [tektoncd#6740]. [TEP-0135][tep-0135] introduces a feature that allows a cluster operator to ensure
that all of a PipelineRun's pods are scheduled to the same node.

This commit consumes the functions added in [tektoncd#6819] to implement end to end support of `Coschedule:PipelineRuns` where all the `PipelineRun pods` are scheduled to the same node,
and the `Coschedule:isolate-pipelinerun` coschedule modes where only 1 PipelineRun is allowed to run in a node at the same time.

/kind feature

[tektoncd#6819]: tektoncd#6819
[tektoncd#6740]: tektoncd#6740
[tep-0135]: https://github.com/tektoncd/community/blob/main/teps/0135-coscheduling-pipelinerun-pods.md
@QuanZhang-William QuanZhang-William force-pushed the tep-0135-e2e-pipelinerun branch from c8a8c7f to 224cba3 Compare July 26, 2023 15:20
@QuanZhang-William
Copy link
Member Author

/hold cancel
v0.50 LTS is released

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 26, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 96.0% 96.9% 0.9
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 92.4% 0.7
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 96.0% 96.9% 0.9
pkg/reconciler/pipelinerun/pipelinerun.go 91.7% 92.4% 0.7
pkg/reconciler/taskrun/taskrun.go 84.8% 85.2% 0.4

@JeromeJu
Copy link
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 26, 2023
@tekton-robot tekton-robot merged commit 5703d8f into tektoncd:main Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants