Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync the pipelinerun status from the informers #2573

Merged
merged 1 commit into from
May 21, 2020

Conversation

afrittoli
Copy link
Member

Changes

When we reconcile a pipelinerun, we should ensure that the
pipelinerun status is always in sync with the actual list of taskruns
that can be provided by the taskrun informer.

The only way to filter taskruns is by labels tekton.dev/pipelinerun.
In case an orphaned taskrun is found, we can use the other labels
on the taskrun to reconstruct the missing entry in the pipelinerun
status.

Fixes #2558

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2020
@tekton-robot tekton-robot requested review from bobcatfish and dibyom May 7, 2020 14:48
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 7, 2020
@afrittoli
Copy link
Member Author

@n3wscott @mattmoor any feedback on this would be really welcome :)

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 72.4% 72.3% -0.1

@afrittoli afrittoli added the kind/bug Categorizes issue or PR as related to a bug. label May 8, 2020
@afrittoli afrittoli force-pushed the issues/2558 branch 2 times, most recently from 66628f8 to 55216e6 Compare May 11, 2020 20:55
@afrittoli afrittoli changed the title WIP Sync the pipelinerun status from the informers Sync the pipelinerun status from the informers May 11, 2020
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 11, 2020
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 72.0% 73.5% 1.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 72.0% 73.5% 1.4

@afrittoli
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me
/meow

pkg/reconciler/pipelinerun/pipelinerun_test.go Outdated Show resolved Hide resolved
@tekton-robot
Copy link
Collaborator

@vdemeester: cat image

In response to this:

Looks good to me
/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2020
@afrittoli afrittoli force-pushed the issues/2558 branch 3 times, most recently from 32df491 to fd37ed4 Compare May 12, 2020 16:10
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 73.3% 74.6% 1.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 73.3% 74.6% 1.4

@afrittoli
Copy link
Member Author

>> Waiting for tests to finish for pipelinerun
ERROR: tests timed out

@vdemeester perhaps we are at the time limit with alpha + beta + the new test? Shall I increase the timeout a bit?

@afrittoli
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests

@afrittoli
Copy link
Member Author

I wonder if the new pool could be related to the timeout @bobcatfish?
I couldn't find a way to see if there is any variation in test duration... or even to see for sure how the last test run took :(

@vdemeester
Copy link
Member

/test pull-tekton-pipeline-integration-tests

@afrittoli afrittoli added kind/bug Categorizes issue or PR as related to a bug. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 14, 2020
@tekton-robot
Copy link
Collaborator

This PR cannot be merged: expecting exactly one kind/ label

Available kind/ labels are:

kind/bug: Categorizes issue or PR as related to a bug.
kind/flake: Categorizes issue or PR as related to a flakey test
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/design: Categorizes issue or PR as related to design.
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
kind/misc: Categorizes issue or PR as a miscellaneuous one.

@afrittoli
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests

@afrittoli
Copy link
Member Author

The integration tests are failing way too often on this PR, perhaps it's better to hold this one until I get a better understanding of the reason for that.
/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2020
}
}
// Then loop by pipelinetask name over all the TaskRuns associated to Conditions
for pipelineTaskName, actualConditionTaskRuns := range conditionTaskRuns {
Copy link
Member

@pritidesai pritidesai May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afrittoli pr.Status.TaskRuns holds an entry for the pipelineTask associated to a condition (could be more than one condition as well) with nil TaskRunStatus and one or more checks under ConditionChecks as in pr.Status.TaskRuns[PipelineTaskRunName].ConditionChecks.

Condition containers are always under ConditionChecks and not part of the map pr.Status.TaskRuns.

For example, for a pipelineTask with two conditions, conditionTaskRuns contains two containers, one for each condition and after these condition containers are created, this query returns them as valid taskRuns associated with the current pipeline:
c.taskRunLister.TaskRuns(pr.Namespace).List(labels.SelectorFromSet(pipelineRunLabels))

so in the looping over those tasks, as they are conditionalChecks, line 974 is not executed and assumes the pipelineTask was not found in line 987 and creates a new taskRun name.

Hope its making sense 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for looking into this!

L974 is designed explicitly to only track TaskRun not associated to conditions, but I agree the logic is broken there. It works fine in case of orphaned conditions, but it doesn't in case of non-orphaned ones.

If the conditions are created and still running, there will be TaskRuns for them in the list, they can be identified as condition ones from the labels. The main TaskRun does not exists yet, however its name, under normal conditions, will already be in the PipelineRun status so that it may contain the status of the conditions:

taskrun-name (not yet created)
    PipelineTaskName: pipelineTaskName,
    Status:                    nil,
    ConditionChecks:  [array of condition checks],

Since my loops are based on what I get from the TaskRun list, I do not discover the existing TaskRunName with no real TaskRun under which conditions are hosted. I create a new one, and I end up having two TaskRuns with the same conditions - I think.

I will work on fixing this. I now separated my function in two parts, one which doesn't need the controller and takes the list of taskruns as input, so that it should be easier to write extra tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the issue now by initializing taskRunByPipelineTask from the known PR status, and adding to it as orphaned TaskRuns are recovered.
I added much more test coverage, hopefully it will work ok this time.

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 15, 2020
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 73.5% 75.3% 1.8

@afrittoli
Copy link
Member Author

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 15, 2020
When we reconcile a pipelinerun, we should ensure that the
pipelinerun status is always in sync with the actual list of taskruns
that can be provided by the taskrun informer.

The only way to filter taskruns is by labels tekton.dev/pipelinerun.
In case an orphaned taskrun is found, we can use the other labels
on the taskrun to reconstruct the missing entry in the pipelinerun
status, whether it's a missing taskrun or a missing condition check.
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/pipelinerun.go 73.5% 75.3% 1.8

@afrittoli
Copy link
Member Author

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 15, 2020
@tekton-robot
Copy link
Collaborator

This PR cannot be merged: expecting exactly one kind/ label

Available kind/ labels are:

kind/bug: Categorizes issue or PR as related to a bug.
kind/flake: Categorizes issue or PR as related to a flakey test
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/design: Categorizes issue or PR as related to design.
kind/documentation: Categorizes issue or PR as related to documentation.
kind/feature: Categorizes issue or PR as related to a new feature.
kind/misc: Categorizes issue or PR as a miscellaneuous one.

@afrittoli afrittoli removed the kind/bug Categorizes issue or PR as related to a bug. label May 15, 2020
@afrittoli
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests

@afrittoli
Copy link
Member Author

@pritidesai @bobcatfish this should be ready now - CI is green and test coverage much improved now.

@pritidesai
Copy link
Member

sorry @afrittoli havent got chance to review it, its in my todo list for tomorrow :)

@pritidesai
Copy link
Member

thanks @afrittoli
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2020
@tekton-robot tekton-robot merged commit 1805671 into tektoncd:master May 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TaskRuns may be orphaned by a PipelineRun
4 participants