TEP-0050 - ignore task failures #342

pritidesai · 2021-02-05T19:20:10Z

TEP-0040 is proposing a TEP to ignore step errors. We have identified a need for two separate proposals during the multiple rounds of discussion in TEP-0040:

(1) ignore errors at the step level (TEP-0040)
(2) ignore failures at the task level

Proposing a TEP for (2) to ignore failures at the task level and allow pipeline authors to unblock execution after a single task failure.

bobcatfish

/assign

bobcatfish · 2021-02-05T20:31:56Z

teps/0050-ignore-task-failures.md

+
+* As a pipeline author, I would like to design a pipeline where a task running
+  [unit tests](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1) might fail,
+  but can continue running integration tests, so that my pipeline can identify failures in both the tests.


is it possible to explain a bit more here? one way this could be addressed today is to run the unit tests and integration tests in parallel; that way the unit tests can fail but the integration tests will still complete

@pritidesai i was thinking about this more and i have a couple ideas of how we can tweak the use cases so that running the tests in parallel won't solve the problem (maybe this is what you already had in mind)

A "warning only" use case - e.g. a pipeline wants to run some tests, and in parallel it wants to run something option, e.g. maybe some really intense linting, and the pipeline doesn't need to fail if the linting fails

| | v v linting unit tests | v integration tests

There's another very similar use case, where you have the same pipeline as above, but you DO want linting to fail the pipeline, you just don't want it to interrupt the unit tests -> integration tests branch <-- for that case i wonder if pipelines in pipelines is more appropriate? not sure!

thanks @bobcatfish for looking into it 😻

Yes, pipeline in pipeline can be an appropriate solution to not interrupt the combo of unit tests -> integration tests. I will add both the points (running in parallel and pipeline in pipeline) in the TEP.

But, I was thinking of a use case in which I don't want linting failure or unit test failure to interrupt deploying an application to a staging cluster so that the new features in the app can be shared for early feedback.

| | | v v v linting unit tests integration tests | v deploying to staging cluster

One more use case I was thinking of having one single GitHub repo with multiple runtimes, and building/deploying an image for each runtime, and I don't want failure in one runtime interrupt the other runtime. But we can argue that this use case can fit well with pipeline in pipeline approach.

There's another very similar use case, where you have the same pipeline as above, but you DO want linting to fail the pipeline, you just don't want it to interrupt the unit tests -> integration tests branch <-- for that case i wonder if pipelines in pipelines is more appropriate? not sure!

@bobcatfish @pritidesai could this be handled at the pipeline level too —aka some configuration of the pipeline that says "don't interup other branches" execution ?
I see two different problem we are trying to tackle here:

Do not fail the pipeline if a task fail (and continue the execution, …)

Do fail the pipeline if a task fail but do not interrupt other branches (or even the current one, also this would seem weird to me)

Do we want to tackle those at the same time, in the TEP ? I also tend to agree that something might be better handle by pipeline in pipeline approach.

Do fail the pipeline if a task fail but do not interrupt other branches (or even the current one, also this would seem weird to me)

I agree - I think we're trying to tackle (1) and could leave (2) as separate (maybe pipelines in pipelines or something there is the best solution for (2))

thanks @bobcatfish @vdemeester
yup sure, I just updated the TEP to include only (1)

bobcatfish · 2021-02-05T20:32:36Z

teps/0050-ignore-task-failures.md

+  but can continue running tests, so that my pipeline can report failures from the linting and all the tests.
+
+* As a new Tekton user, I want to migrate existing workflows from the other CI/CD systems that allowed a
+  similar task unit of failure.


is it possible to give some examples? i think a lot of systems that folks are migrating from dont have the same parallelism that Pipelines support, so I'm wondering if that might be the answer to most of these cases

Jenkins (a Jenkinsfile) would be an example 😛

Maybe we can include an example of a Jenkinsfile (with a failure allowed) that we'd be migrating? I'm struggling with trying to understand if one of the main motivations around the migration use case is that many CD systems do not allow for parallel execution, so it'd be great to see a counter example

sure, I can add an example, there are other tools as well, Travis, Argo Workflow, etc

pritidesai · 2021-02-08T16:59:01Z

/kind tep

jerop · 2021-02-10T17:08:09Z

/assign

vdemeester

As commented, I am a bit fuzzy if we are tackling two use cases at the same time or not, and if we should.

vdemeester · 2021-02-11T10:29:16Z

teps/0050-ignore-task-failures.md

+
+* As a pipeline author, I would like to design a pipeline where a task running
+  [unit tests](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1) might fail,
+  but can continue running integration tests, so that my pipeline can identify failures in both the tests.


There's another very similar use case, where you have the same pipeline as above, but you DO want linting to fail the pipeline, you just don't want it to interrupt the unit tests -> integration tests branch <-- for that case i wonder if pipelines in pipelines is more appropriate? not sure!

@bobcatfish @pritidesai could this be handled at the pipeline level too —aka some configuration of the pipeline that says "don't interup other branches" execution ?
I see two different problem we are trying to tackle here:

Do not fail the pipeline if a task fail (and continue the execution, …)

Do fail the pipeline if a task fail but do not interrupt other branches (or even the current one, also this would seem weird to me)

Do we want to tackle those at the same time, in the TEP ? I also tend to agree that something might be better handle by pipeline in pipeline approach.

vdemeester · 2021-02-11T10:29:45Z

teps/0050-ignore-task-failures.md

+  but can continue running tests, so that my pipeline can report failures from the linting and all the tests.
+
+* As a new Tekton user, I want to migrate existing workflows from the other CI/CD systems that allowed a
+  similar task unit of failure.


Jenkins (a Jenkinsfile) would be an example 😛

bobcatfish · 2021-02-18T21:58:56Z

From my end, the only thing missing to merge the problem statement is to be clear on which of the 2 cases (or both) that @vdemeester pointed out that we want to tackle and then i think we'll be gtg!

afrittoli · 2021-02-22T17:11:46Z

/assign @vdemeester

bobcatfish

Looks good to me!

/approve

just waiting for @vdemeester to approve (and maybe @jerop since you're assigned?)

/hold

bobcatfish · 2021-02-23T22:05:19Z

teps/0050-ignore-task-failures.md

+        }
+    }
+  }
+  ```


nice!! thanks this example really helps

tekton-robot · 2021-02-23T22:07:04Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [bobcatfish]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Adding a tep to ignore task failures, allowing pipeline authors to unblock execution after a single failure

jerop · 2021-02-25T21:52:48Z

teps/0050-ignore-task-failures.md

+  but the pipeline can continue running integration tests and deploying an application to a staging cluster, so that the
+  application can be shared with other developers for early feedback.
+
+  For example, do not fail the pipeline if `unit tests` fail. Continue deploying to a staging cluster if integration tests


@pritidesai does do not fail the pipeline mean that not only does execution continue but also the pipelinerun status should not be failure?

or is it just about the execution flow and it doesn't affect the overall status?

Hey @jerop the high level idea here is to ignore/allow pipelineTask failure, continue execution and do not fail the pipelineRun.

@pritidesai does do not fail the pipeline mean that not only does execution continue but also the pipelinerun status should not be failure?

Yes, pipelineRun status should not be failure. I will cover this in the design proposal with potential alternatives. We could set the overall pipelineRun status to success, completed, success with warning, or something more relevant.

jerop · 2021-02-25T22:06:10Z

teps/0050-ignore-task-failures.md

+  In this example, `linting` and `unit tests` are executed in parallel. This specific use case can be
+  supported by `pipeline in pipeline` approach by creating two separate pipelines,
+  (1) `linting` -> `report linter output` and (2) `unit tests` -> `integration tests`.
+  `pipeline in pipeline` approach would be better fit when you want `linting` to fail the pipeline.


is the pipeline here referring to the first sub-pipeline or the whole pipeline?

if pipelines in pipelines approach is used, it might be better for report linter output to be a finally task which would then report the linting output and the tests sub-pipeline would execute independently in parallel so linting failure won't affect it

what do you think?

is the pipeline here referring to the first sub-pipeline or the whole pipeline?

Its referring to the sub pipeline and in turn the whole pipeline since the sub pipeline status would be reflected in the overall pipelineRun status.

what do you think?

Yup, having report linter output under finally won't be affected in terms of the execution by the linting failure but that sub pipeline would still result in failure.

Here the idea is to treat linting as a warning. linting is part of the pipeline but its failure has no impact on the execution of the rest of the pipeline or the overall status.

Edit: And, if we combine two features pipeline in pipeline and ignore/allow failure of the linter sub-pipeline (when defined as a pipelineTask) would not impact overall status as well.

vdemeester

One question, how does this relates to TEP-0007 ?

Design a task failure strategy so that the pipeline author can control the behavior of the underlying tasks
and decide whether to continue executing the rest of the graph in the event of failure.

This goal seems to be aligned with TEP-0007. I guess the only difference is the final status of the Pipeline (succeeded in this TEP, still fail in 0007).

If we go with this approach, is the other TEP required ? and vice-versa ?

jerop · 2021-03-03T17:14:55Z

One question, how does this relates to TEP-0007 ?

Design a task failure strategy so that the pipeline author can control the behavior of the underlying tasks
and decide whether to continue executing the rest of the graph in the event of failure.

This goal seems to be aligned with TEP-0007. I guess the only difference is the final status of the Pipeline (succeeded in this TEP, still fail in 0007).

If we go with this approach, is the other TEP required ? and vice-versa ?

yes yes, the two are aligned and that's why I'd put #258 on hold to explore failure strategies more broadly beyond allowing skipping a single task, which led to creating Pipelines in Pipelines to solve for some of the failure scenarios

if you define failures as:

not successful, then skip is a failure and the two discuss the same issue
got an error, then another difference is TEP-0007: Conditions Beta Skipping - When Scope #258 is about ignoring task skips while this one is about ignoring task failures

which one did you intend to address here @pritidesai?

regardless, these issues should/will be solved together and there are some thoughts about the overlap in this doc I shared with @pritidesai

pritidesai · 2021-03-03T18:48:15Z

Thanks @vdemeester and @jerop

One question, how does this relates to TEP-0007 ?

Design a task failure strategy so that the pipeline author can control the behavior of the underlying tasks
and decide whether to continue executing the rest of the graph in the event of failure.

This goal seems to be aligned with TEP-0007. I guess the only difference is the final status of the Pipeline (succeeded in this TEP, still fail in 0007).

TEP-0007 is limited to a skipped task i.e. a task is skipped and not executed. The main goal of this TEP is to ignore failure of a task i.e. a task is executed and declared failure after exhausting all the retries, such task would be considered as a success, skipped tasks remain skipped.

If we go with this approach, is the other TEP required ? and vice-versa ?

Yes both the TEPs are required, as both are addressing the flow control based on the different statuses of a pipelineTask. And at the same time, it is possible to combine features from both the TEPs.

In the following example, skip-this-task will be skipped since when expressions evaluates to false and since it is not executed, ignoreFailure will have no effect. The overall status of the pipeline stays completed.

tasks:
  - name: skip-this-task
    when:
      - input: "foo"
        operator: in
        values: ["bar"]
    ignoreFailure: true

Now, in this example, ignore-failure-of-this-task is executed and exited with failure, ignoreFailure will declare this task as a success and run-after-ignore-failure-of-this-task is executed. The overall status of the pipeline will be success.

tasks:
  - name: ignore-failure-of-this-task
    when:
      - input: "foo"
        operator: in
        values: ["foo"]
    ignoreFailure: true
  - name: run-after-ignore-failure-of-this-task
    runAfter: [ "ignore-failure-of-this-task" ]

Now, with TEP-0007, skip-this-task is skipped because of the when expressions evaluating to false(and ignoreFailure has no impact) but continueAfterSkip allows execution of the run-after-skip-this-task.

tasks:
  - name: skip-this-task
    when:
      - input: "foo"
        operator: in
        values: ["bar"]
    continueAfterSkip: true
    ignoreFailure: true
  - name: run-after-skip-this-task
    runAfter: [ "skip-this-task" ]

pritidesai · 2021-03-03T18:50:13Z

got an error, then another difference is #258 is about ignoring task skips while this one is about ignoring task failures

which one did you intend to address here @pritidesai?

thanks @jerop, this TEP is focused on ignoring task failures after getting an error.

vdemeester · 2021-03-04T09:32:12Z

/lgtm

pritidesai · 2021-03-04T18:40:12Z

thanks for the reviews @bobcatfish @vdemeester @jerop 🙏

/hold cancel

tekton-robot requested review from nikhil-thomas and a user February 5, 2021 19:20

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 5, 2021

pritidesai force-pushed the ignore-task-failure branch from b0fb546 to 07372ce Compare February 5, 2021 19:30

bobcatfish reviewed Feb 5, 2021

View reviewed changes

tekton-robot assigned bobcatfish Feb 5, 2021

pritidesai force-pushed the ignore-task-failure branch from 07372ce to 7a384a4 Compare February 6, 2021 00:41

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Feb 8, 2021

tekton-robot assigned jerop Feb 10, 2021

vdemeester reviewed Feb 11, 2021

View reviewed changes

pritidesai force-pushed the ignore-task-failure branch 6 times, most recently from 8f9766f to eaecdea Compare February 19, 2021 22:56

tekton-robot assigned vdemeester Feb 22, 2021

tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021

bobcatfish reviewed Feb 23, 2021

View reviewed changes

tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 23, 2021

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2021

tep to ignore task failures

ff0c820

Adding a tep to ignore task failures, allowing pipeline authors to unblock execution after a single failure

pritidesai force-pushed the ignore-task-failure branch from eaecdea to ff0c820 Compare February 23, 2021 23:07

tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021

jerop reviewed Feb 25, 2021

View reviewed changes

vdemeester reviewed Mar 3, 2021

View reviewed changes

vdemeester mentioned this pull request Mar 3, 2021

Add TEP-0044: Decouple Task Composition from Scheduling ➰ #316

Merged

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 4, 2021

tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2021

tekton-robot merged commit 050ea47 into tektoncd:main Mar 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TEP-0050 - ignore task failures #342

TEP-0050 - ignore task failures #342

pritidesai commented Feb 5, 2021

bobcatfish left a comment

bobcatfish Feb 5, 2021

bobcatfish Feb 8, 2021

pritidesai Feb 9, 2021 •

edited

Loading

vdemeester Feb 11, 2021

bobcatfish Feb 18, 2021

pritidesai Feb 18, 2021

bobcatfish Feb 5, 2021

vdemeester Feb 11, 2021

bobcatfish Feb 18, 2021

pritidesai Feb 18, 2021

pritidesai commented Feb 8, 2021

jerop commented Feb 10, 2021

vdemeester left a comment

vdemeester Feb 11, 2021

vdemeester Feb 11, 2021

bobcatfish commented Feb 18, 2021

afrittoli commented Feb 22, 2021

bobcatfish left a comment

bobcatfish Feb 23, 2021

tekton-robot commented Feb 23, 2021

jerop Feb 25, 2021

jerop Feb 25, 2021

pritidesai Feb 26, 2021

jerop Feb 25, 2021

pritidesai Feb 26, 2021 •

edited

Loading

vdemeester left a comment

jerop commented Mar 3, 2021

pritidesai commented Mar 3, 2021

pritidesai commented Mar 3, 2021

vdemeester commented Mar 4, 2021

pritidesai commented Mar 4, 2021

TEP-0050 - ignore task failures #342

TEP-0050 - ignore task failures #342

Conversation

pritidesai commented Feb 5, 2021

bobcatfish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pritidesai Feb 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pritidesai commented Feb 8, 2021

jerop commented Feb 10, 2021

vdemeester left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bobcatfish commented Feb 18, 2021

afrittoli commented Feb 22, 2021

bobcatfish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tekton-robot commented Feb 23, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pritidesai Feb 26, 2021 • edited Loading

Choose a reason for hiding this comment

vdemeester left a comment

Choose a reason for hiding this comment

jerop commented Mar 3, 2021

pritidesai commented Mar 3, 2021

pritidesai commented Mar 3, 2021

vdemeester commented Mar 4, 2021

pritidesai commented Mar 4, 2021

pritidesai Feb 9, 2021 •

edited

Loading

pritidesai Feb 26, 2021 •

edited

Loading