Remove Default Linear Pipeline Task Execution. #178

Closed
wants to merge 2 commits

Conversation

tejal29
Contributor

@tejal29 tejal29 commented Oct 23, 2018

Fixes #168
Previously, pipelinerun.Reconciler would execute tasks in a pipeline linearly:
pipelinerun.resources.GetNextTask would go through all the tasks in Pipeline.Spec.Tasks in order and
create a TaskRun only once the previous task in that order had completed.
With this change, on each reconcile run (once per 30 seconds), pipeline.Reconciler creates a TaskRun for the
first task in Pipeline.Spec.Tasks whose inputs are ready.
Inputs for a given task in Pipeline.Spec.Tasks are ready when

  • they do not depend on any other task and the passedConstraints specified on the input source is empty, or
  • the task or tasks specified in passedConstraints for the input source have completed successfully.

Along with that, this change refactors the pipelinerun.Reconciler.reconcile method.
It first reconciles the pipeline run status, which can be used to quickly determine whether the pipeline has
already completed or failed. This saves some processing time by avoiding a pass over Pipeline.Spec.Tasks.
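
For context, here is a minimal, self-contained sketch of the readiness check described above. It is not the PR's actual code: the types and field names (InputSourceBindings, Succeeded, PipelineRunTaskRun) are simplified stand-ins, loosely modelled on the diff quoted later in this conversation.

```go
package resources

// Simplified, illustrative types; the real ones live in the build-pipeline
// v1alpha1 API packages and carry many more fields.
type TaskRun struct {
	Succeeded bool // true once the TaskRun has completed successfully
}

type SourceBinding struct {
	Name              string
	PassedConstraints []string // names of PipelineTasks that must succeed first
}

type PipelineTask struct {
	Name                string
	InputSourceBindings []SourceBinding
}

// PipelineRunTaskRun pairs a PipelineTask with its TaskRun, if one exists yet.
type PipelineRunTaskRun struct {
	PipelineTask *PipelineTask
	TaskRun      *TaskRun
}

// canTaskRun reports whether the inputs of pt are ready: an input with no
// passedConstraints is always ready, otherwise every constraining
// PipelineTask must already have a successful TaskRun.
func canTaskRun(pt *PipelineTask, state []*PipelineRunTaskRun) bool {
	for _, input := range pt.InputSourceBindings {
		for _, constrainingTaskName := range input.PassedConstraints {
			for _, prtr := range state {
				// Match on PipelineTask.Name (unique within the Pipeline),
				// not on the name of the Task it references.
				if prtr.PipelineTask.Name == constrainingTaskName {
					if prtr.TaskRun == nil || !prtr.TaskRun.Succeeded {
						return false
					}
				}
			}
		}
	}
	return true
}
```

In this sketch the reconciler would pick the first PipelineTask for which the check returns true and create its TaskRun.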

@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tejal29

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 23, 2018
@tejal29 tejal29 force-pushed the rm_linear_pipeline_exec branch from 7bd76cb to 284b84b on October 23, 2018 06:54
Member

@nader-ziada nader-ziada left a comment


lgtm, but there is a unit test failing

}
} else if canTaskRun(prtr.PipelineTask, state) {
logger.Infof("TaskRun %s should be started for PipelineRun %s", prtr.TaskRunName, prName)
if prtr.TaskRun == nil && canTaskRun(prtr.PipelineTask, state) {
Member

The unit test failure indicates that we are missing the cases where TaskRun is not nil, but hasn't run yet or failed

Contributor Author

Will look into this. I haven't looked at the test failures yet.

Contributor Author

I think the reason this test was failing is that I applied the following diff:

@@ -61,7 +46,7 @@ func canTaskRun(pt *v1alpha1.PipelineTask, state []*PipelineRunTaskRun) bool {
                        for _, constrainingTaskName := range input.PassedConstraints {
                                for _, prtr := range state {
                                        // the constraining task must have a successful task run to allow this task to run
-                                       if prtr.Task.Name == constrainingTaskName {
+                                       if prtr.PipelineTask.Name == constrainingTaskName {
                                                if prtr.TaskRun == nil {

I changed constrainingTaskName to be matched against PipelineTask.Name instead of prtr.Task.Name.
The reason is that you can have multiple tasks in your pipeline that point to the same Task name; PipelineTask.Name is the unique identifier.
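
As a hypothetical illustration of that point (the names below are made up for this example, not taken from the PR), a pipeline can contain two PipelineTasks that reference the same Task, so only the PipelineTask name is a safe key for passedConstraints:

```go
package resources

// Both pipeline tasks reference the same Task ("helm-deploy"), so matching
// passedConstraints against the Task name would be ambiguous; the
// PipelineTask name is the unique identifier within the Pipeline.
type TaskRef struct{ Name string }

type PipelineTaskSpec struct {
	Name    string
	TaskRef TaskRef
}

var pipelineTasks = []PipelineTaskSpec{
	{Name: "deploy-to-staging", TaskRef: TaskRef{Name: "helm-deploy"}},
	{Name: "deploy-to-prod", TaskRef: TaskRef{Name: "helm-deploy"}},
}
```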

Contributor Author

I also changed this in the integration test for helm-deploy.

@bobcatfish
Collaborator

With this change, on each reconcile run (once per 30 seconds), pipeline.Reconciler creates a TaskRun for the
first task in Pipeline.Spec.Tasks whose inputs are ready.

Since this is only every 30 seconds, we should probably start all TaskRuns that are ready simultaneously? (Since this is Go this should be pretty easy!) It would mean changing the test cases pretty significantly from what we've currently got.

Also I highly recommend changing the data structure - for example if the pipeline state object were itself a graph, the logic to interact with it could probably be super simple!

(Or maybe you're planning to tackle this incrementally and we'd do these changes next?)

@bobcatfish
Collaborator

Also I highly recommend changing the data structure - for example if the pipeline state object were itself a graph, the logic to interact with it could probably be super simple!

Elaborating on this a bit more, what is currently happening is:

  • make a list of tasks, in the order they are declared in the pipeline
  • iterate over them and find the first one that can run (via canTaskRun)

A more efficient and, I think, easier to understand and test implementation (see the sketch after this list) would be to:

  • create a graph of tasks, by looking at their dependencies
  • traverse the graph when looking for what to run next
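
A rough sketch of that idea, assuming the dependency edges come from passedConstraints; the types and names below are illustrative, not the eventual implementation:

```go
package resources

// node is one PipelineTask in the dependency graph built for this reconcile run.
type node struct {
	name string
	deps []string // names of PipelineTasks this task waits on (from passedConstraints)
	done bool     // true once its TaskRun has succeeded
}

// readyTasks walks the graph and returns every task whose dependencies have
// all succeeded, so the reconciler could start all ready TaskRuns at once
// rather than only the first one it finds.
func readyTasks(graph map[string]*node) []string {
	var ready []string
	for name, n := range graph {
		if n.done {
			continue // already ran successfully
		}
		ok := true
		for _, dep := range n.deps {
			if d, exists := graph[dep]; !exists || !d.done {
				ok = false
				break
			}
		}
		if ok {
			ready = append(ready, name)
		}
	}
	return ready
}
```

Building the graph would be a single pass over Pipeline.Spec.Tasks, and because readyTasks returns every runnable task, the same pass would also cover the earlier suggestion of starting all ready TaskRuns simultaneously.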

@bobcatfish
Collaborator

p.s. we might want to hold off on merging this until after @dlorenc 's demo since this will completely change pipeline behaviour (unless he's cool with it)

@nader-ziada
Member

+1 for the graph idea, will be way more efficient when pipelines get bigger

@nader-ziada
Member

As mentioned in issue #178, I think it is a good idea to keep linear execution as an option on a pipeline, by adding a flag for it on the pipeline type.

@bobcatfish
Collaborator

bobcatfish commented Oct 23, 2018

moved my comment to issue #168

@tejal29
Contributor Author

tejal29 commented Oct 23, 2018

@bobcatfish, @pivotal-nader-ziada
My only objection to using another object/state to store a pipeline graph is the memory footprint.
Computation is cheap, especially in our case, so I went with the stateless pipeline controller
(meaning: build and traverse the graph again on each run to see what the next task to run is).
We also do not want to rely on the cluster for a stateful pipeline controller; it's very easy to edit any object in a cluster.
Designing a stateful pipeline controller should be tackled as a separate task, and we should spend some time discussing:

  • pros and cons?
  • what state to store?
  • how to represent it? etc..

WDYT?

We already have a stateless pipeline Executor. This PR fixes the default linear execution to what should be the right execution order.

@tejal29 tejal29 changed the title from "Remove Linear Pipeline Task Execution." to "Remove Default Linear Pipeline Task Execution." on Oct 23, 2018
@tejal29
Contributor Author

tejal29 commented Oct 23, 2018

/retest

Previously, `pipelinerun.Reconciler` would execute tasks in a pipeline linearly.
`pipelinerun.resources.GetNextTask` would linearly go through all tasks in `Pipeline.Spec.Tasks` and
create a `TaskRun` only once the previous task in order had completed.
In this change, for each run (once per 30 seconds), `pipeline.Reconciler` will create a `TaskRun` for the
first task in `Pipeline.Spec.Tasks` whose inputs are ready.
Inputs for a given task in `Pipeline.Spec.Tasks` are ready when
 - they do not depend on any other task and the `passedConstraints` specified on the input source is empty.
 - the task or tasks specified in `passedConstraints` for the input source have completed successfully.

Along with that, this change refactors the `pipelinerun.Reconciler.reconcile` method.
It first reconciles the pipeline run status which can be used to quickly determine if the pipeline has
already completed or failed. This saves us some processing time by not having to go through the `pipeline.Spec.Tasks`.
The `passedConstraints` now point to `Pipeline.Spec.PipelineTask.Name`
instead of `Pipeline.Spec.PipelineTask.TaskRef.Name`. Used the duckv1alpha1
helper function `IsUnknown` to check if the pipeline is still running.
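
A hedged sketch of what that status check could look like; the import path and the condition plumbing here are assumptions on my part, not taken from this PR's diff:

```go
package pipelinerun

import (
	// Assumed import path for the knative duck types of this era.
	duckv1alpha1 "github.com/knative/pkg/apis/duck/v1alpha1"
)

// isDone reports whether the PipelineRun has finished (succeeded or failed).
// A nil condition or an Unknown status means the pipeline is still running,
// so reconcile keeps processing tasks; otherwise the per-task work is skipped.
func isDone(succeededCond *duckv1alpha1.Condition) bool {
	if succeededCond == nil {
		return false // no condition recorded yet: treat as still running
	}
	return !succeededCond.IsUnknown()
}
```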
@tejal29 tejal29 force-pushed the rm_linear_pipeline_exec branch from 8137a58 to 6da6fcc on October 23, 2018 21:01
@bobcatfish
Collaborator

Designing a stateful pipeline controller should be tackled as a separate task, and we should spend some time discussing

I didn't mean to suggest that we would store any state, what I'm suggesting is that we construct the graph on each run of the reconcile loop.

On reconcile, the first step would be to look at the Tasks in the Pipeline, look at their requirements, and construct a graph based on that. We're going to need this logic anyway to validate graphs (probably in the webhook?) and certainly to visualize the graph later on.

I think that if we construct the graph first, it will make all of the subsequent logic simpler (easier to write, easier to test, easier to reason about).

If you don't want to do that in this PR I'm okay with iterating on it but I feel strongly this is the direction we should go.

Since this is only every 30 seconds, we should probably start all TaskRuns that are ready simultaneously? (Since this is Go this should be pretty easy!)

If it's possible to make this change in this PR I think we should though.

@tejal29
Contributor Author

tejal29 commented Oct 24, 2018

Designing a stateful pipeline controller should be tackled as a separate task, and we should spend some time discussing

I didn't mean to suggest that we would store any state, what I'm suggesting is that we construct the
graph on each run of the reconcile loop.

On reconcile, the first step would be to look at the Tasks in the Pipeline, look at their requirements, and construct a graph based on that. We're going to need this logic anyway to validate graphs (probably in the webhook?) and certainly to visualize the graph later on.

I think that if we construct the graph first, it will make all of the subsequent logic simpler (easier to write, easier to test, easier to reason about).

Good point. Just wanted to note, we will end up doing some extra work if we construct a graph every time.
Right now we are using a greedy approach; we do not need to build a complete graph to know which task should be kicked off next.

Since this is only every 30 seconds, we should probably start all TaskRuns that are ready simultaneously? (Since this is Go this should be pretty easy!)

If it's possible to make this change in this PR I think we should though.

Sure, I can make this kick off multiple tasks. For that, a graph does make more sense, since you end up traversing the whole pipeline.

@bobcatfish
Collaborator

Right now we are using a greedy approach; we do not need to build a complete graph to know which task should be kicked off next.

The downside to the current implementation is that the order the tasks are traversed in has no (guaranteed) relation to the order they will execute in, so I'm not sure it's any more efficient.

Just wanted to note, we will end up doing some extra work if we construct a graph every time.

I think with the right data structures we could do it in one pass over the Tasks 🤞 so I think we can avoid too much extra work :D (or even any extra work? maybe a bit of memory?)

@tejal29
Contributor Author

tejal29 commented Oct 25, 2018

closing this in favour of #187

@tejal29 tejal29 closed this Oct 25, 2018
pradeepitm12 pushed a commit to pradeepitm12/pipeline that referenced this pull request Jan 28, 2021
In tektoncd#178 we decided to support JSONPath syntax for
expressions in TriggerBindings. This commit adds
functions to support JSONPath using the
`k8s.io/client-go/util/jsonpath` library.

There are a few deviations from the default behavior of the library:

1. The JSONPath expressions have to be wrapped in the Tekton variable
interpolation syntax, i.e. `$()`: `$(body)`.

2. We support the `RelaxedJSONPathExpression` syntax used in
`kubectl get -o custom-columns`. This means that the curly braces `{}`
and the leading dot `.` can be omitted, i.e. we support both `$(body.key)`
as well as `$({.body.key})`.

3. We return valid JSON values when the value is a JSON array or
map. By default, the library returns a string containing an internal
Go representation. So, if the JSONPath expression selects `{"a": "b"}`,
the library will return `map[a: b]` while we return the valid JSON
string, i.e. `{"a":"b"}`.

In order to support 3, we have to copy a couple of unexported functions
from the library (printResults, and evalToText).

Signed-off-by: Dibyo Mukherjee <[email protected]>
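
As a rough illustration of how the `k8s.io/client-go/util/jsonpath` library mentioned above is typically driven (the `$()` stripping and relaxed-syntax normalisation described in the commit are assumed to have already happened; this is not the Triggers code itself):

```go
package main

import (
	"bytes"
	"fmt"

	"k8s.io/client-go/util/jsonpath"
)

func main() {
	// A decoded event body, as a TriggerBinding would see it.
	data := map[string]interface{}{
		"body": map[string]interface{}{"key": "value"},
	}

	// By this point the Tekton-specific wrapping ($(body.key) or $({.body.key}))
	// is assumed to have been normalised into a plain JSONPath template.
	jp := jsonpath.New("triggerbinding")
	if err := jp.Parse("{.body.key}"); err != nil {
		panic(err)
	}

	var buf bytes.Buffer
	if err := jp.Execute(&buf, data); err != nil {
		panic(err)
	}
	fmt.Println(buf.String()) // prints: value
}
```

Producing valid JSON for selected arrays and maps (deviation 3 above) is where the copied printResults and evalToText helpers come in.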
pradeepitm12 pushed a commit to pradeepitm12/pipeline that referenced this pull request Jan 28, 2021
This commit switches the expression syntax in
TriggerBindings from GJson to JSONPath.

Fixes tektoncd#178

Signed-off-by: Dibyo Mukherjee <[email protected]>