-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use dag execution instead of linear one #473
Use dag execution instead of linear one #473
Conversation
15fccdb
to
2ae2366
Compare
test/dag_test.go
Outdated
) | ||
|
||
const ( | ||
// :(((((( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bobcatfish actually it's from your initial test, it should be quick now 😝
@vdemeester do you think it's okay to squish some of the commits? seems some of them can be merged. |
@pivotal-nader-ziada yes I definitely intend to squash them 😉 I just want to clean things a bit and I'll squash them when removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple things I think you're already aware of:
- docs 😇
- There are a couple of functions which could probably use their own unit tests:
GetSchedulable
,GetPreivousTasks
😇
Bigger thought:
Do you think Nodes map[string]*Node
is the best data structure to use going forward? We can totally iterate on this (esp. since it's an implementation detail the user will hopefully never notice :D) but I wonder if we could find a data structure that would hold the Pipeline Tasks in sorted order from the very beginning (i think @tejal29 suggested a topological sort?). I also think it could be v cool if we could run the "resolution" phase on this structure, such that very early in the reconciliation, we have a data structure that:
- holds the pipeline tasks in order (such that getting the next ones to run is just some kind of "pop" operation)
- has resolved all of the references in the pipeline tasks
This is something we could do in later PRs tho too :)
@@ -299,20 +310,22 @@ func (c *Reconciler) reconcile(ctx context.Context, pr *v1alpha1.PipelineRun) er | |||
} | |||
|
|||
serviceAccount := pr.Spec.ServiceAccount | |||
rprt := resources.GetNextTask(pr.Name, pipelineState, c.Logger) | |||
rprts := resources.GetNextTasks(pr.Name, d, pipelineState, c.Logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about the objects pipelinestate
vs. d
- do you think there's any potential to use the same data structure for both, or do you think it's better with them separate?
(i wonder if some of the functionality supported for pipelinestate
would make sense as methods on the dag
itself?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is definitely some duplication between the two, and I feel like they should be "the same" at some point. Didn't want to go that path yet (as I wanted to keep the changeset small) but this is definitely something to lean forward to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk sounds reasonable! :D
@@ -93,6 +93,58 @@ func (g *DAG) GetPreviousTasks(pt string) []v1alpha1.PipelineTask { | |||
return v.getPrevTasks() | |||
} | |||
|
|||
// GetSchedulable returns a list of PipelineTask that can be scheduled, | |||
// given a list of "done" task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you eleaborate here on what a 'list of done tasks' is? (i think t
is a list of pipeline task names that have completed - successfully?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, it will be successful tasks here (only) as if one fails, the pipeline will be marked failed and the taskruns too. I should also rename the variables 😅
Most likely yes — it's definitely not "optimized", but I'm not sure of the level of optimization we need yet 😝 The
Yes, |
4a0bb87
to
5e3a989
Compare
haha kk, sorry for trying to optimize prematurely XD - definitely easier to iterate on this once we get an initial implementation in :) |
"d": {Task: dDependsOnA, Prev: []*Node{{Task: a}}}, | ||
"e": {Task: eDependsOnA, Prev: []*Node{{Task: a}}}, | ||
"f": {Task: fDependsOnDAndE, Prev: []*Node{{Task: dDependsOnA, Prev: []*Node{{Task: a}}}, {Task: eDependsOnA, Prev: []*Node{{Task: a}}}}}, | ||
"g": {Task: gDependOnF, Prev: []*Node{{Task: fDependsOnDAndE, Prev: []*Node{{Task: dDependsOnA, Prev: []*Node{{Task: a}}}, {Task: eDependsOnA, Prev: []*Node{{Task: a}}}}}}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think it would be crazy to have some comments that try to depict the graphs in these tests? 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lookin good so far! 🎉
5013509
to
11128bf
Compare
Cleaned a bit and rebased, there is still work to do on the |
11128bf
to
8b126ad
Compare
Okay I still need to:
But the bulk of the changes are here so PTAL :D |
Since we're going to be adding 2 ways to specify the graph (i.e. adding `runAfter` in addition to `from`), updated this test to be agnostic to how the graph was constructed. Also updated the test to be table driven, to try to make it easier to look at each case.
While starting to add `runAfter` functionality, I wanted to update the tests so that the tests for the building are separate from the tests that get the next schedulable Task. While doing that I found what I thought was a bug: if a graph has multiple roots, GetSchedulable would sometimes (if there are no completed tasks in the other root ) only return next tasks from one of the roots. For example: b a | / \ | | x | | / | | y | \ / z w If you pass `x` to GetSchedulable, `b` wont' be returned, even though theoretically this would be ready to run as well. Also, if a Task is done, this implies everything before it is also done, even if it isn't explicitly provided. Eventually (a whole day later 😅) I realized that @vdemeester had optimized by making the assumption that if anything beyond the roots was "done", the roots themselves must already be executing, therefore we were explicitly handling these two cases: 1. Nothing has started, so start all the roots 2. Otherwise look at everything, and if all its previous Tasks have completed, it must be ready to go. I have changed this so that we instead walk the graph from the roots and return the first schedulable nodes we encounter. This is (only slightly) an improvement because: 1. If a root wasn't started successfully (i.e. if the controller itself encounters an error before starting one of the roots), we'll still try to start the next node on the next reconcile 2. If GetSchedulable is called with "done" Tasks that don't make sense (e.g. if we jump straight into the middle of the graph without completing the rest), we can catch that This required adding `Next` pointers as well as `Prev` pointers - which unfortunately meant we can't compare graphs with `cmp.Diff` because it will have a stack overflow 😅
Now users can express graph ordering using either `from` on resources or using `runAfter` if there is no relationship between the resources. Also removed `errors.go` package in this PR b/c: 1. we aren't using this approach anywhere else in our codebase 2. the error would have to be passed up completely as-is to be useful but the layers involved will nest it and so using a an apierror type is not useful in the current approach We should probably come back to this later and use better error typing and nesting.
Signed-off-by: Vincent Demeester <[email protected]>
Recommendation from @shashwathi - it's a bit easier to interact with the schedulable tasks if they are in a map instead of a slice
- Updated example PipelineRuns so that it uses `runAfter` (to make unit tests run first) in addition to using `from` - Updated pipelinerun reconcile tests to also use `runAfter` - Updated Helm end to end test to actually use the image it builds
0cc67bb
to
2f5fde4
Compare
Okay @vdemeester everything is done i think except for the webhook validation! I could even see doing that in another PR 🤷♀️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay looks good to me
@bobcatfish agreeing, we can make a follow-up for the validation part 😉
I'll let @shashwathi or @pivotal-nader-ziada take another look (/me doesn't want to LGTM his own PR :joy_cat:)
hm looks like some of our test coverage went down - im going to see what i can do to fix that! |
kk, created #559 for that! |
/meow boxes |
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Added unit tests for `GetNextTasks` by: 1) splitting it into two functions, one of which is now `SuccesfulPipelineTaskNames` on the new `PipelineRunState` type 2) Adding unit tests for `SuccessfulPipelineTaskNames` 3) Adding unit tests for `GetNextTasks` - some of these cases shouldn't happen (e.g. failure should actually halt the Run) but including for completeness (Removed `firstFinishedState` test case because it was the same as `oneFinishedState`).
Okay coverage diffs look a bit more reasonable now, PTAL @shashwathi @pivotal-nader-ziada 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome pr!
@@ -21,9 +21,19 @@ import ( | |||
"strings" | |||
|
|||
"github.com/knative/build-pipeline/pkg/apis/pipeline/v1alpha1" | |||
errors "github.com/knative/build-pipeline/pkg/errors" | |||
"github.com/knative/build-pipeline/pkg/reconciler/v1alpha1/taskrun/list" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the files in pipelineRun
reconciler should not be using functions from taskRun
reconciler. The list
package should be extracted out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pivotal-nader-ziada hum good point 🤔 I think taskrun/list
package should be extracted in a upper package (as it really doesn't depend on taskrun
).
@bobcatfish agreeing ? (I can take care of it 😉)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense @vdemeester
@@ -522,7 +522,7 @@ func TestAddResourceToBuild(t *testing.T) { | |||
wantErr: false, | |||
want: buildv1alpha1.BuildSpec{ | |||
Steps: []corev1.Container{{ | |||
Name: "create-dir-workspace-mz4c7", | |||
Name: "create-dir-workspace-0-0-mz4c7", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this 0-0
added? container name should not have changed by this feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the following item of the commit message
Make sure `create-dir` containers gets a unique name (the same way `copy-` containers do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but they have a random string appended to them, why do we need the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pivotal-nader-ziada ah good point, that was done "before" we had that random string… 😅 We should remove that then 👼
go test -v -count=1 -tags=e2e ./test | ||
go test -v -tags=e2e -count=1 ./test --kubeconfig ~/special/kubeconfig --cluster myspecialcluster | ||
go test -v -count=1 -tags=e2e -timeout=20m ./test | ||
go test -v -count=1 -tags=e2e -timeout=20m ./test --kubeconfig ~/special/kubeconfig --cluster myspecialcluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe tests e2e tests already take more then 20m and growing, can we change it to 30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are all running in parallel so adding more tests should not result in 10m increase of test timing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@pivotal-nader-ziada has comments so I will leave up to him to approve the PR
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: shashwathi, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
… code :) Signed-off-by: Vincent Demeester <[email protected]>
A random suffix is generated for the name of the pod, so we don't need to add the index as part of the name to make sure they are unique. Signed-off-by: Vincent Demeester <[email protected]>
/test pull-knative-build-pipeline-go-coverage |
The following is the coverage report on pkg/.
|
looks good, thanks for the changes @vdemeester /lgtm |
Closes #168
This switch
build-pipeline
to use thedag
code for execution instead of the current linear behavior.update
dag
with aGetSchedulable
task which returns the schedulable tasksfix
dag
false-positive cycle:The current DAG implementation was marking some graph as invalid even
though they were. For example, the following DAG is valid.
But the
dag.Build
would say there is a cycle in there. This is fixedby appending the visited node with the node currently "checked", that
way, we can go through the same task twice, but from different path.
use
dag.Build
anddag.GetSchedulable
in the reconcilierIt doesn't implement any maximum number of parallel task. We may want to create an issue for that and do that in a follow-up 👼
This is still a wip:
add webhook admission control validationcreated Add validation on Pipeline creation for DAG #559/cc @bobcatfish @tejal29