Get simple PipelineRun implementation working #128

Merged


bobcatfish
Collaborator

Sorry, this is a big one, with even more refactoring 😅 The net effect is that when we create a PipelineRun, it actually creates and executes the required TaskRuns! 🎉

I tried to break it up into separate commits, so it might be easier to review that way:

  1. Combine logic to grab Tasks and their Runs for a PipelineRun

    It turns out that we need to look at TaskRuns for a few reasons, including 1) figuring out what to run next and 2) determining the status of the PipelineRun, so I've refactored the logic that grabs these to collect a bunch of related state that can be reused. When the graph becomes more sophisticated, we will need to make this structure more than just a list.

  2. Check status of TaskRuns when finding TaskRun to start
    Added logic to check statuses of other TaskRuns when deciding if a new one should be started for Implement simple PipelineRun #61

  3. Add condition status to PipelineRun
    PipelineRun status will be based on the condition of the TaskRuns which it has created, for Implement simple PipelineRun #61. If any TaskRuns have failed, the PipelineRun has failed. If all are successful, it is successful. If any are in progress, it is in progress. This assumes a linear Pipeline; we will have to tweak this a bit when we implement the graph (for Pipeline uses passedConstraints to provide correct inputs and outputs #65).

  4. Create TaskRun from PipelineRun that runs a Task
    Added the Task reference to the TaskRun so that when a PipelineRun creates a TaskRun, it actually executes! (For Implement simple PipelineRun #61)
    While running the integration test, noticed that the PipelineRuns weren't getting reconciled quickly enough, but adding a tracker which will invoke reconcile when the created TaskRuns are updated fixed this - however it did still take > 1 minute to create 3 helloworld TaskRuns and wait for them to complete, so since 3 was arbitrary, reduced to 2.
    Also cleaned up the TaskRun controller a bit: using the Logger object on the controller/reconciler itself, made the log messages a bit more descriptive.
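
For readers following along, the rules described in commits 2 and 3 can be sketched roughly as below. This is a minimal illustration for a linear Pipeline, not the code in this PR: the `Status` values, the `taskState` struct, and the function names `pipelineRunStatus` and `nextTask` are all hypothetical stand-ins for the real condition types.

```go
package main

import "fmt"

// Status is a simplified, hypothetical stand-in for a TaskRun condition.
type Status string

const (
	Succeeded Status = "Succeeded"
	Failed    Status = "Failed"
	Running   Status = "Running"
)

// taskState pairs a Pipeline task with the status of its TaskRun.
// An empty Run means no TaskRun has been created for the task yet.
type taskState struct {
	Name string
	Run  Status
}

// pipelineRunStatus applies the rules from the description: any failure
// fails the run, all successes succeed it, anything else is in progress.
// An empty list is vacuously successful in this sketch.
func pipelineRunStatus(states []taskState) Status {
	allSucceeded := true
	for _, s := range states {
		if s.Run == Failed {
			return Failed
		}
		if s.Run != Succeeded {
			allSucceeded = false
		}
	}
	if allSucceeded {
		return Succeeded
	}
	return Running
}

// nextTask returns the first task without a TaskRun, but only if every
// task before it has succeeded (linear Pipeline assumption).
func nextTask(states []taskState) (string, bool) {
	for _, s := range states {
		switch s.Run {
		case Succeeded:
			continue
		case "":
			return s.Name, true
		default: // Running or Failed: nothing new should start
			return "", false
		}
	}
	return "", false
}

func main() {
	states := []taskState{
		{Name: "helloworld-task-1", Run: Succeeded},
		{Name: "helloworld-task-2"},
	}
	fmt.Println("PipelineRun status:", pipelineRunStatus(states))
	if name, ok := nextTask(states); ok {
		fmt.Println("next TaskRun to create for:", name)
	}
}
```

Once the Pipeline becomes a graph, a flat slice like this would presumably no longer be enough, which matches the caveat in commit 1.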

Fixes #61

@knative-prow-robot knative-prow-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 11, 2018
@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2018
Contributor

@tejal29 tejal29 left a comment

Lots of Code and Lots of Test!!! Yey!!!

@@ -91,6 +95,11 @@ func NewController(
UpdateFunc: controller.PassNew(impl.Enqueue),
DeleteFunc: impl.Enqueue,
})

r.tracker = tracker.New(impl.EnqueueKey, 30*time.Minute)
Contributor

30 seconds?

  if err != nil {
- return fmt.Errorf("error getting next TaskRun to create for PipelineRun %s: %s", pr.Name, err)
+ return fmt.Errorf("error getting Tasks for Pipeline %s, Pipeline may be invalid!: %s", p.Name, err)
Contributor

s/error getting Tasks for Pipeline/error getting Tasks or TaskRuns for Pipeline/

Collaborator Author

good call!

type GetTask func(namespace, name string) (*v1alpha1.Task, error)

// GetTaskRun is a function that will retrieve the TaskRun name from namespace.
type GetTaskRun func(namespace, name string) (*v1alpha1.TaskRun, error)
Contributor

Not your fault, but this is not rendered properly :) i wonder if its a formatting issue

Collaborator Author

yeah im not sure why the syntax highlighting is having a problem here 🤔

last time I saw this happen it was b/c @jonjohnsonjr decided to replace a e with an or something like that

Contributor

I think the syntax highlighter that GitHub uses has a hard time with func if it isn't followed by { ... }.

Collaborator Author

ah that could be it, thanks @jonjohnsonjr :D

and seriously that 𝛾 was pretty sweet, gonna be talking about that one for years to come

Collaborator Author

image

oh it just keeps getting better, i'm crying

Collaborator Author

@bobcatfish bobcatfish Oct 11, 2018

well that was distracting
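
Coming back to the getter types under review in this thread: the sketch below shows one way function-typed getters like these might be used to collect Tasks and their TaskRuns in a single pass, as commit 1 describes. It is an illustration only. `Task` and `TaskRun` are pared-down stand-ins for the `v1alpha1` types, and `errNotFound`, `pipelineTaskState`, `collectState`, `mapGetters`, and the `<pipelinerun>-<task>` naming convention are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Task and TaskRun are pared-down stand-ins for the v1alpha1 types.
type Task struct{ Name string }
type TaskRun struct{ Name string }

// Function-typed getters with the same shape as the ones under review.
type GetTask func(namespace, name string) (*Task, error)
type GetTaskRun func(namespace, name string) (*TaskRun, error)

// errNotFound is a hypothetical sentinel; real code would inspect the API error.
var errNotFound = errors.New("not found")

// pipelineTaskState holds what a reconciler needs to know about one task.
type pipelineTaskState struct {
	Task        *Task
	TaskRunName string
	TaskRun     *TaskRun // nil if the TaskRun has not been created yet
}

// collectState looks up every Task and, if it exists, its TaskRun.
func collectState(ns, prName string, taskNames []string, getTask GetTask, getTaskRun GetTaskRun) ([]pipelineTaskState, error) {
	states := make([]pipelineTaskState, 0, len(taskNames))
	for _, name := range taskNames {
		t, err := getTask(ns, name)
		if err != nil {
			return nil, fmt.Errorf("failed to get Task %q: %w", name, err)
		}
		trName := prName + "-" + name // assumed naming convention
		tr, err := getTaskRun(ns, trName)
		if errors.Is(err, errNotFound) {
			tr = nil // not created yet; the caller may decide to create it
		} else if err != nil {
			return nil, fmt.Errorf("failed to get TaskRun %q: %w", trName, err)
		}
		states = append(states, pipelineTaskState{Task: t, TaskRunName: trName, TaskRun: tr})
	}
	return states, nil
}

// mapGetters builds in-memory getters, which is what makes this style easy to test.
func mapGetters(tasks map[string]*Task, runs map[string]*TaskRun) (GetTask, GetTaskRun) {
	getTask := func(_, name string) (*Task, error) {
		if t, ok := tasks[name]; ok {
			return t, nil
		}
		return nil, errNotFound
	}
	getTaskRun := func(_, name string) (*TaskRun, error) {
		if tr, ok := runs[name]; ok {
			return tr, nil
		}
		return nil, errNotFound
	}
	return getTask, getTaskRun
}

func main() {
	getTask, getTaskRun := mapGetters(
		map[string]*Task{"task-1": {Name: "task-1"}, "task-2": {Name: "task-2"}},
		map[string]*TaskRun{"run-task-1": {Name: "run-task-1"}},
	)
	states, err := collectState("default", "run", []string{"task-1", "task-2"}, getTask, getTaskRun)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range states {
		fmt.Printf("%s: TaskRun %s exists=%t\n", s.Task.Name, s.TaskRunName, s.TaskRun != nil)
	}
}
```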

pt := p.Spec.Tasks[i]
t, err := getTask(p.Namespace, pt.TaskRef.Name)
if err != nil {
return nil, fmt.Errorf("failed to get tasks for Pipeline %q: Error getting task %q : %s",
Contributor

Now, this is a case where we need to check whether the Task does not exist.
If it does not exist, then we should quit reconciling this pipeline, since it's invalid and nothing can be done.
For that to happen, we should return "nil" from pipelinerun.Reconciler.reconcile.

However, if there was some other error while fetching the task, then we keep trying.
Does that make sense?

I would say we can create an InvalidPipelineError and return that from here when a Task is not found.
In the reconcile method, we should check whether err is something other than InvalidPipelineError, and return nil if it's InvalidPipelineError.

Collaborator Author

yep that makes sense! okay ill update this so that if we can't find a task, we return nil from the reconcile loop and stop reconciling.

thanks for catching this and the detailed explanation!
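
The suggestion in this thread could look something like the sketch below. Only the name `InvalidPipelineError` comes from the review comment; its fields, the `resolveTasks` and `reconcile` signatures, the `exists` callback, and `errTransient` are hypothetical. It also uses `errors.As`, which arrived in a later Go release than this PR would have been written against, so treat it as a modern rendering of the idea rather than what was merged.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidPipelineError marks a Pipeline that can never run as written,
// for example because it references a Task that does not exist.
type InvalidPipelineError struct {
	Pipeline string
	Task     string
}

func (e *InvalidPipelineError) Error() string {
	return fmt.Sprintf("Pipeline %q is invalid: Task %q not found", e.Pipeline, e.Task)
}

// errTransient is a hypothetical stand-in for any retryable lookup failure.
var errTransient = errors.New("temporary API failure")

// resolveTasks returns InvalidPipelineError for a missing Task and wraps
// every other lookup failure so the caller can tell the two apart.
func resolveTasks(pipeline string, taskNames []string, exists func(string) (bool, error)) error {
	for _, name := range taskNames {
		ok, err := exists(name)
		if err != nil {
			return fmt.Errorf("error getting Task %q: %w", name, err)
		}
		if !ok {
			return &InvalidPipelineError{Pipeline: pipeline, Task: name}
		}
	}
	return nil
}

// reconcile returns nil for an invalid Pipeline so the key is not retried,
// and returns any other error so the caller keeps trying.
func reconcile(pipeline string, taskNames []string, exists func(string) (bool, error)) error {
	err := resolveTasks(pipeline, taskNames, exists)
	var invalid *InvalidPipelineError
	if errors.As(err, &invalid) {
		fmt.Printf("giving up on reconciling: %v\n", invalid)
		return nil
	}
	return err
}

func main() {
	known := map[string]bool{"helloworld-task-1": true}
	exists := func(name string) (bool, error) { return known[name], nil }
	tasks := []string{"helloworld-task-1", "helloworld-task-2"}
	fmt.Println("missing Task:", reconcile("helloworld-pipeline", tasks, exists))

	flaky := func(string) (bool, error) { return false, errTransient }
	fmt.Println("transient failure:", reconcile("helloworld-pipeline", tasks, flaky))
}
```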

@tejal29 tejal29 added this to the Mid October Demo milestone Oct 11, 2018
logger.Infof("Making sure the expected TaskRuns were created")
expectedTaskRuns := []string{
hwPipelineName + hwPipelineTaskName1,
hwPipelineName + hwPipelineTaskName2,
Collaborator Author

whoops somehow forgot to include a fix for this

@bobcatfish bobcatfish force-pushed the pipelinerun_status_status branch 2 times, most recently from 5a02ce8 to 3283605 Compare October 11, 2018 20:31
@knative-prow-robot knative-prow-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 11, 2018
@bobcatfish bobcatfish force-pushed the pipelinerun_status_status branch from ab1dc80 to 33036c9 Compare October 11, 2018 23:53
@bobcatfish
Collaborator Author

Ready for another look @tejal29 !

@@ -91,6 +95,11 @@ func NewController(
UpdateFunc: controller.PassNew(impl.Enqueue),
DeleteFunc: impl.Enqueue,
})

r.tracker = tracker.New(impl.EnqueueKey, 30*time.Second)
Member

I added a better way to get the tracker lease as a function of the resync period in #120 which I can update here once this is merged, no point in both of us making the same change

Collaborator Author

nice!! thanks @pivotal-nader-ziada :D if you merge first ill pickup your change, ill leave this at 30 min for now then
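
As background for the lease discussion, here is a rough, self-contained sketch of the general idea behind a lease-based tracker: an owner registers interest in another object, and changes to that object re-enqueue the owner until the lease runs out. This is not the `tracker` package used in the diff. The `leaseTracker` type, its methods, and the string keys are all hypothetical and exist only to show why the lease duration matters.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// leaseTracker is a hypothetical, simplified tracker: it remembers which
// owners care about which objects, and forgets them after a lease expires.
type leaseTracker struct {
	mu      sync.Mutex
	lease   time.Duration
	enqueue func(ownerKey string)
	now     func() time.Time // injectable clock, defaults to time.Now
	// watched object key -> owner key -> lease expiry
	owners map[string]map[string]time.Time
}

func newLeaseTracker(enqueue func(string), lease time.Duration) *leaseTracker {
	return &leaseTracker{
		lease:   lease,
		enqueue: enqueue,
		now:     time.Now,
		owners:  map[string]map[string]time.Time{},
	}
}

// Track records that ownerKey wants to be reconciled when objKey changes.
// Calling it again renews the lease.
func (t *leaseTracker) Track(objKey, ownerKey string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.owners[objKey] == nil {
		t.owners[objKey] = map[string]time.Time{}
	}
	t.owners[objKey][ownerKey] = t.now().Add(t.lease)
}

// OnChanged enqueues every owner that still holds a live lease on objKey
// and drops the owners whose lease has expired.
func (t *leaseTracker) OnChanged(objKey string) {
	t.mu.Lock()
	now := t.now()
	var due []string
	for owner, expiry := range t.owners[objKey] {
		if now.After(expiry) {
			delete(t.owners[objKey], owner)
			continue
		}
		due = append(due, owner)
	}
	t.mu.Unlock()
	for _, owner := range due {
		t.enqueue(owner)
	}
}

func main() {
	clock := time.Date(2018, 10, 11, 0, 0, 0, 0, time.UTC)
	tr := newLeaseTracker(func(owner string) { fmt.Println("enqueue", owner) }, 30*time.Minute)
	tr.now = func() time.Time { return clock }

	tr.Track("default/run-task-1", "default/run")
	clock = clock.Add(time.Minute)
	tr.OnChanged("default/run-task-1") // inside the lease
	clock = clock.Add(time.Hour)
	tr.OnChanged("default/run-task-1") // lease has expired
}
```

In this demo the first change falls inside the lease and enqueues the owner, while the second arrives after expiry and is ignored. In a design like this, the lease has to outlast the gap between the owner's own reconciles, which is presumably why tying it to the resync period is attractive.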

@bobcatfish
Collaborator Author

oh dear

I1012 01:29:35.122] default                  8s          8s           1         pvc-284c1005-cdbe-11e8-a9ba-42010a800231.155cb8422d54da43                         PersistentVolume                                                Normal    VolumeDelete              persistentvolume-controller                                           googleapi: Error 400: The disk resource 'projects/knative-boskos-10/zones/us-central1-a/disks/gke-kbuild-pipeline-e2-pvc-284c1005-cdbe-11e8-a9ba-42010a800231' is already being used by 'projects/knative-boskos-10/zones/us-central1-a/instances/gke-kbuild-pipeline-e2e--default-pool-a626184b-8p1m', resourceInUseByAnotherResource

@bobcatfish
Collaborator Author

oh i guess that volume error is about deletion, maybe a red herring 🤔

@bobcatfish bobcatfish removed this from the Mid October Demo milestone Oct 12, 2018
@bobcatfish bobcatfish force-pushed the pipelinerun_status_status branch from 43bbb0a to bb6e0f8 Compare October 12, 2018 16:32
@bobcatfish
Collaborator Author

I1012 01:29:21.113] --- FAIL: TestPipelineRun (240.28s)
I1012 01:29:21.113] 	pipelinerun_test.go:63: Error waiting for PipelineRun helloworld-run to finish: timed out waiting for the condition
I1012 01:29:21.113] 	pipelinerun_test.go:77: Expected TaskRun helloworld-pipelinerun-helloworld-task-1 to have succeeded but Status is Unknown
I1012 01:29:21.113] 	pipelinerun_test.go:73: Couldn't get expected TaskRun helloworld-pipelinerun-helloworld-task-2: taskruns.pipeline.knative.dev "helloworld-pipelinerun-helloworld-task-2" not found
I1012 01:29:21.114] 	crd.go:237: Error waiting for Pod helloworld-validation-busybox to finish: timed out waiting for the condition

yyyyyyyyyyyyyyyyyyyyyyyy

@tejal29
Contributor

tejal29 commented Oct 12, 2018

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@bobcatfish
Collaborator Author

bobcatfish commented Oct 12, 2018

oh man i just keep making these tests better and better XD

I1012 16:40:42.177] --- FAIL: TestTaskRun (0.06s)
I1012 16:40:42.178] panic: runtime error: invalid memory address or nil pointer dereference [recovered]
I1012 16:40:42.178] 	panic: runtime error: invalid memory address or nil pointer dereference
I1012 16:40:42.178] [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x152ab9b]
I1012 16:40:42.178] 
I1012 16:40:42.178] goroutine 166 [running]:
I1012 16:40:42.178] testing.tRunner.func1(0xc4200ac2d0)
I1012 16:40:42.178] 	/usr/local/go/src/testing/testing.go:742 +0x567
I1012 16:40:42.178] panic(0x163b020, 0x20b1ce0)
I1012 16:40:42.178] 	/usr/local/go/src/runtime/panic.go:502 +0x24a
I1012 16:40:42.178] github.com/knative/build-pipeline/test.TestTaskRun.func2(0xc4202886c0, 0x17789c3, 0xe, 0x0)
I1012 16:40:42.179] 	/go/src/github.com/knative/build-pipeline/test/taskrun_test.go:56 +0x7b
I1012 16:40:42.179] github.com/knative/build-pipeline/test.WaitForTaskRunState.func1(0x94e27f, 0x1, 0xc4205a6800)
I1012 16:40:42.179] 	/go/src/github.com/knative/build-pipeline/test/crd_checks.go:54 +0x152
I1012 16:40:42.179] github.com/knative/build-pipeline/vendor/k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc4205a6800, 0xc4202262d0, 0xc4205a6800, 0x16af140)
I1012 16:40:42.179] 	/go/src/github.com/knative/build-pipeline/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:245 +0x39
I1012 16:40:42.179] github.com/knative/build-pipeline/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x3b9aca00, 0x1bf08eb000, 0xc4202262d0, 0x31, 0x0)
I1012 16:40:42.179] 	/go/src/github.com/knative/build-pipeline/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:241 +0x5f
I1012 16:40:42.180] github.com/knative/build-pipeline/test.WaitForTaskRunState(0xc42011f260, 0x17789c3, 0xe, 0x17ed5a0, 0x177881f, 0xe, 0x0, 0x0)
I1012 16:40:42.180] 	/go/src/github.com/knative/build-pipeline/test/crd_checks.go:49 +0x320
I1012 16:40:42.180] github.com/knative/build-pipeline/test.TestTaskRun(0xc4200ac2d0)
I1012 16:40:42.180] 	/go/src/github.com/knative/build-pipeline/test/taskrun_test.go:54 +0xf37
I1012 16:40:42.180] testing.tRunner(0xc4200ac2d0, 0x17ed5a8)
I1012 16:40:42.180] 	/usr/local/go/src/testing/testing.go:777 +0x16e
I1012 16:40:42.180] created by testing.(*T).Run
I1012 16:40:42.180] 	/usr/local/go/src/testing/testing.go:824 +0x565
I1012 16:40:42.181] FAIL	github.com/knative/build-pipeline/test	118.710s

If a PipelineRun references a Pipeline that uses Tasks which don't
exist, we should immediately stop trying to Reconcile it. To fix this,
the user/trigger should create a new PipelineRun after creating the
Tasks needed.
@bobcatfish bobcatfish force-pushed the pipelinerun_status_status branch from bb6e0f8 to 87d6926 Compare October 12, 2018 22:51
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@tejal29
Contributor

tejal29 commented Oct 12, 2018

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@bobcatfish
Collaborator Author

image

😩

@bobcatfish bobcatfish force-pushed the pipelinerun_status_status branch from 87d6926 to 3a4cad3 Compare October 12, 2018 23:10
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@tejal29
Contributor

tejal29 commented Oct 12, 2018

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@bobcatfish
Collaborator Author

🤞🤞🤞
image
🤞🤞🤞

Something has gone wrong with one of the integration tests on my PR and
I don't know what so I'm trying to add more info.

Added Builds to the dumped CRDs, and also moved the step that deploys
the examples to after the integration tests b/c it produces a lot of
errors in the logs (hahaha...) and makes it harder to debug integration
test failures.
I think it's reasonable for only one of our eventually many integration
tests to verify the build output, especially when it involves adding a
volume mount to the pile of things that could go wrong in the test.

Refactored the test a bit, so we don't assert inside the test, and we
output some logs before polling.

Removed dumping of CRDs in test script b/c each test runs in its own
namespace and cleans up after itself, so there is never anything to dump
(see tektoncd#145).

Updated condition checking so that if the Run fails, we bail immediately
instead of continuing to hope it will succeed.
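
The fail-fast condition checking described in this commit message might look roughly like the sketch below. It is a standalone illustration that does not use the repository's `WaitForTaskRunState` helper or the vendored `wait` package: the `run`, `condition`, `conditionFunc`, `succeeded`, and `waitFor` names are hypothetical, and treating a run with no condition yet as "still pending" is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// condition and run are simplified stand-ins for a Run's status.
type condition struct{ Status string } // "True", "False", or "Unknown"

type run struct {
	Name      string
	Condition *condition // nil until a status has been reported
}

// conditionFunc returns done=true to stop polling successfully,
// or a non-nil error to stop polling immediately.
type conditionFunc func(r *run) (done bool, err error)

// succeeded bails out as soon as the run reports failure instead of
// polling until the timeout in the hope that it recovers.
func succeeded(r *run) (bool, error) {
	if r.Condition == nil {
		return false, nil // assumed: no condition yet means still pending
	}
	switch r.Condition.Status {
	case "True":
		return true, nil
	case "False":
		return false, fmt.Errorf("run %q failed", r.Name)
	default:
		return false, nil
	}
}

var errTimeout = errors.New("timed out waiting for the condition")

// waitFor polls get up to attempts times, sleeping interval between tries.
func waitFor(get func() *run, cond conditionFunc, interval time.Duration, attempts int) error {
	for i := 0; i < attempts; i++ {
		done, err := cond(get())
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		time.Sleep(interval)
	}
	return errTimeout
}

func main() {
	// Simulated status sequence: nothing yet, in progress, then failed.
	sequence := []*run{
		{Name: "helloworld-run"},
		{Name: "helloworld-run", Condition: &condition{Status: "Unknown"}},
		{Name: "helloworld-run", Condition: &condition{Status: "False"}},
	}
	i := 0
	get := func() *run {
		r := sequence[i]
		if i < len(sequence)-1 {
			i++
		}
		return r
	}
	fmt.Println("result:", waitFor(get, succeeded, 10*time.Millisecond, 100))
}
```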
@bobcatfish bobcatfish force-pushed the pipelinerun_status_status branch from 3a4cad3 to 360d282 Compare October 12, 2018 23:35
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2018
@bobcatfish
Collaborator Author

😩

@bobcatfish
Collaborator Author

/lgtm

WHAT ITS WORTH A TRY

@knative-prow-robot

@bobcatfish: you cannot LGTM your own PR.

In response to this:

/lgtm

WHAT ITS WORTH A TRY

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobcatfish
Collaborator Author

image

image

@nader-ziada
Member

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2018
@knative-prow-robot knative-prow-robot merged commit 162c2f5 into tektoncd:master Oct 13, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Implement simple PipelineRun
5 participants