Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link git outputs between tasks #270

Merged
merged 1 commit into from
Nov 29, 2018

Conversation

shashwathi
Copy link
Contributor

In this PR I have implemented outputs using PVC within lifetime of pipelinerun. Closed old PR #255 in favor of this. Discussion for design is in previous PR so please refer to that for more details.

Note:

  • Replaces current e2e test for multiple tasks pipelinerun test with fan-in and fan-out pipeline. (More complex than existing e2e test and covers all use cases previous e2e test as well)
  • Added another example under examples to demonstrate git resource dependency
  • Support for targetPath is in pipeline task resource. e2e test includes this use case as well.

What will not be included?

Support for images
This PR only includes support for git resource dependency. Support for images includes some design of metadata and I would like to follow up with another PR for that.

/assign @imjasonh
/assign @bobcatfish
/assign @pivotal-nader-ziada

@knative-prow-robot knative-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 27, 2018
Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to start off by saying this is a great PR and thanks for putting all this thorough work in 😍

I especially appreciate your attention to test coverage and all of the well factored unit tests :D

Of course I've always got some feedback tho, my apologies 😅 ! Initial thoughts below:

I'm confused about the fact that in the e2e test, it looks liek the two fanned out tasks modify the same resource is that right? I thought we wanted to explicitly avoid that :S :S :S

Replaces current e2e test for multiple tasks pipelinerun test with fan-in and fan-out pipeline. (More complex than existing e2e test and covers all use cases previous e2e test as well)

Since we haven't completed #168 yet, what order does the fan-in, fan-out end up executing in?

Quick plug for rewriting some of the commit messages, see https://github.com/knative/build-pipeline/blob/master/CONTRIBUTING.md#commit-messages (commit messages like "POC outputs", "Fix merge conflicts" could have more info - also might lend themselves to better messages if they were rebased together? and it would be great if the issue #s could be in some of the commit messages)

Name: "workspace",
ProvidedBy: []string{createImageTaskName},
Name: "workspace",
// ProvidedBy: []string{createImageTaskName}, //Yet to be implemented
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add a TODO(some issue # here), e.g.:

// TODO(#148 ) add support for xyz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you are going to implement this as part of this same PR?

@@ -30,7 +30,7 @@ import (

const (
interval = 1 * time.Second
timeout = 5 * time.Minute
timeout = 10 * time.Minute // TODO: timeout should be configurable via go test timeout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go test timeout is the timeout for all tests being run, while this timeout is for individual calls to PollImmediate, I don't think you can use go test timeout for this

which call to PollImmediate needs 10 minutes? That seems extremely long

if it's unavoidable to have a 10 minute timeout, I suggest that we change the interface to these functions to take a timeout parameter so we can use the timeout that is needed each time we call it (i.e. certainly most calls to Poll don't need 10 min)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if we DO need to up one of these timeouts to 10 min, we should update the test README (https://github.com/knative/build-pipeline/blob/master/test/README.md) to indicate that the go test default timeout won't cut it anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New pipelinerun e2e test includes 4 tasks and this takes upto 6min (on my gke) cluster. I wanted to add timeout to be on safer side so increased it upto 10min but it is very specific to that e2e test. So probably I will just update timeout for that test instead of all crds check timeout

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just bumping this before we merge

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait my bad, this is an outdated comment. ignore me!

}, &v1alpha1.Task{
ObjectMeta: metav1.ObjectMeta{
Namespace: namespace,
Name: "check-create-files-exists",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahaha i feel like ive seen this exact example in the concourse docs (as we slowly re-create concourse XD)

Name: "read-something-else",
Image: "ubuntu",
Command: []string{"cat"},
Args: []string{"/workspace/readingspace/something-else"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. im a bit confused by what each task and each step in each task is doing - maybe we could use the names of the steps to make it a bit more clear, e.g.:
  • In the first task, "write-something" could become "write-task-1"
  • In the next two tasks that run in parallel, "read-stuff" could be something like "read-data-from-task-1", and "write-soemthing-else" could be "write-data-from-task-2-1" or something
  1. what do you think if we had the tasks themselves assert? i.e. have the step fail if the expected text doesnt exist, e.g. using grep

Copy link
Contributor Author

@shashwathi shashwathi Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think if we had the tasks themselves assert? i.e. have the step fail if the expected text doesnt exist, e.g. using grep

I like the idea. I will update the tasks to assert contents.

im a bit confused by what each task and each step in each task is doing - maybe we could use the names of the steps to make it a bit more clear

I tried to name tasks with "stuff" and "something" because the tasks are actually writing the same words. I could update them to pattern you mentioned.

})
}

resources.WrapSteps(&tr.Spec, pr.Spec.PipelineTaskResources, pt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i really like breaking this logic into a separate function!! ❤️

// +optional
PreBuiltSteps []TaskBuildStep `json:"preBuildSteps,omitempty"`
// +optional
PostBuiltSteps []TaskBuildStep `json:"postBuiltSteps,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does Built mean in this context? I think this is intended to run after all the steps run, is that right? i think since this CRD isn't called Build anymore, using Built is a bit confusing, some other ideas:

  • PreSteps
  • BeforeSteps
  • InitializationSteps
  • SetupSteps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leaned toward Built because thats the end object we are creating now. I like the name BeforeSteps for pre build steps, for post build steps we could go with AfterSteps.

}

// TaskBuildStep contains information to construct build pre and post steps
type TaskBuildStep struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again im not sure about calling this Build (maybe Im missing something tho)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about TaskStep? I agree with moving away from associating with Build keyword

// +optional
PVCName string `json:"pvcName,omitempty"`
// +optional
PreBuiltSteps []TaskBuildStep `json:"preBuildSteps,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it would look to a user of TaskRun as if these are fields we expect them to provide - do we? What would it look like if a user provided these fields? Can we document it?

If not, can we somehow make it obvious we don't want users to provide them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to have these steps in the definition of the taskRun type as a field, can we not append them in the appropriate order while creating the taskRun

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, can we somehow make it obvious we don't want users to provide them?

How about add validation in taskrun to check if owner reference is pipelinerun Kind then post build and pre build steps should be allowed else throw error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't allow user to provide those field, it shouldn't be in the spec (should it ?), but I guess they are required for pipeline/pipelinerun to populate them… so… yeah most likely a validation error could work there.

Is there a use case for a user to provides those in addition to normal steps ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a use case for a user to provides those in addition to normal steps ?

@vdemeester: @shashwathi and i discussed, and we were thinking that maybe a use case where a user wants to debug tasks that are part of a pipeline in isolation? it might be a bit of a stretch tho!

i think adding paths to the interface instead of the builtSteps makes it a bit clearer hopefully!

tests := []struct {
name string
steps TaskBuildStep
wantErr *apis.FieldError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we don't need wantErr, we have no error cases here (and id prefer those in a separate test anyway!)

@@ -83,3 +83,24 @@ func TestInitializeConditions(t *testing.T) {
t.Fatalf("PipelineRun status getting reset")
}
}

func Test_GetPVC(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Member

@nader-ziada nader-ziada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @shashwathi

ITs a little bit hard to follow, so wondering if you can maybe update the commit message and add some more comments

value: admin

---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need a new resource that has the same properties as existing resource?

Type PipelineResourceType `json:"type"`
Name string `json:"name"`
Type PipelineResourceType `json:"type"`
TargetPath string `json:"targetPath"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be optional since its not applicable to all types of resources

@@ -118,6 +119,8 @@ type PipelineResource struct {
type TaskRunResource struct {
Name string `json:"name"`
ResourceRef PipelineResourceRef `json:"resourceRef"`
// +optional
SourcePaths []string `json:"sourcePaths"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to what this new field is for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think I am using that. I will go ahead and delete that field

}
newSteps = append(newSteps, []corev1.Container{{
Name: fmt.Sprintf("source-mkdir-%s", source.Name),
Image: "busybox",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to use busybox and instead use some base image controlled by the project

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I follow up with another PR to fix image?

// +optional
PVCName string `json:"pvcName,omitempty"`
// +optional
PreBuiltSteps []TaskBuildStep `json:"preBuildSteps,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to have these steps in the definition of the taskRun type as a field, can we not append them in the appropriate order while creating the taskRun

Name: "workspace",
ProvidedBy: []string{createImageTaskName},
Name: "workspace",
// ProvidedBy: []string{createImageTaskName}, //Yet to be implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you are going to implement this as part of this same PR?

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, most of my question were already covered by @bobcatfish and @pivotal-nader-ziada 😉

// +optional
PVCName string `json:"pvcName,omitempty"`
// +optional
PreBuiltSteps []TaskBuildStep `json:"preBuildSteps,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't allow user to provide those field, it shouldn't be in the spec (should it ?), but I guess they are required for pipeline/pipelinerun to populate them… so… yeah most likely a validation error could work there.

Is there a use case for a user to provides those in addition to normal steps ?

@shashwathi shashwathi force-pushed the output-with-tests branch 3 times, most recently from 6ed542e to 7381dab Compare November 28, 2018 16:46
@shashwathi
Copy link
Contributor Author

@pivotal-nader-ziada @bobcatfish @vdemeester

I updated the design in recent change. Please take a look at it and leave comments. Thanks to @bobcatfish for slack brainstorming.

  • Updated e2e test to assert contents
  • No more use of steps(post, pre). Using taskrun.resource.paths to accomplish passing information to taskrun.
  • Updated docs.
  • Updated commit msg to include issue number and more details.

docs/Concepts.md Outdated
@@ -37,6 +37,7 @@ A Task is a collection of sequential steps you would want to run as part of your
A task will run inside a container on your cluster. A Task declares:

1. Inputs the task needs.
Task input resource can provide `targetPath` to initialize resource in specific directory. Resource will be placed under `/workspace/targetPath`. If `targetPath` is not specified then resource will be initialized under `/workspace`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR is looking great and I really like the interface change!

I find this part of the docs a bit hard to follow - maybe it would be more clear if it was in a different section, such as Resources or TaskRun, and maybe had an example?

I'm not 100% sure how to improve it tho, feel free to merge this and we can iterate on it.

@@ -73,12 +73,12 @@ func WaitForPodState(c *clients, name string, namespace string, inState func(r *
// interval until inState returns `true` indicating it is done, returns an
// error or timeout. desc will be used to name the metric that is emitted to
// track how long it took for name to get into the state checked by inState.
func WaitForPipelineRunState(c *clients, name string, inState func(r *v1alpha1.PipelineRun) (bool, error), desc string) error {
func WaitForPipelineRunState(c *clients, name string, polltimeout time.Duration, inState func(r *v1alpha1.PipelineRun) (bool, error), desc string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Name: "read-from-task-0",
Image: "ubuntu",
Command: []string{"bash"},
Args: []string{"-c", "[[ stuff == $(cat /workspace/stuff) ]]"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

niiiiiice :D

@bobcatfish
Copy link
Collaborator

/lgtm

In case you want to make some tweaks to the docs - altho it's unlikely a user would want to use the paths directly, I think it would be nice to try to show how they could do that:

/hold

And also I'm still wondering about fan-out and fan-in, seems like we can end up with tasks running in parallel changing the same data?

@knative-prow-robot knative-prow-robot added lgtm Indicates that a PR is ready to be merged. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Nov 29, 2018
@shashwathi
Copy link
Contributor Author

shashwathi commented Nov 29, 2018

In case you want to make some tweaks to the docs - altho it's unlikely a user would want to use the paths directly, I think it would be nice to try to show how they could do that:

I will take another shot at docs tomorrow. 👍

And also I'm still wondering about fan-out and fan-in, seems like we can end up with tasks running in parallel changing the same data?

Yes tasks have individual copy of the resources so even if they do modify it, I do not see why it could be a problem. I agree that fan-in of resources could lead to undesired side effects.

I can see a use case where this feature is useful like using same git resource across whole pipeline but user has to be aware of the changes in each tasks which does not seem too unreasonable I think.

@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 29, 2018
@shashwathi
Copy link
Contributor Author

@bobcatfish : I have updated the docs to include examples. Please review that and let me know if you have any feedback.

- Implementation details
Pipelinerun creates pvc for the lifetime for object and uses that pvc as
scratch space to transfer git resources between them. This information
is passed to taskrun via resource paths. Paths are array of strings and
incase of inouts these paths will be considered as new source of
pipeline resource. In the case of outputs paths will be considered as
new destination directory.
- Update docs to include examples of paths

Partially fixes tektoncd#148
@knative-metrics-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-knative-build-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha1/taskrun_types.go 0.0% 55.6% 55.6
pkg/reconciler/v1alpha1/pipelinerun/pipelinerun.go 82.7% 77.2% -5.5
pkg/reconciler/v1alpha1/pipelinerun/resources/input_output_steps.go Do not exist 95.5%
pkg/reconciler/v1alpha1/taskrun/resources/pre_post_build_step.go Do not exist 100.0%
pkg/reconciler/v1alpha1/taskrun/taskrun.go 73.2% 74.1% 0.9

@@ -169,6 +199,66 @@ Creating a `TaskRun` will invoke a [Task](#task), running all of the steps until
completion or failure. Creating a `TaskRun` will require satisfying all of the input
requirements of the `Task`.

`TaskRun` definition includes `inputs`, `outputs` for `Task` referred in spec.

Input resource includes name and reference to pipeline resource and optionally `paths`. `paths` will be used by `TaskRun` as the resource's new source paths i.e., copy the resource from specified list of paths. `TaskRun` expects the folder and contents to be already present in specified paths. `paths` feature could be used to provide extra files or altered version of existing resource before execution of steps.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah great, I get it now! thanks for the example above and the extra detail in the docs :D

@bobcatfish
Copy link
Collaborator

/lgtm
/meow space

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 29, 2018
@knative-prow-robot
Copy link

@bobcatfish: cat image

In response to this:

/lgtm
/meow space

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@shashwathi
Copy link
Contributor Author

@bobcatfish : I need /approve label too.

@bobcatfish
Copy link
Collaborator

whoops, just took it for granted that @shashwathi had approver rights already, she's one of the top contributors!!

/approve

@knative-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, shashwathi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2018
@bobcatfish
Copy link
Collaborator

/hold cancel

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2018
@knative-prow-robot knative-prow-robot merged commit c747227 into tektoncd:master Nov 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants