
[WIP] Link git resources from previous task #255

Closed
wants to merge 7 commits

Conversation

shashwathi
Contributor

@shashwathi shashwathi commented Nov 21, 2018

In this PR I am proposing a solution for passing constraints between tasks.

Considered solutions:

  • External blob storage like S3 or GCS
    Pros: The operator can configure the bucket at the controller level or at the pipeline level. It is easy to create directories named after the pipelinerun/taskrun so they never clash.
    Cons: It is an extra burden on the operator to set this up (the main pain point).
    Additionally, the bucket becomes an external entity outside the k8s ecosystem. If the controller crashes or objects get deleted, buckets might be orphaned.

  • External server like S3, backed by a PVC
    Pros: The operator does not have to set up anything. Only one PVC, possibly backing a web server that takes requests to write to and fetch from the file system.
    Cons: I would be re-implementing a bad S3 or GCS version :D I would have to handle concurrent read/write requests, and making the requests efficient by compressing files would push the average pipelinerun time higher. A lot of new code to write, which is never a great idea.

In this PR I am choosing the route of using a PVC within the lifetime of the pipelinerun.
Pros: The operator doesn't have to worry about any external setup. The PVC is created and deleted within the lifetime of the pipelinerun.
Cons: In a few IaaSes, creating a PVC is time consuming. Operators could possibly make this faster by preallocating disks ahead of time or reusing them from a pool (I have not thought through a foolproof plan for this yet).

Compared to the other solutions I am leaning towards this one, as it does not require any new setup from the operator or the user. The objects created (PVCs) stay within the same k8s ecosystem and are cleaned up when the pipelinerun completes.

Now onto implementation details.
A PVC named after the pipelinerun is created when the pipelinerun starts, and this information is passed on to the taskrun objects so they can construct their build steps. A rough sketch of this PVC creation follows.
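The sketch below is illustrative only: it assumes the GetPVC helper on PipelineRun and the fixed 5GB size that appear in the diff hunks reviewed further down, but the exact spec fields (access mode, storage request) are my own guesses at what such a claim could look like, not necessarily what the PR implements.

package v1alpha1

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// GetPVC sketches a claim named after the pipelinerun, sized at a fixed
// 5GB (a review comment below suggests making this configurable).
func (pr *PipelineRun) GetPVC() *corev1.PersistentVolumeClaim {
    var pvcSizeBytes int64 = 5 * 1024 * 1024 * 1024 // 5 GBs
    return &corev1.PersistentVolumeClaim{
        ObjectMeta: metav1.ObjectMeta{
            Namespace: pr.Namespace,
            Name:      pr.Name, // one PVC per pipelinerun
        },
        Spec: corev1.PersistentVolumeClaimSpec{
            AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
            Resources: corev1.ResourceRequirements{
                Requests: corev1.ResourceList{
                    corev1.ResourceStorage: *resource.NewQuantity(pvcSizeBytes, resource.BinarySI),
                },
            },
        },
    }
}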

  • The pipelinerun adds prebuildsteps, postbuildsteps, and the pvcName to each taskrun. This essentially encompasses the information the taskrun needs to construct its build steps. Only the pipeline task knows about resource dependencies, so this information needs to be translated into an easy-to-consume format for the taskrun.
  • The taskrun reads these prebuildsteps and postbuildsteps to construct the actual build step container information. Any existing build steps are "wrapped" with prebuildsteps and postbuildsteps, which are cp -r source destination containers. In the case of prebuildsteps, the build source has already created the path, so this step overrides the existing source folder with the altered source from the previous task. In the case of postbuildsteps, the build first makes the directory on the PVC and then copies the resource folder onto the PVC (hence 2 steps, which could be merged as an improvement); see the sketch after this list.
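As an illustration of the postbuildstep wrappers, here is a minimal sketch. The container names and the ubuntu/cp choices match the diff hunks reviewed below, but the helper name (makePostBuildSteps), its arguments, and the volume-mount details are assumptions made for the example, not the PR's actual code.

package resources

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
)

// makePostBuildSteps sketches the two wrapper containers appended after a
// task's build steps: one creates the destination directory on the PVC,
// the next copies the resource's workspace folder into it.
func makePostBuildSteps(resourceName, workspacePath, pvcName, pvcDir string) []corev1.Container {
    // The pipelinerun's PVC is assumed to be mounted into each step at /pvc.
    pvcMount := corev1.VolumeMount{Name: pvcName, MountPath: "/pvc"}
    return []corev1.Container{{
        Name:         fmt.Sprintf("source-copy-make-%s", resourceName),
        Image:        "ubuntu", // a review comment below suggests a smaller image such as busybox
        Command:      []string{"/bin/mkdir"},
        Args:         []string{"-p", pvcDir},
        VolumeMounts: []corev1.VolumeMount{pvcMount},
    }, {
        Name:         fmt.Sprintf("source-copy-%s", resourceName),
        Image:        "ubuntu",
        Command:      []string{"/bin/cp"},
        Args:         []string{"-r", workspacePath, pvcDir},
        VolumeMounts: []corev1.VolumeMount{pvcMount},
    }}
}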

TODO:

  • Add go integration tests
  • Add unit tests
  • Possibly refactor code more
  • Possibly merge creation and copy container into 1 container

You might be wondering: with no tests, how do I know this PR works?

I have added 2 pipelinerun samples, examples/test-pvc and examples/test-pvc-2, which implement 2 different pipeline styles:

  • examples/test-pvc - This is the vanilla use case of a GitHub repo being passed from task 1 -> task 2.
  • examples/test-pvc-2 - This is a slightly more complex use case where task 1 and task 2 alter a git resource, and task 3 checks that the changes from tasks 1 and 2 are present.

What more changes does this PR include?

  • Support for targetPath in resources. As of today, a pipeline resource does not support targetPath (the path where the source will be placed), which means resources are always placed under /workspace; this PR would be too easy in that case. Adding targetPath support later would make this too complex to figure out, hence I have included that complexity now to support future and existing use cases. examples/test-pvc includes a test with targetPath. I am also trying to show that the proposed solution is not fragile. A small sketch of the idea follows.
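Purely to illustrate the targetPath idea, here is a tiny sketch; the helper name and the /workspace default are hypothetical and only show where a resource's source would land with and without targetPath set.

package resources

import "path/filepath"

// destinationPath is a hypothetical helper: a resource's source lands
// under /workspace by default, or under /workspace/<targetPath> when
// targetPath is set on the resource.
func destinationPath(targetPath string) string {
    if targetPath == "" {
        return "/workspace"
    }
    return filepath.Join("/workspace", targetPath)
}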

What will not be included?

  • Support for images
    This PR only includes support for git resources. Supporting images involves some metadata design, and I would like to follow up with another PR for that.

I would like to hear your thoughts and concerns about the approach and implementation. I will add the required tests when I get a 👍

cc @bobcatfish @pivotal-nader-ziada @imjasonh @abayer

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2018
@knative-prow-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shashwathi
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: imjasonh

If they are not already assigned, you can assign the PR to them by writing /assign @imjasonh in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 21, 2018
@abayer
Contributor

abayer commented Nov 21, 2018

Nice! The design approach sounds reasonable to me at first glance; I'll dig into the code later to make sure I understand it. =) I assume this general approach could be used for the sort of thing I was asking about in Slack earlier, i.e., passing a file or set of files from one Task to another (like, say, building a war file in the build task and then passing it on to the test task to do stuff with)? I know that's not something that feels particularly k8s-native, but I am still thinking about how to use the Pipeline CRD as the guts of a general-purpose pipeline execution engine that runs on k8s but isn't necessarily deploying to k8s, if that makes any sense.

@shashwathi
Contributor Author

/test pull-knative-build-pipeline-integration-tests

@shashwathi
Contributor Author

passing a file or set of files from one Task to another?

providedBy currently applies to the entire resource. It takes the entire resource folder and dumps it into the downstream task's workspace (a blind copy).
It's a great use case to extend the current output design to copy only specific files instead of the whole repository (an intelligent copy). Please create an issue to track this feature. I think it would be really beneficial and would be an easy change after this PR. This would be an enhancement to this feature.

@abayer
Contributor

abayer commented Nov 21, 2018

@shashwathi - done! #259

@bobcatfish
Collaborator

I think this approach makes a lot of sense @shashwathi !

I need to look at the code in more detail as well but one question at the outset about fan out and fan in: I can see that you have an example that has a Task getting a git resource from two previous Tasks:

  - name: check-kritis               # 2.  check file exists
    taskRef:
      name: check-files-exists
    inputSourceBindings:
    - name: workspace
      resourceRef:
        name: kritis-resources-git
      providedBy:
      - create-file-kritis-1
      - create-another-file-kritis 

What happens if both create-file-kritis-1 and create-another-file-kritis run simultaneously and both make changes to kritis-resources-git? Are they both using the same PVC (maybe making conflicting changes), or do they each have their own PVC which then needs to be merged somehow so that check-kritis can use it?

@knative-metrics-robot

The following is the coverage report on pkg/.
Say /test pull-knative-build-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/pipeline/v1alpha1/pipelinerun_types.go 100.0% 63.6% -36.4
pkg/reconciler/v1alpha1/pipelinerun/pipelinerun.go 81.7% 78.8% -2.9
pkg/reconciler/v1alpha1/pipelinerun/resources/input_output_steps.go Do not exist 37.5%
pkg/reconciler/v1alpha1/taskrun/resources/pre_post_build_step.go Do not exist 0.0%
pkg/reconciler/v1alpha1/taskrun/taskrun.go 72.6% 72.7% 0.1

@knative-prow-robot

@shashwathi: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-knative-build-pipeline-integration-tests 3cd42f7 link /test pull-knative-build-pipeline-integration-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@nader-ziada
Member

@shashwathi I like the design approach, using PVC seems like the most reasonable option, but I do think that once the build code is migrated to task, the code can be made more straightforward

@bobcatfish I don't think this is a scenario that should be allowed, two tasks updating the same git resource when running in parallel and a third task depending on both of them. We might need to have some kind of pipeline consistency checker/validator for invalid scenarios

@vdemeester
Member

Design makes sense to me too (well, I'm late to the party to say that 😛)

I don't think this is a scenario that should be allowed, two tasks updating the same git resource when running in parallel and a third task depending on both of them. We might need to have some kind of pipeline consistency checker/validator for invalid scenarios

@pivotal-nader-ziada agreed on that; validating that, we could "enforce" that people use differently named resources in those cases (or no parallel execution)


func (pr *PipelineRun) GetPVC() *corev1.PersistentVolumeClaim {
    var pvcSizeBytes int64
    pvcSizeBytes = 5 * 1024 * 1024 * 1024 // 5 GBs
Member

most likely in a follow-up but we may want to make this configurable

    }
    newSteps = append(newSteps, []corev1.Container{{
        Name:  fmt.Sprintf("source-copy-make-%s", source.Name),
        Image: "ubuntu",
Member

We may want to use a smaller image like busybox (as knative/build does, if I'm correct).

Contributor Author

Yeah, agreed. No need for such a big image for copying purposes.

    }, {
        Name:    fmt.Sprintf("source-copy-%s", source.Name),
        Image:   "ubuntu",
        Command: []string{"/bin/cp"},
Member

Genuine naive question: cp or rsync? 😋

Contributor Author

cp. I did consider rsync, but from my understanding it is the better tool when parts of files are updated rather than a complete set of files.

@shashwathi shashwathi mentioned this pull request Nov 27, 2018
@shashwathi
Contributor Author

closing this in favor of #270

@shashwathi shashwathi closed this Nov 27, 2018
pradeepitm12 pushed a commit to pradeepitm12/pipeline that referenced this pull request Jan 28, 2021
Fixes tektoncd#255
Adds validation to the TriggerTemplate resource. Validation will fail
when a ResourceTemplate uses a parameter not already declared in
spec.params.