Design Output handling #124

dlorenc · 2018-10-10T19:30:53Z

Expected Behavior

We should design how outputs should be handled by the controllers. This includes validation and fetching (uploading, querying, storing, etc.) to make them available to other Tasks that depend on them.

Outputs can be anything, e.g.:

Test results
Binaries, jars
Images

Each Task may have 0 or more Outputs.

Requirements

If a Task specifies a Resource as an input that is an output of a previous Task, that Resources must be mounted into the Task's step containers
If your Task declares an output resource, a volume mount (or similar) should be created for that output so you can persist it
Validation: If a Task declares an output, we should verify at the end of the run that the output exists
- The Task's steps should have the output volume mounted
- Each Resource should have it's own hooks for Validation
- The TaskRun controller should call the Validation hooks after a Task completes and set status to Failed if the Validation fails
Updating a Resource: It must be possible for a Task to mutate a resource that it has as an output, e.g. adding a digest to an image, or adding/changing a revision on a git resource

For completing this issue, please implement one type of Resource with Validation, e.g. for Test results (which would be logs written to a file) or a generic File resource (which would only have to exist to be Valid).

apiVersion: pipeline.knative.dev/v1alpha1
kind: Task
metadata:
  name: make
  namespace: default
spec:
    outputs:
        resource:
            - name: data
              type: file
              path: some/path # this would be relative to the created mount
    buildSpec:
        steps:
            - name: echo
              image: ubuntu
              command: ['sh']
              args: ['-c', 'echo hello > ${outputs.resources.data.path}']

If the above Task is used in a taskRun, it should be validated that it actually created the output it claimed it would create.

The File resource should be updated with the names of the files that were created.

If a subsequent Task in a Pipeline depends on the file Resource created by that Task, it should be provided with the file(s) created automatically.

Actual Behavior

Outputs do not currently do anything, and a generic "file" type resource does not yet exist.

Additional Info

knative build artifact design

Some ideas:

Append a build step to validate and fetch Outputs to the BuildSpec. This will not work with Builds specified by Templates.
Keep the artifacts on a PV, and attach another pod to that PV after the build to handle outputs.

Something else?

The text was updated successfully, but these errors were encountered:

dlorenc · 2018-10-10T19:31:36Z

Note that this should probably be handled separately from the unstructured, stdout/stderr streams that build steps produce themselves.

tejal29 · 2018-10-10T19:36:28Z

One more way is the Task Controller will watch for pods update and delete req.
If they belong to a TaskRun, upload the unseen logs to GCS.
The GCS store should support partial update of logs.

dlorenc · 2018-10-10T20:16:12Z

For the issue on stdout/stderr in Build, see knative/build#232

bobcatfish · 2018-10-10T20:29:19Z

Some related work that was done designing Build CRD artifacts (shared with members of knative-dev@.)

bobcatfish · 2018-10-10T20:34:15Z

/assign dlorenc

dlorenc · 2018-10-11T03:25:52Z

After some more investigation it appears even the stackdriver logs don't always get uploaded. I tested this in GKE 1.11 (which defaults to containerd) and the logs are even less reliably uploaded.

So I think we have a few AIs now:

Design and Define the mechanism for outputting files
This includes both normal logs (stdout/stderr of steps) and structured outputs (test results).

Design and Define the API for where these files should be stored
Again, this includes both normal logs and structured outputs, which might need to be stored in separate places.

Document recommendations for task authors
We should recommend task authors write logs to files that they care about.

Push on getting a reliable way to export logs
Work with upstream k8s to get the hooks we need to make sure stdout/stderr can be reliably captured.

bobcatfish · 2018-10-12T21:22:59Z

@dlorenc @tejal29 I'm a bit confused about the scope of this task - I think this is about basically all of the results which would be made available to these endpoints:

https://github.com/knative/build-pipeline/blob/37381f5b714bce3c55a5643c76dc03f19f552d72/examples/pipelineparams.yaml#L15-L27

Is that right?

b/c in the context of this project "outputs" can also mean the outputs from a Task, i.e. #148 so I'd like to rename this task to Design Results handling?

Note #143 is a subtask of this issue

dlorenc · 2018-10-14T23:35:59Z

I think this overlaps with #148 but is also a bit more general. We need to design:

What happens to everything in the Task.Outputs list (Results and Resources). This includes where they gets uploaded to the things in PipelineParams and how they get named).
How that happens (do we have a step that checks and performs these uploads?, something else?)
How these get linked to the dependent tasks in a Pipeline.

bobcatfish · 2018-10-15T17:25:58Z

Ah I see, it slipped my mind that test results are in the Task.Outputs list 🤔 Do you think we'll end up with other kinds of results in Tasks.Outputs? I wonder if it might be clearer to have a top level Tasks.Results or something like that instead, since the handling of other Resource based outputs seems like itll probably be pretty different from Results

dlorenc · 2018-11-02T21:18:22Z

google/go-containerregistry#209 and google/go-containerregistry#261

bobcatfish · 2018-11-02T21:24:41Z

Note: @tejal29 @dlorenc @aaron-prindle and I revised the description of this task to be a bit more specific

bobcatfish · 2018-11-10T01:41:03Z

@shashwathi were you planning to take this one on (as well as #148 ?) i just noticed its still assigned to @dlorenc so im unassigning it

shashwathi · 2018-11-10T19:47:52Z

/assign @shashwathi
I think this needs to be addressed before #148 so let me take a stab at this issue first.

@shashwathi

We had been hoping to use `version` to cover two cases: 1. To allow a user to override the version of the Resource they are using (e.g. for a particular git resource, use a different revision) 2. For the case where a Task modifies a resource then provides it to the next Task, we wanted to express the modification in the version In the previous commit, we made it so that a user can change a Resource by creating a new one and specifying the new one in teh PipelineRun, and they can do the same thing via a TaskRun. For the second case, in #124 @shashwathi is designing how the output of one task will be passed to the next, and it is likely going to be via PVCs and will not need the version field. Fixes tektoncd#200

@shashwathi

We had been hoping to use `version` to cover two cases: 1. To allow a user to override the version of the Resource they are using (e.g. for a particular git resource, use a different revision) 2. For the case where a Task modifies a resource then provides it to the next Task, we wanted to express the modification in the version In the previous commit, we made it so that a user can change a Resource by creating a new one and specifying the new one in teh PipelineRun, and they can do the same thing via a TaskRun. For the second case, in #124 @shashwathi is designing how the output of one task will be passed to the next, and it is likely going to be via PVCs and will not need the version field. Fixes tektoncd#200

@shashwathi

We had been hoping to use `version` to cover two cases: 1. To allow a user to override the version of the Resource they are using (e.g. for a particular git resource, use a different revision) 2. For the case where a Task modifies a resource then provides it to the next Task, we wanted to express the modification in the version In the previous commit, we made it so that a user can change a Resource by creating a new one and specifying the new one in teh PipelineRun, and they can do the same thing via a TaskRun. For the second case, in #124 @shashwathi is designing how the output of one task will be passed to the next, and it is likely going to be via PVCs and will not need the version field. Fixes tektoncd#200

@shashwathi

We had been hoping to use `version` to cover two cases: 1. To allow a user to override the version of the Resource they are using (e.g. for a particular git resource, use a different revision) 2. For the case where a Task modifies a resource then provides it to the next Task, we wanted to express the modification in the version In the previous commit, we made it so that a user can change a Resource by creating a new one and specifying the new one in teh PipelineRun, and they can do the same thing via a TaskRun. For the second case, in #124 @shashwathi is designing how the output of one task will be passed to the next, and it is likely going to be via PVCs and will not need the version field. Fixes tektoncd#200

@shashwathi

We had been hoping to use `version` to cover two cases: 1. To allow a user to override the version of the Resource they are using (e.g. for a particular git resource, use a different revision) 2. For the case where a Task modifies a resource then provides it to the next Task, we wanted to express the modification in the version In the previous commit, we made it so that a user can change a Resource by creating a new one and specifying the new one in teh PipelineRun, and they can do the same thing via a TaskRun. For the second case, in #124 @shashwathi is designing how the output of one task will be passed to the next, and it is likely going to be via PVCs and will not need the version field. Fixes tektoncd#200

@shashwathi

We had been hoping to use `version` to cover two cases: 1. To allow a user to override the version of the Resource they are using (e.g. for a particular git resource, use a different revision) 2. For the case where a Task modifies a resource then provides it to the next Task, we wanted to express the modification in the version In the previous commit, we made it so that a user can change a Resource by creating a new one and specifying the new one in teh PipelineRun, and they can do the same thing via a TaskRun. For the second case, in #124 @shashwathi is designing how the output of one task will be passed to the next, and it is likely going to be via PVCs and will not need the version field. Fixes #200

bobcatfish · 2018-11-30T18:16:08Z

I think this is done thanks to #270 @shashwathi or is there more to do? I think there were some TODOs about images in #270 but we could leave those for #216 if we wanted

shashwathi · 2018-11-30T22:26:37Z

In terms of git resource, output design is done. I am not convinced that output design for resources are done yet though. To justify this issue as being done, we should support another resource too.

So I am working on a PR to add support for GCS resource (input & output) in another PR. There is no issue tracking GCS resource support (I could add one and then that is like a subtask of this issue)

WDYT @bobcatfish ?

bobcatfish · 2018-12-04T00:01:16Z

That sounds great to me @shashwathi ! :D 🎉

bobcatfish · 2018-12-04T00:24:50Z

@shashwathi i just noticed that it seems like you can provide a resource as an output without actually specifying it as an output - for example in the end to end tests we write to a git input - we never declare it as an output but the changes we make to it still appear to future tasks (though in our example yaml we DO explicitly declare outputs)

Do you have any thoughts on how we can make output meaningful? I'm thinking that if you want to modify a resource, you must make it an output - maybe changes to inputs are intentionally not passed on to future tasks?

bobcatfish · 2018-12-04T00:26:50Z

(maybe i should make a new issue to discuss this inputs vs outputs thing - seems relevant to this issue too tho!)

shashwathi · 2018-12-04T01:07:25Z

Do you have any thoughts on how we can make output meaningful?

Yes I do. A resource can be declared one of these ways

output only

If a resource is only declared in output then steps will be added to build to alter resource externally (like upload artifact to gcs, push image). It did not make sense to trigger copy to PVC in this case because output the last thing that happens in task so resource wont be altered between tasks.

output and input

In this case, resource will be copied to PVC and alter resource externally. If tasks are sharing resources then explicitly tasks need to declare these resources in inputs as well. Thought behind this approach is resource is fetched(as input), altered in steps and passed onto next tasks (output) so resource needs to be declared in input and output of tasks to be copied onto PVC.

input only

If a resource is only declared in inputs (with previous task dependencies) then pre steps will be added to fetch resource from previous task. Resource will not be copied onto PVC or externally as output is not declared.
If a resource is only declared in inputs (without previous task dependencies) then fresh copy of resource is fetched. Resource will not be copied onto PVC or externally as output is not declared.

Example : In the following pipeline pipeline-share-resources, task2 can declare input git-repo with providedBy constraint only if task1 has declared resource git-repo in output and input.

task2 cannot declare input someother-git-repo with providedBy constraint since task1 has not declared resource someother-git-repo in outputs.

name: pipeline-share-resources
spec:
  tasks: 
  - name: task1
    inputs:
    - name: git-repo  
    - name: someother-git-repo
    outputs: 
    - name: git-repo
   steps:
   - name: alter-git-repo
   - name: alter-someother-git-repo 
  - name: task2
    inputs:
    - name: git-repo
       providedBy: [task1] // valid
    - name: someother-git-repo
       providedBy: [task1] // invalid

I think yaml is messed up but I hope you understood my point.
@bobcatfish WDYT?

shashwathi · 2018-12-04T01:19:04Z

for example in the end to end tests we write to a git input - we never declare it as an output

In this e2e test I considered pipelinerun task resources inputs and output to form construct taskrun inputs and outputs.
You raise a good point though if task does not declare outputs then taskrun validation should have failed. This could be a bug or needs some update in design.

bobcatfish · 2018-12-04T22:51:22Z

You raise a good point though if task does not declare outputs then taskrun validation should have failed. This could be a bug or needs some update in design.

I'm actually playing around with some of this right now (I'm using input + output linking to make an e2e test for #168) so I could add some validation here!

bobcatfish · 2018-12-08T01:24:39Z

I think yaml is messed up but I hope you understood my point.
@bobcatfish WDYT?

I finally had a chance to look at this more carefully and I think that looks good!

There is one case which is a bit weird around how to use providedBy to make Tasks that do nothing but run tests run before other Tasks (cuz these test Tasks have no outputs - so the only way to use providedBy is to use the test Task's input in the subsequent Task).

But on a related note I wrote up a doc with some proposed changes to how we bind PipelineResources(note doc visible to members of [email protected]) - if we went with alternative #1 then I think we could use the input and output semantics like you propose!

@dlorenc

While working on creating an end to end test for tektoncd#168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. This is the first of a couple follow up commits to add more validation: this one makes sure that extra resources are not provided (e.g. if you typo-ed a param with a default value, you may never know your param was being ignored). (Shout out to @vdemeester - I updated the reconciler tests both before and after his test refactoring, and it was nearly impossible to understand the tests before his builders were added: the builders made it waaaaay easier! 🙏 Follow up on #124

@dlorenc

While working on creating an end to end test for tektoncd#168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. This is the first of a couple follow up commits to add more validation: this one makes sure that extra resources are not provided (e.g. if you typo-ed a param with a default value, you may never know your param was being ignored). Some minor changes: * Combined 3 test cases in the taskrun reconciler into one test (testing templating with default params). * Reorganized logic in some reconciler test failures so that `Fatalf` won't prevent the dev from seeing informative failure info (specifically the condition tends to have a reason that explains why the case is failing) (Shout out to @vdemeester - I updated the reconciler tests both before and after his test refactoring, and it was nearly impossible to understand the tests before his builders were added: the builders made it waaaaay easier! 🙏 Follow up on #124

@dlorenc

While working on creating an end to end test for tektoncd#168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. This is the first of a couple follow up commits to add more validation: this one makes sure that extra resources are not provided (e.g. if you typo-ed a param with a default value, you may never know your param was being ignored). Some minor changes: * Combined 3 test cases in the taskrun reconciler into one test (testing templating with default params). * Reorganized logic in some reconciler test failures so that `Fatalf` won't prevent the dev from seeing informative failure info (specifically the condition tends to have a reason that explains why the case is failing) (Shout out to @vdemeester - I updated the reconciler tests both before and after his test refactoring, and it was nearly impossible to understand the tests before his builders were added: the builders made it waaaaay easier! 🙏 Follow up on #124

@dlorenc

While working on creating an end to end test for tektoncd#168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. While adding this validation I _really_ struggled to understand how the `providedBy` clause actually works. I've updated the docs with my best understanding of how it is currently working. Fortunately in tektoncd#320 we will be simplifying how this works! This is follow-up work for #124

@dlorenc

While working on creating an end to end test for tektoncd#168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. While adding this validation I _really_ struggled to understand how the `providedBy` clause actually works. I've updated the docs with my best understanding of how it is currently working. Fortunately in tektoncd#320 we will be simplifying how this works! This is follow-up work for #124

@dlorenc

While working on creating an end to end test for #168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. While adding this validation I _really_ struggled to understand how the `providedBy` clause actually works. I've updated the docs with my best understanding of how it is currently working. Fortunately in #320 we will be simplifying how this works! This is follow-up work for #124

@dlorenc

While working on creating an end to end test for #168, @dlorenc and I had a very hard time determining why our pipeline wasn't working as expected - it turned out that we were using the wrong resource name in our `providedBy` clauses. This is the first of a couple follow up commits to add more validation: this one makes sure that extra resources are not provided (e.g. if you typo-ed a param with a default value, you may never know your param was being ignored). Some minor changes: * Combined 3 test cases in the taskrun reconciler into one test (testing templating with default params). * Reorganized logic in some reconciler test failures so that `Fatalf` won't prevent the dev from seeing informative failure info (specifically the condition tends to have a reason that explains why the case is failing) (Shout out to @vdemeester - I updated the reconciler tests both before and after his test refactoring, and it was nearly impossible to understand the tests before his builders were added: the builders made it waaaaay easier! 🙏 Follow up on #124

knative-prow-robot assigned dlorenc Oct 10, 2018

bobcatfish added the design This task is about creating and discussing a design label Oct 12, 2018

bobcatfish added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given labels Nov 2, 2018

bobcatfish mentioned this issue Nov 2, 2018

Add Image Output Resources #216

Closed

bobcatfish added this to the 0.0.1 Alpha release milestone Nov 3, 2018

bobcatfish mentioned this issue Nov 7, 2018

Link outputSourceBindings to build #173

Closed

shashwathi mentioned this issue Nov 7, 2018

Link outputSourceBindings from one PipelineTask to inputSourceBindings of next task #148

Closed

bobcatfish unassigned dlorenc Nov 10, 2018

knative-prow-robot assigned shashwathi Nov 10, 2018

bobcatfish mentioned this issue Nov 21, 2018

Allow Pipelines to be used with different Resources #248

Merged

bobcatfish mentioned this issue Dec 8, 2018

Reduce PipelineResource bindings #320

Closed

shashwathi mentioned this issue Dec 9, 2018

Add support for storage resource #321

Merged

knative-prow-robot closed this as completed in #321 Dec 19, 2018

bobcatfish mentioned this issue Dec 20, 2018

Validate that extra params + resources aren't provided #341

Merged

bobcatfish mentioned this issue Dec 20, 2018

Add docs and validation for providedBy #342

Merged

mchmarny unassigned shashwathi Mar 7, 2019

imjasonh mentioned this issue Oct 31, 2019

Remove cmd/gsutil image #1504

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design Output handling #124

Design Output handling #124

dlorenc commented Oct 10, 2018 •

edited by bobcatfish

Loading

dlorenc commented Oct 10, 2018

tejal29 commented Oct 10, 2018

dlorenc commented Oct 10, 2018

bobcatfish commented Oct 10, 2018

bobcatfish commented Oct 10, 2018

dlorenc commented Oct 11, 2018

bobcatfish commented Oct 12, 2018

dlorenc commented Oct 14, 2018

bobcatfish commented Oct 15, 2018

dlorenc commented Nov 2, 2018

bobcatfish commented Nov 2, 2018

bobcatfish commented Nov 10, 2018

shashwathi commented Nov 10, 2018

bobcatfish commented Nov 30, 2018

shashwathi commented Nov 30, 2018

bobcatfish commented Dec 4, 2018

bobcatfish commented Dec 4, 2018

bobcatfish commented Dec 4, 2018

shashwathi commented Dec 4, 2018 •

edited

Loading

shashwathi commented Dec 4, 2018

bobcatfish commented Dec 4, 2018

bobcatfish commented Dec 8, 2018

Design Output handling #124

Design Output handling #124

Comments

dlorenc commented Oct 10, 2018 • edited by bobcatfish Loading

Expected Behavior

Requirements

Actual Behavior

Additional Info

dlorenc commented Oct 10, 2018

tejal29 commented Oct 10, 2018

dlorenc commented Oct 10, 2018

bobcatfish commented Oct 10, 2018

bobcatfish commented Oct 10, 2018

dlorenc commented Oct 11, 2018

bobcatfish commented Oct 12, 2018

dlorenc commented Oct 14, 2018

bobcatfish commented Oct 15, 2018

dlorenc commented Nov 2, 2018

bobcatfish commented Nov 2, 2018

bobcatfish commented Nov 10, 2018

shashwathi commented Nov 10, 2018

bobcatfish commented Nov 30, 2018

shashwathi commented Nov 30, 2018

bobcatfish commented Dec 4, 2018

bobcatfish commented Dec 4, 2018

bobcatfish commented Dec 4, 2018

shashwathi commented Dec 4, 2018 • edited Loading

shashwathi commented Dec 4, 2018

bobcatfish commented Dec 4, 2018

bobcatfish commented Dec 8, 2018

dlorenc commented Oct 10, 2018 •

edited by bobcatfish

Loading

shashwathi commented Dec 4, 2018 •

edited

Loading