Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Output handling #124

Closed
dlorenc opened this issue Oct 10, 2018 · 22 comments · Fixed by #1504
Closed

Design Output handling #124

dlorenc opened this issue Oct 10, 2018 · 22 comments · Fixed by #1504
Labels
design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given

Comments

@dlorenc
Copy link
Contributor

dlorenc commented Oct 10, 2018

Expected Behavior

We should design how outputs should be handled by the controllers. This includes validation and fetching (uploading, querying, storing, etc.) to make them available to other Tasks that depend on them.

Outputs can be anything, e.g.:

  • Test results
  • Binaries, jars
  • Images

Each Task may have 0 or more Outputs.

Requirements

  • If a Task specifies a Resource as an input that is an output of a previous Task, that Resources must be mounted into the Task's step containers
  • If your Task declares an output resource, a volume mount (or similar) should be created for that output so you can persist it
  • Validation: If a Task declares an output, we should verify at the end of the run that the output exists
    - The Task's steps should have the output volume mounted
    - Each Resource should have it's own hooks for Validation
    - The TaskRun controller should call the Validation hooks after a Task completes and set status to Failed if the Validation fails
  • Updating a Resource: It must be possible for a Task to mutate a resource that it has as an output, e.g. adding a digest to an image, or adding/changing a revision on a git resource

For completing this issue, please implement one type of Resource with Validation, e.g. for Test results (which would be logs written to a file) or a generic File resource (which would only have to exist to be Valid).

apiVersion: pipeline.knative.dev/v1alpha1
kind: Task
metadata:
  name: make
  namespace: default
spec:
    outputs:
        resource:
            - name: data
              type: file
              path: some/path # this would be relative to the created mount
    buildSpec:
        steps:
            - name: echo
              image: ubuntu
              command: ['sh']
              args: ['-c', 'echo hello > ${outputs.resources.data.path}']

If the above Task is used in a taskRun, it should be validated that it actually created the output it claimed it would create.

The File resource should be updated with the names of the files that were created.

If a subsequent Task in a Pipeline depends on the file Resource created by that Task, it should be provided with the file(s) created automatically.

Actual Behavior

Outputs do not currently do anything, and a generic "file" type resource does not yet exist.

Additional Info

Some ideas:

  • Append a build step to validate and fetch Outputs to the BuildSpec. This will not work with Builds specified by Templates.
  • Keep the artifacts on a PV, and attach another pod to that PV after the build to handle outputs.

Something else?

@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 10, 2018

Note that this should probably be handled separately from the unstructured, stdout/stderr streams that build steps produce themselves.

@tejal29
Copy link
Contributor

tejal29 commented Oct 10, 2018

One more way is the Task Controller will watch for pods update and delete req.
If they belong to a TaskRun, upload the unseen logs to GCS.
The GCS store should support partial update of logs.

@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 10, 2018

For the issue on stdout/stderr in Build, see knative/build#232

@bobcatfish
Copy link
Collaborator

Some related work that was done designing Build CRD artifacts (shared with members of knative-dev@.)

@bobcatfish
Copy link
Collaborator

/assign dlorenc

@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 11, 2018

After some more investigation it appears even the stackdriver logs don't always get uploaded. I tested this in GKE 1.11 (which defaults to containerd) and the logs are even less reliably uploaded.

So I think we have a few AIs now:

Design and Define the mechanism for outputting files
This includes both normal logs (stdout/stderr of steps) and structured outputs (test results).

Design and Define the API for where these files should be stored
Again, this includes both normal logs and structured outputs, which might need to be stored in separate places.

Document recommendations for task authors
We should recommend task authors write logs to files that they care about.

Push on getting a reliable way to export logs
Work with upstream k8s to get the hooks we need to make sure stdout/stderr can be reliably captured.

@bobcatfish
Copy link
Collaborator

@dlorenc @tejal29 I'm a bit confused about the scope of this task - I think this is about basically all of the results which would be made available to these endpoints:

https://github.com/knative/build-pipeline/blob/37381f5b714bce3c55a5643c76dc03f19f552d72/examples/pipelineparams.yaml#L15-L27

Is that right?

b/c in the context of this project "outputs" can also mean the outputs from a Task, i.e. #148 so I'd like to rename this task to Design Results handling?

Note #143 is a subtask of this issue

@bobcatfish bobcatfish added the design This task is about creating and discussing a design label Oct 12, 2018
@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 14, 2018

I think this overlaps with #148 but is also a bit more general. We need to design:

  • What happens to everything in the Task.Outputs list (Results and Resources). This includes where they gets uploaded to the things in PipelineParams and how they get named).
  • How that happens (do we have a step that checks and performs these uploads?, something else?)
  • How these get linked to the dependent tasks in a Pipeline.

@bobcatfish
Copy link
Collaborator

Ah I see, it slipped my mind that test results are in the Task.Outputs list 🤔 Do you think we'll end up with other kinds of results in Tasks.Outputs? I wonder if it might be clearer to have a top level Tasks.Results or something like that instead, since the handling of other Resource based outputs seems like itll probably be pretty different from Results

@dlorenc
Copy link
Contributor Author

dlorenc commented Nov 2, 2018

@bobcatfish
Copy link
Collaborator

Note: @tejal29 @dlorenc @aaron-prindle and I revised the description of this task to be a bit more specific

@bobcatfish bobcatfish added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given labels Nov 2, 2018
@bobcatfish bobcatfish added this to the 0.0.1 Alpha release milestone Nov 3, 2018
@bobcatfish
Copy link
Collaborator

@shashwathi were you planning to take this one on (as well as #148 ?) i just noticed its still assigned to @dlorenc so im unassigning it

@shashwathi
Copy link
Contributor

/assign @shashwathi
I think this needs to be addressed before #148 so let me take a stab at this issue first.

bobcatfish referenced this issue in bobcatfish/pipeline Nov 21, 2018
We had been hoping to use `version` to cover two cases:
1. To allow a user to override the version of the Resource they are using
   (e.g. for a particular git resource, use a different revision)
2.  For the case where a Task modifies a resource then provides it to the
   next Task, we wanted to express the modification in the version

In the previous commit, we made it so that a user can change a Resource
by creating a new one and specifying the new one in teh PipelineRun,
and they can do the same thing via a TaskRun.

For the second case, in #124 @shashwathi is designing how the output of
one task will be passed to the next, and it is likely going to be via
PVCs and will not need the version field.

Fixes tektoncd#200
bobcatfish referenced this issue in bobcatfish/pipeline Nov 21, 2018
We had been hoping to use `version` to cover two cases:
1. To allow a user to override the version of the Resource they are using
   (e.g. for a particular git resource, use a different revision)
2.  For the case where a Task modifies a resource then provides it to the
   next Task, we wanted to express the modification in the version

In the previous commit, we made it so that a user can change a Resource
by creating a new one and specifying the new one in teh PipelineRun,
and they can do the same thing via a TaskRun.

For the second case, in #124 @shashwathi is designing how the output of
one task will be passed to the next, and it is likely going to be via
PVCs and will not need the version field.

Fixes tektoncd#200
bobcatfish referenced this issue in bobcatfish/pipeline Nov 21, 2018
We had been hoping to use `version` to cover two cases:
1. To allow a user to override the version of the Resource they are using
   (e.g. for a particular git resource, use a different revision)
2.  For the case where a Task modifies a resource then provides it to the
   next Task, we wanted to express the modification in the version

In the previous commit, we made it so that a user can change a Resource
by creating a new one and specifying the new one in teh PipelineRun,
and they can do the same thing via a TaskRun.

For the second case, in #124 @shashwathi is designing how the output of
one task will be passed to the next, and it is likely going to be via
PVCs and will not need the version field.

Fixes tektoncd#200
bobcatfish referenced this issue in bobcatfish/pipeline Nov 22, 2018
We had been hoping to use `version` to cover two cases:
1. To allow a user to override the version of the Resource they are using
   (e.g. for a particular git resource, use a different revision)
2.  For the case where a Task modifies a resource then provides it to the
   next Task, we wanted to express the modification in the version

In the previous commit, we made it so that a user can change a Resource
by creating a new one and specifying the new one in teh PipelineRun,
and they can do the same thing via a TaskRun.

For the second case, in #124 @shashwathi is designing how the output of
one task will be passed to the next, and it is likely going to be via
PVCs and will not need the version field.

Fixes tektoncd#200
bobcatfish referenced this issue in bobcatfish/pipeline Nov 22, 2018
We had been hoping to use `version` to cover two cases:
1. To allow a user to override the version of the Resource they are using
   (e.g. for a particular git resource, use a different revision)
2.  For the case where a Task modifies a resource then provides it to the
   next Task, we wanted to express the modification in the version

In the previous commit, we made it so that a user can change a Resource
by creating a new one and specifying the new one in teh PipelineRun,
and they can do the same thing via a TaskRun.

For the second case, in #124 @shashwathi is designing how the output of
one task will be passed to the next, and it is likely going to be via
PVCs and will not need the version field.

Fixes tektoncd#200
knative-prow-robot pushed a commit that referenced this issue Nov 22, 2018
We had been hoping to use `version` to cover two cases:
1. To allow a user to override the version of the Resource they are using
   (e.g. for a particular git resource, use a different revision)
2.  For the case where a Task modifies a resource then provides it to the
   next Task, we wanted to express the modification in the version

In the previous commit, we made it so that a user can change a Resource
by creating a new one and specifying the new one in teh PipelineRun,
and they can do the same thing via a TaskRun.

For the second case, in #124 @shashwathi is designing how the output of
one task will be passed to the next, and it is likely going to be via
PVCs and will not need the version field.

Fixes #200
@bobcatfish
Copy link
Collaborator

I think this is done thanks to #270 @shashwathi or is there more to do? I think there were some TODOs about images in #270 but we could leave those for #216 if we wanted

@shashwathi
Copy link
Contributor

In terms of git resource, output design is done. I am not convinced that output design for resources are done yet though. To justify this issue as being done, we should support another resource too.

So I am working on a PR to add support for GCS resource (input & output) in another PR. There is no issue tracking GCS resource support (I could add one and then that is like a subtask of this issue)

WDYT @bobcatfish ?

@bobcatfish
Copy link
Collaborator

That sounds great to me @shashwathi ! :D 🎉

@bobcatfish
Copy link
Collaborator

@shashwathi i just noticed that it seems like you can provide a resource as an output without actually specifying it as an output - for example in the end to end tests we write to a git input - we never declare it as an output but the changes we make to it still appear to future tasks (though in our example yaml we DO explicitly declare outputs)

Do you have any thoughts on how we can make output meaningful? I'm thinking that if you want to modify a resource, you must make it an output - maybe changes to inputs are intentionally not passed on to future tasks?

@bobcatfish
Copy link
Collaborator

(maybe i should make a new issue to discuss this inputs vs outputs thing - seems relevant to this issue too tho!)

@shashwathi
Copy link
Contributor

shashwathi commented Dec 4, 2018

Do you have any thoughts on how we can make output meaningful?

Yes I do. A resource can be declared one of these ways

output only

  • If a resource is only declared in output then steps will be added to build to alter resource externally (like upload artifact to gcs, push image). It did not make sense to trigger copy to PVC in this case because output the last thing that happens in task so resource wont be altered between tasks.

output and input

  • In this case, resource will be copied to PVC and alter resource externally. If tasks are sharing resources then explicitly tasks need to declare these resources in inputs as well. Thought behind this approach is resource is fetched(as input), altered in steps and passed onto next tasks (output) so resource needs to be declared in input and output of tasks to be copied onto PVC.

input only

  • If a resource is only declared in inputs (with previous task dependencies) then pre steps will be added to fetch resource from previous task. Resource will not be copied onto PVC or externally as output is not declared.
  • If a resource is only declared in inputs (without previous task dependencies) then fresh copy of resource is fetched. Resource will not be copied onto PVC or externally as output is not declared.

Example : In the following pipeline pipeline-share-resources, task2 can declare input git-repo with providedBy constraint only if task1 has declared resource git-repo in output and input.

task2 cannot declare input someother-git-repo with providedBy constraint since task1 has not declared resource someother-git-repo in outputs.

name: pipeline-share-resources
spec:
  tasks: 
  - name: task1
    inputs:
    - name: git-repo  
    - name: someother-git-repo
    outputs: 
    - name: git-repo
   steps:
   - name: alter-git-repo
   - name: alter-someother-git-repo 
  - name: task2
    inputs:
    - name: git-repo
       providedBy: [task1] // valid
    - name: someother-git-repo
       providedBy: [task1] // invalid

I think yaml is messed up but I hope you understood my point.
@bobcatfish WDYT?

@shashwathi
Copy link
Contributor

for example in the end to end tests we write to a git input - we never declare it as an output

In this e2e test I considered pipelinerun task resources inputs and output to form construct taskrun inputs and outputs.
You raise a good point though if task does not declare outputs then taskrun validation should have failed. This could be a bug or needs some update in design.

@bobcatfish
Copy link
Collaborator

You raise a good point though if task does not declare outputs then taskrun validation should have failed. This could be a bug or needs some update in design.

I'm actually playing around with some of this right now (I'm using input + output linking to make an e2e test for #168) so I could add some validation here!

@bobcatfish
Copy link
Collaborator

I think yaml is messed up but I hope you understood my point.
@bobcatfish WDYT?

I finally had a chance to look at this more carefully and I think that looks good!

There is one case which is a bit weird around how to use providedBy to make Tasks that do nothing but run tests run before other Tasks (cuz these test Tasks have no outputs - so the only way to use providedBy is to use the test Task's input in the subsequent Task).

But on a related note I wrote up a doc with some proposed changes to how we bind PipelineResources(note doc visible to members of [email protected]) - if we went with alternative #1 then I think we could use the input and output semantics like you propose!

bobcatfish referenced this issue in bobcatfish/pipeline Dec 19, 2018
While working on creating an end to end test for tektoncd#168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

This is the first of a couple follow up commits to add more validation:
this one makes sure that extra resources are not provided (e.g. if you
typo-ed a param with a default value, you may never know your param was
being ignored).

(Shout out to @vdemeester - I updated the reconciler tests both before
and after his test refactoring, and it was nearly impossible to
understand the tests before his builders were added: the builders made
it waaaaay easier! 🙏

Follow up on #124
bobcatfish referenced this issue in bobcatfish/pipeline Dec 20, 2018
While working on creating an end to end test for tektoncd#168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

This is the first of a couple follow up commits to add more validation:
this one makes sure that extra resources are not provided (e.g. if you
typo-ed a param with a default value, you may never know your param was
being ignored).

Some minor changes:
* Combined 3 test cases in the taskrun reconciler into one test (testing
  templating with default params).
* Reorganized logic in some reconciler test failures so that `Fatalf`
  won't prevent the dev from seeing informative failure info
  (specifically the condition tends to have a reason that explains why
  the case is failing)

(Shout out to @vdemeester - I updated the reconciler tests both before
and after his test refactoring, and it was nearly impossible to
understand the tests before his builders were added: the builders made
it waaaaay easier! 🙏

Follow up on #124
bobcatfish referenced this issue in bobcatfish/pipeline Dec 20, 2018
While working on creating an end to end test for tektoncd#168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

This is the first of a couple follow up commits to add more validation:
this one makes sure that extra resources are not provided (e.g. if you
typo-ed a param with a default value, you may never know your param was
being ignored).

Some minor changes:
* Combined 3 test cases in the taskrun reconciler into one test (testing
  templating with default params).
* Reorganized logic in some reconciler test failures so that `Fatalf`
  won't prevent the dev from seeing informative failure info
  (specifically the condition tends to have a reason that explains why
  the case is failing)

(Shout out to @vdemeester - I updated the reconciler tests both before
and after his test refactoring, and it was nearly impossible to
understand the tests before his builders were added: the builders made
it waaaaay easier! 🙏

Follow up on #124
bobcatfish referenced this issue in bobcatfish/pipeline Dec 20, 2018
While working on creating an end to end test for tektoncd#168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

While adding this validation I _really_ struggled to understand how the
`providedBy` clause actually works. I've updated the docs with my best
understanding of how it is currently working.

Fortunately in tektoncd#320 we will be simplifying how this works!

This is follow-up work for #124
bobcatfish referenced this issue in bobcatfish/pipeline Dec 20, 2018
While working on creating an end to end test for tektoncd#168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

While adding this validation I _really_ struggled to understand how the
`providedBy` clause actually works. I've updated the docs with my best
understanding of how it is currently working.

Fortunately in tektoncd#320 we will be simplifying how this works!

This is follow-up work for #124
knative-prow-robot pushed a commit that referenced this issue Dec 20, 2018
While working on creating an end to end test for #168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

While adding this validation I _really_ struggled to understand how the
`providedBy` clause actually works. I've updated the docs with my best
understanding of how it is currently working.

Fortunately in #320 we will be simplifying how this works!

This is follow-up work for #124
knative-prow-robot pushed a commit that referenced this issue Jan 9, 2019
While working on creating an end to end test for #168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

This is the first of a couple follow up commits to add more validation:
this one makes sure that extra resources are not provided (e.g. if you
typo-ed a param with a default value, you may never know your param was
being ignored).

Some minor changes:
* Combined 3 test cases in the taskrun reconciler into one test (testing
  templating with default params).
* Reorganized logic in some reconciler test failures so that `Fatalf`
  won't prevent the dev from seeing informative failure info
  (specifically the condition tends to have a reason that explains why
  the case is failing)

(Shout out to @vdemeester - I updated the reconciler tests both before
and after his test refactoring, and it was nearly impossible to
understand the tests before his builders were added: the builders made
it waaaaay easier! 🙏

Follow up on #124
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants