Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added entrypoint log grabber to taskrun controller #167

Merged
merged 1 commit into from
Oct 24, 2018
Merged

Added entrypoint log grabber to taskrun controller #167

merged 1 commit into from
Oct 24, 2018

Conversation

aaron-prindle
Copy link
Contributor

What is the problem being solved?
This PR addresses issue #143. This issue is that it when Build's complete, logs from the steps the Build ran are garbage collected by kubernetes and no longer available to the user.

Why is this the best approach?

  1. No special kubernetes configuration (eg. changing garbage collector values)

What other approaches did you consider?

  1. Changing kubernetes garbage collection for these containers so that they are not immediately deleted and log capture was possible

What side effects will this approach have?

  1. With this approach, users will have to specify the "Command" value for the containers they which to run as the Entrypoint is not retrievable. This means that containers/flows setup to use only the Entrypoint will no longer be supported.

What future work remains to be done?

  1. It is possible to have the PVCs changed to EmptyDir volumes once a log uploader is created. This will help with the issue that currently PVCs are being created and not cleaned up.

Co-authored-by: Christie Wilson [email protected]

@knative-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aaron-prindle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2018
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 18, 2018
Copy link
Contributor

@tejal29 tejal29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some more notes on what happens to the claimed space once the task is done?

func GetCopyStep() corev1.Container {
return corev1.Container{
Name: "place-tools",
Image: "gcr.io/k8s-prow/entrypoint",
Copy link
Contributor

@tejal29 tejal29 Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Lets pin this image to sha.

},
)
if err != nil {
return nil, fmt.Errorf("failed to create PVC %q: %s", tr.Name, err)
Copy link
Contributor

@tejal29 tejal29 Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can we explain it more like "failed to claim Persistent Volume due to error "

const (
// MountName is the name of the pvc being mounted (which
// will contain the entrypoint binary and eventually the logs)
MountName = "tools"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaron-prindle and I talked about some follow-up work to randomize the name of the mount and the volume so that they won't conflict with any of the user's config - I'll create a follow up ticket!

}

// TODO: add more test cases after all, e.g. with existing env
// var and volume mounts
Copy link
Collaborator

@bobcatfish bobcatfish Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • you can remove this comment for now i think @aaron-prindle , we can cover this in the follow up task ill make

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping you can remove the TODO

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol okay we can remove these TODOs later, i dont want to block this any longer

// TODO: This will not work when a step uses an image that has its
// own entrypoint, i.e. `Command` is a required field. In later iterations
// we can update the controller to inspect the image's `Entrypoint`
// and use that if required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need a follow up ticket for this too - could you write that one up @aaron-prindle ?

Copy link
Contributor Author

@aaron-prindle aaron-prindle Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done: #175

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz remove todo (or update with issue #)

return nil, fmt.Errorf("Task has nil BuildSpec")
// TODO: additional validation in Task webhook:
// No mounts with same names / path
// No env var with the name we'll be adding
Copy link
Collaborator

@bobcatfish bobcatfish Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • you can remove these comments too, i think we can handle this with randomization instead

}
// 3. check that volume was created
pvc := kubeclient.Actions()[0].(ktesting.CreateAction).GetObject().(*corev1.PersistentVolumeClaim)
// TODO(aaron-prindle) make these tests verify exact pvc spec, not just name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably add this before we merge @aaron-prindle

ClaimName: volumeName,
},
},
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is bSpec.Volumes guaranteed to be empty at this point or should this new volume be appended instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point @tanner-bruce , we should definitely fix this before merging @aaron-prindle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, done!

// panic: cannot handle unexported field: {v1.ResourceRequirements}.Requests["storage"].i
// consider using AllowUnexported or cmpopts.IgnoreUnexported [recovered]

// if d := cmp.Diff(pvc.Spec.Resources, expectedVolume.Spec.Resources); d != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could assert on the individual fields in Resources - cmp.Diff is probably trying to assert on every single field in the object, including fields which aren't exposed.

you just need to look at the values we set in:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 22, 2018
Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlorenc found some big issues we need to address before merging, I added a couple comments but he can update if he finds more

Thanks for being our guinea pig @dlorenc 🙏 ❤️ 😰

func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, error) {
// GetExpectedVolume returns the volume template used by createVolume
// this is exported for testing
func GetExpectedVolume(tr *v1alpha1.TaskRun) *corev1.PersistentVolumeClaim {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use this in the tests - if we are using exactly the same code to test the code we are using, there is no reason to test it.

In the test we should only assert on the fields we care about. I'd rather have less coverage in the test than do this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we plz rename this and not use it in the tests? assert only on fields we care about? i.e. access mode + resource size? or not assert on those at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -209,6 +250,29 @@ func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, er
return nil, fmt.Errorf("Task %s has nil BuildSpec", t.Name)
}

bSpec := t.Spec.BuildSpec
bSpec.Volumes = append(bSpec.Volumes,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlorenc tried this out and if there is an error creating the build, we end up appending this multiple times and trying to create the same PVC multiple times - and wrapping the entry point multiple times too!! 🤣

We need to:

  • not create the PVC if it already exists, maybe we can try to reuse it
  • not append this volume multiple times, - copy the buildspec before modifying it - we shouldn't mutate the task at all!!

We should add a test case somewhere to cover this, we can probably use the reconciler tests and avoid making this a full fledged system test

@@ -209,6 +250,29 @@ func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, er
return nil, fmt.Errorf("Task %s has nil BuildSpec", t.Name)
}

bSpec := t.Spec.BuildSpec
Copy link
Contributor

@dlorenc dlorenc Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think this needs to be DeepCopy()

out.txt Outdated
ok github.com/knative/build-pipeline/pkg/reconciler/v1alpha1/taskrun/resources (cached)
? github.com/knative/build-pipeline/pkg/system [no test files]
ok github.com/knative/build-pipeline/test (cached)
? github.com/knative/build-pipeline/test/gohelloworld [no test files]
Copy link
Collaborator

@bobcatfish bobcatfish Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • i think this was committed by accident

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -209,6 +265,28 @@ func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, er
return nil, fmt.Errorf("Task %s has nil BuildSpec", t.Name)
}

bSpec := t.Spec.BuildSpec.DeepCopy()
bSpec.Volumes = appendIfMissing(bSpec.Volumes, corev1.Volume{
Copy link
Collaborator

@bobcatfish bobcatfish Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • since we're copying the buildspec before modifying it, we should be able to append the volume every time? it should never be present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed appendIfMissing( fxn and logic -> append( now

Copy link
Contributor

@tejal29 tejal29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except for 2 nits.

@@ -154,8 +151,20 @@ func (c *Reconciler) reconcile(ctx context.Context, tr *v1alpha1.TaskRun) error
// get build the same as the taskrun, this is the value we use for 1:1 mapping and retrieval
build, err := c.BuildClientSet.BuildV1alpha1().Builds(tr.Namespace).Get(tr.Name, metav1.GetOptions{})
if errors.IsNotFound(err) {
pvc, err := c.KubeClientSet.CoreV1().PersistentVolumeClaims(tr.Namespace).Get(tr.Name, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move L154 to createPVC and name is getOrCreatePVC.

This block is getting longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this another PR


// Override the entrypoint so that we can use our custom
// entrypoint which copies logs
err = resources.AddEntrypoint(bSpec.Steps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the Concepts.doc to mention we no longer depend on a BuilderImage ?
You can also add a note for it and then we can decide what this image should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand, I can update the doc to explain the entrypoint log change but we still depend on a BuilderImage if my understanding is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've update this doc with entrypoint info

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing @tejal29 is talking about the fact that the builder image contract actually says to do the opposite of what we want - it says we want ppl to define an entrypoint when in fact we definitely do not want that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created some docs but I'm having a hard time adding them to this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm failing at this, I'll just make a new PR with the doc changes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 23, 2018
What is the problem being solved?
This PR addresses issue #143.  This issue is that it when Build's complete, logs from the steps the Build ran are garbage collected by kubernetes and no longer available to the user.

Why is this the best approach?
1) No special kubernetes configuration (eg. changing garbage collector values)

What other approaches did you consider?
1) Changing kubernetes garbage collection for these containers so that they are not immediately deleted and log capture was possible

What side effects will this approach have?
1) With this approach, users will have to specify the "Command" value for the containers they which to run as the Entrypoint is not retrievable.  This means that containers/flows setup to use only the Entrypoint will no longer be supported.

What future work remains to be done?
1) It is possible to have the PVCs changed to EmptyDir volumes once a log uploader is created.  This will help with the issue that currently PVCs are being created and not cleaned up.

Co-authored-by: Christie Wilson <[email protected]>
entrypointJSONConfigEnvVar = "ENTRYPOINT_OPTIONS"
EntrypointImage = "gcr.io/k8s-prow/entrypoint@sha256:7c7cd8906ce4982ffee326218e9fc75da2d4896d53cabc9833b9cc8d2d6b2b8f"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should not be using an image from Prow for our running pipeline system, maybe have out own?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, esp. if we feel we can't rely on this image.

but in the meantime, i think it's easier if they manage it's lifecycle? one less thing we have to worry about! but once we start having a release process i guess we should probably release this image too 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want our own anyway, to add features like being able to specify the entrypoint. BUilding the image ourself also makes it easy to build into the config YAMLs with ko, and makes it easy to make sure it's cached on every node

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now its fine, but once we have a release like you said, we should be using our own to make sure it does't change on us

bobcatfish added a commit that referenced this pull request Oct 24, 2018
The Builder image contract is actually the opposite of what we now
support - it specifies that users should define an `entrypoint` and rely
on that, however the new behaviour in #167 will completely ignore this
`entrypoint`.

These docs are a copy of the Builder image docs at
https://github.com/knative/docs/blob/master/build/builder-contract.md
but modified to describe the contract required by Task.
@bobcatfish
Copy link
Collaborator

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2018
@knative-prow-robot knative-prow-robot merged commit 6f6b071 into tektoncd:master Oct 24, 2018
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Oct 24, 2018
The Builder image contract is actually the opposite of what we now
support - it specifies that users should define an `entrypoint` and rely
on that, however the new behaviour in tektoncd#167 will completely ignore this
`entrypoint`.

These docs are a copy of the Builder image docs at
https://github.com/knative/docs/blob/master/build/builder-contract.md
but modified to describe the contract required by Task.
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Oct 24, 2018
The Builder image contract is actually the opposite of what we now
support - it specifies that users should define an `entrypoint` and rely
on that, however the new behaviour in tektoncd#167 will completely ignore this
`entrypoint`.

These docs are a copy of the Builder image docs at
https://github.com/knative/docs/blob/master/build/builder-contract.md
but modified to describe the contract required by Task.

This is for tektoncd#143
knative-prow-robot pushed a commit that referenced this pull request Oct 24, 2018
The Builder image contract is actually the opposite of what we now
support - it specifies that users should define an `entrypoint` and rely
on that, however the new behaviour in #167 will completely ignore this
`entrypoint`.

These docs are a copy of the Builder image docs at
https://github.com/knative/docs/blob/master/build/builder-contract.md
but modified to describe the contract required by Task.

This is for #143
vdemeester referenced this pull request in openshift/tektoncd-pipeline Apr 5, 2019
The Builder image contract is actually the opposite of what we now
support - it specifies that users should define an `entrypoint` and rely
on that, however the new behaviour in #167 will completely ignore this
`entrypoint`.

These docs are a copy of the Builder image docs at
https://github.com/knative/docs/blob/master/build/builder-contract.md
but modified to describe the contract required by Task.

This is for #143
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants