Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Conditions in the BuildRun resource #515

Merged

Conversation

qu1queee
Copy link
Contributor

@qu1queee qu1queee commented Dec 9, 2020

Fixes #307

In a nutshell this PR adds support for Conditions, but keep the old fields Status.Reason and Status.Succeeded for backwards compatibility.(we should remove them at some point in the future).

Based on each commit message(6 commits), this PR is doing the following changes:

  • Add condition test cases to integration test. Adds test cases for the three main status condition:

    • unknown (ramp-up and early phases of buildrun)
    • true (buildrun runs through fine)
    • false (error scenarios like timeout or misconfiguration)
  • Introduce conditions for buildruns

    • Add function to translate the taskrun condition into a buildrun one.
  • Refactor build and buildrun timeout logic

    • Externalize logic to get the effective timeout from build and buildrun tuple into a separate function.
  • Change CRD printer columns to use conditions

    • Change Succeeded column to use the Succeeded condition Status field.
    • Change Reason column to use the Succeeded condition Ready field.
    • Add Message column based on the Succeeded condition Message field.
  • Update docs with new printer output

    • Update buildrun.md documentation with a new example that shows the new columns.
  • Update e2e tests with BuildRun Conditions

    • Small refactoring to ensure we are asserting the fields of the Conditions object, rather than the old Status fields

If you are wondering how this looks:

For Succeeded BuildRuns:

image

For Failed BuildRuns:

image

On this scenario, we introduced a new field, that will be only populated on failures. This is useful for third-parties that wanna get the exact container that failed on the pod, by consuming the values of this new field(status.failedAt)

image

For BuildRuns that timedOut:

image

Add test cases for the three main status condition:
- unknown (ramp-up and early phases of buildrun)
- true (buildrun runs through fine)
- false (error scenarios like timeout or misconfiguration)

Signed-off-by: Enrique Eduardo Encalada Olivas <[email protected]>
Add function to translate the taskrun condition into a buildrun one.

Signed-off-by: Enrique Eduardo Encalada Olivas <[email protected]>
Externalize logic to get the effective timeout from build and builrun
tuple into a separate function.

Signed-off-by: Enrique Eduardo Encalada Olivas <[email protected]>
@adambkaplan
Copy link
Member

Created #517 to track the removal of these redundant fields. I've also created a master tracking issue for breaking API changes we want to consider: #516.

case v1beta1.TaskRunReasonTimedOut:
reason = "BuildRunTimeout"
message = fmt.Sprintf("BuildRun %s failed to finish within %s",
taskRun.GetLabels()[buildv1alpha1.LabelBuildRun],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - we probably don't need the name since in the message will be inside the BuildRun object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user that is looking at the Conditions directly from the BuildRun I agree, but for the scenarios where someone is building a workflow on top(e.g. Shipwright/CLI), this will help to make the message very explicit. Therefore I would opt to not change this.

)

case v1beta1.TaskRunReasonFailed:
if taskRun.Status.CompletionTime != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the TaskRun is not complete? Would we expect this to be a time out error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory no. If the TaskRun Condition.Reason is Failed, this means that something failed in the Pod and Completion will be set always after the Failed is set in the TaskRun Condition.Reason. You can dive in the code to verify that part.

Also, Tekton have this nice table that illustrates the above, see table from tekton. You will find there that whevener the Reason is Failed, the CompletionTime is set.

Therefore, to answer your question, a TaskRun cannot be Failed and "not complete" .

}

if failedContainer != nil {
message = fmt.Sprintf("buildrun step failed in pod %s, for detailed information: kubectl --namespace %s logs %s --container=%s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we bubble up terminated state message from the pod here, rather than a "look at the logs" message? https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#containerstateterminated-v1-core

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adambkaplan interesting, let me play with that tomorrow, and then I can give you a proper answer. Note: This constructed message is very similar to what Tekton does on a failed Taskrun, it provides you some metadata + the kubectl cmd to execute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adambkaplan I took a look on this, this will not work, while Tekton does not exposes internal results. They override the message with InternalTektonResult, this will look like:

    state:
      terminated:
        containerID: containerd://fd0627262107d2339abc75261afd4ca66a91f1b6a3f19e2a3ae68a9ff817138e
        exitCode: 1
        finishedAt: "2020-12-14T17:06:19Z"
        message: '[{"key":"StartedAt","value":"2020-12-14T17:06:19.706Z","type":"InternalTektonResult"}]'
        reason: Error
        startedAt: "2020-12-14T17:06:13Z"

I think the current approach is good for us, also it will help users to move into the right direction, which is "getting the logs of the failed step".

deploy/crds/build.dev_buildruns_crd.yaml Outdated Show resolved Hide resolved
pkg/apis/build/v1alpha1/buildrun_types.go Outdated Show resolved Hide resolved
@qu1queee qu1queee force-pushed the br_conditions_imp branch 4 times, most recently from 531be8a to 7d9b581 Compare December 15, 2020 14:22
Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry for the late feedback. Looks good. Just a few suggestions.

pkg/controller/buildrun/buildrun_controller.go Outdated Show resolved Hide resolved
pkg/controller/buildrun/buildrun_controller.go Outdated Show resolved Hide resolved
test/e2e/validators.go Outdated Show resolved Hide resolved
HeavyWombat and others added 3 commits December 16, 2020 21:42
Change `Succeeded` column to use the Succeeded condition Status field.

Change `Reason` column to use the Succeeded condition Ready field.

Add `Message` column based on the Succeeded condition Message field.
Update `buildrun.md` documentation with a new example that shows the new columns.
Small refactoring to ensure we are asserting the fields
of the Conditions object, rather than the old Status fields

Signed-off-by: Matthias Diester <[email protected]>
Copy link
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 17, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 17, 2020
@openshift-merge-robot openshift-merge-robot merged commit 72c80ce into shipwright-io:master Dec 17, 2020
@SaschaSchwarze0
Copy link
Member

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here
Needs approval from an approver in each of these files:

Sry @adambkaplan, this was not desired. Wanted to wait for your feedback. I still do not understand why the approved tag is added for me when I did the GitHub approve vs this did not happen for Jordan. If my /lgtm at the same time is the cause, then this is not intuitive.

@gabemontero
Copy link
Member

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: SaschaSchwarze0
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:

Sry @adambkaplan, this was not desired. Wanted to wait for your feedback. I still do not understand why the approved tag is added for me when I did the GitHub approve vs this did not happen for Jordan. If my /lgtm at the same time is the cause, then this is not intuitive.

Don't know if you got the answer to your question @SaschaSchwarze0 but a lgtm from an approver in the OWNERS file adds both lgtm and approved labels.... your ID is listed as an approver in https://github.com/shipwright-io/build/blob/master/OWNERS for example, at least with the latest version

I don't know the precise github wiring that makes this happen but it is the same in the openshift repos.

By comparison, @zhangtbj 's review approval via #515 (review) is not the same as an PR lgtm or approve wrt the github wiring I am referring to.

@gabemontero
Copy link
Member

I've added discussing any post merge actions to the Jan 11 community agenda ^^

@gabemontero
Copy link
Member

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: SaschaSchwarze0
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:

Sry @adambkaplan, this was not desired. Wanted to wait for your feedback. I still do not understand why the approved tag is added for me when I did the GitHub approve vs this did not happen for Jordan. If my /lgtm at the same time is the cause, then this is not intuitive.

Don't know if you got the answer to your question @SaschaSchwarze0 but a lgtm from an approver in the OWNERS file adds both lgtm and approved labels.... your ID is listed as an approver in https://github.com/shipwright-io/build/blob/master/OWNERS for example, at least with the latest version

I don't know the precise github wiring that makes this happen but it is the same in the openshift repos.

Duh, forgot the automatic approve process provides a link to the process and how things work. The lgtm from owner adding approve quirk is discussed at https://github.com/kubernetes/community/blob/master/contributors/guide/owners.md#quirks-of-the-process

This is all part of the k8s prow infrastructure used by k8s and openshift, which has been adopted for our repo.

By comparison, @zhangtbj 's review approval via #515 (review) is not the same as an PR lgtm or approve wrt the github wiring I am referring to.

@zhangtbj
Copy link
Contributor

zhangtbj commented Jan 9, 2021

Interesting. I am not sure if I operated something wrong before.

As I know, there are many reviewers for a PR. and if I review this PR or be asked to review this PR and after it looks good to me. I will or what I thought:

  • Approve my review by using below operation then wait for other's review (At that time, no approved label is added)

Screen Shot 2021-01-09 at 1 37 48 PM

  • If other reviewers review this PR and can also approve by using above way, like Sascha did
  • Then, someone can mark /lgtm on the issue
  • If all agree or no other comment, then the approver can mark /approve on the issue to accept the PR

But seems the last action is done automatically if approver approve the PR before?

But for sure, let us discuss and confirm the correct process in next meeting.

Thanks!

@SaschaSchwarze0
Copy link
Member

[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: SaschaSchwarze0
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:

Sry @adambkaplan, this was not desired. Wanted to wait for your feedback. I still do not understand why the approved tag is added for me when I did the GitHub approve vs this did not happen for Jordan. If my /lgtm at the same time is the cause, then this is not intuitive.

Don't know if you got the answer to your question @SaschaSchwarze0 but a lgtm from an approver in the OWNERS file adds both lgtm and approved labels.... your ID is listed as an approver in https://github.com/shipwright-io/build/blob/master/OWNERS for example, at least with the latest version

I don't know the precise github wiring that makes this happen but it is the same in the openshift repos.

By comparison, @zhangtbj 's review approval via #515 (review) is not the same as an PR lgtm or approve wrt the github wiring I am referring to.

Yeah, probably, for somebody who worked on the OSS space for some time, this is normal. For somebody new it is imo not intuitive: there are the /lgtm and /approve commands and there are the lgtm and approved labels. Intuitively, I guess that 99 % assume a 1-1 relationship between the command and the label that have the same name. This is not the case here.

Anyway, joking now, software engineers often tend to try to type as less as possible while still doing as much as possible. I personal am capable and have the time to write both commands. But anyway, software engineers also need to be capable of learning non-logical things as they exist everywhere. Cheers.

@qu1queee qu1queee deleted the br_conditions_imp branch February 15, 2021 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move to Conditions instead of Status in the BuildRun
8 participants