
WIP: Retrieve build stage status from the build run #280

Closed
wants to merge 2 commits into from

Conversation

bamachrn

This PR addresses #67
Retrieves and lists the status of a build run with its stage details for different build strategies.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 23, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign sbose78
You can assign the PR to them by writing /assign @sbose78 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

Hi @bamachrn. Thanks for your PR.

I'm waiting for a redhat-developer member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 23, 2020
@@ -64,8 +64,50 @@ type BuildRunStatus struct {
// BuildSpec is the Build Spec of this BuildRun.
// +optional
BuildSpec *BuildSpec `json:"buildSpec,omitempty"`

//stages containes the details about the stage for each build with stage start time
Contributor

Please start the comment with an upper-case letter; there is also a missing space after `//`.

// name is a unique identifier for each stage of the builds
Name string `json:"name,omitempty"`

//status is timestamp indicating strting time for the perticular stage
Contributor

Same as the above comments, please.

buildRun.StageInfo.Name = buildStep.Name
buildRun.StageInfo.Status = buildStep.Terminated.Reason
buildRun.StageInfo.StartTime = buildStep.Terminated.StartedAt
//buildRun.StageInfo.DurationMilliSeconds = buildStep.Terminated.FinishedAt
Contributor

Can you remove this unused DurationMilliSeconds assignment, and also add a unit test to cover this code path?

By the way, could you provide an example of the resulting status so we can see what it looks like?

Thanks!

}

// StageInfo contains details about a build stage.
type StageInfo struct {
Member

In build v1 / "classic" builds we recently moved away from a custom status type like this toward leveraging the k8s "Condition" pattern, i.e. something like:

To borrow / slightly tweak corev1.PodCondition

// various constants of this would be defined
type ConditionType string

type Condition struct {
	// Type is the type of the condition.
	// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-conditions
	Type ConditionType `json:"type" protobuf:"bytes,1,opt,name=type,casttype=ConditionType"`
	// Status is the status of the condition.
	// Can be True, False, Unknown.
	Status corev1.ConditionStatus `json:"status" protobuf:"bytes,2,opt,name=status,casttype=ConditionStatus"`
	// Last time we probed the condition.
	// +optional
	LastProbeTime metav1.Time `json:"lastProbeTime,omitempty" protobuf:"bytes,3,opt,name=lastProbeTime"`
	// Last time the condition transitioned from one status to another.
	// +optional
	LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty" protobuf:"bytes,4,opt,name=lastTransitionTime"`
	// Unique, one-word, CamelCase reason for the condition's last transition.
	// +optional
	Reason string `json:"reason,omitempty" protobuf:"bytes,5,opt,name=reason"`
	// Human-readable message indicating details about last transition.
	// +optional
	Message string `json:"message,omitempty" protobuf:"bytes,6,opt,name=message"`
}

I would also contend adding a creation time to that condition type could be useful for our purposes.

In the "learn from recent experiences" category, I think buildv2/k8s builds should start off this way

thoughts @sbose78 @adambkaplan @otaviof ?
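To make the pattern above concrete, here is a minimal, self-contained sketch of how a BuildRun status could use Conditions. The names `BuildSucceeded`, `SetCondition`, and `GetCondition` are hypothetical, and `time.Time` stands in for `metav1.Time` so the example runs without `k8s.io/apimachinery`; it is an illustration of the idea, not the project's actual API.

```go
package main

import (
	"fmt"
	"time"
)

type ConditionType string
type ConditionStatus string

const (
	// BuildSucceeded is a hypothetical condition type for a BuildRun.
	BuildSucceeded ConditionType = "Succeeded"

	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

type Condition struct {
	Type               ConditionType
	Status             ConditionStatus
	LastTransitionTime time.Time
	Reason             string
	Message            string
}

// SetCondition updates the condition of the same type in place, or appends
// it, bumping LastTransitionTime only when the status actually changes.
func SetCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			if conds[i].Status != c.Status {
				c.LastTransitionTime = time.Now()
			} else {
				c.LastTransitionTime = conds[i].LastTransitionTime
			}
			conds[i] = c
			return conds
		}
	}
	c.LastTransitionTime = time.Now()
	return append(conds, c)
}

// GetCondition returns the condition of the given type, or nil if absent.
func GetCondition(conds []Condition, t ConditionType) *Condition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

func main() {
	var status []Condition
	status = SetCondition(status, Condition{Type: BuildSucceeded, Status: ConditionUnknown, Reason: "Running"})
	status = SetCondition(status, Condition{Type: BuildSucceeded, Status: ConditionTrue, Reason: "Completed"})
	fmt.Println(GetCondition(status, BuildSucceeded).Reason) // Completed
}
```

Keeping one entry per condition type (rather than appending history) is what makes the slice consumable by generic tooling that just looks up a type and reads its status.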

Member

We should definitely move towards Conditions in our status reporting. The ecosystem has adopted duck-typed Conditions, so using this pattern would unlock a lot of tooling like kubectl wait [1].

Upstream has refined their guidance on what condition types should exist. The more interesting thing to note is the preference toward "abnormal-true" polarity [2]. My interpretation is that the "best practice" is to use conditions to surface why something is not working properly. There are, obviously, exceptions to the rule, like what OpenShift does with ClusterOperator [3].

[1] https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#wait
[2] https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
[3] https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#what-should-an-operator-report-with-clusteroperator-custom-resource
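A tiny sketch of what the "abnormal-true" polarity mentioned above means in practice: the condition type names the abnormal state, so `True` signals a problem and the absence of the condition is the healthy default. The `BuildRunFailed` type and `IsAbnormal` helper here are hypothetical illustrations, not part of any real API.

```go
package main

import "fmt"

type Condition struct {
	Type   string
	Status string // "True", "False", "Unknown"
	Reason string
}

// IsAbnormal reports whether the abnormal-true condition of the given type
// is currently asserted. A missing condition is treated as healthy, which
// is the point of the polarity: conditions only appear to explain problems.
func IsAbnormal(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	var healthy []Condition
	broken := []Condition{{Type: "BuildRunFailed", Status: "True", Reason: "BuildStrategyNotFound"}}
	fmt.Println(IsAbnormal(healthy, "BuildRunFailed"), IsAbnormal(broken, "BuildRunFailed")) // false true
}
```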

Member

ah ha good references @adambkaplan thanks

Contributor

@adambkaplan @gabemontero thanks for the links. Very useful.

What we do at the moment is actually take the Tekton conditions for a particular TaskRun and put some of those conditions into a BuildRun Status.

If we do Conditions instead of Status for the BuildRun, we will eventually replicate the TaskRun into the BuildRun. So one of the questions I have is: where do we put the limits on what we copy from a TaskRun? Keep in mind that a Build user can always access the TaskRun resources.

I think my above question also applies to this PR. Opinions?

Member

@qu1queee I think replicating Tekton's status is a good start for this PR. Incrementally, we can evolve the Conditions reported by the build-v2 operator into more meaningful messages.

Using Conditions is the "Kubernetes way" of handling change in status reported by an operator. With them, it is possible to use the kubectl wait command to wait for a certain condition to happen, like, in our case, a successful image build (BuildRun).

So in short, it's a +1 from me on pursuing Conditions.

Contributor

I will open an issue for this, you guys are so right :)

@openshift-ci-robot

@bamachrn: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2020
Contributor

@qu1queee left a comment

Hi @bamachrn, thanks for the PR. I'm requesting changes and also asking for:

@@ -64,8 +64,50 @@ type BuildRunStatus struct {
// BuildSpec is the Build Spec of this BuildRun.
// +optional
BuildSpec *BuildSpec `json:"buildSpec,omitempty"`

//stages containes the details about the stage for each build with stage start time
Contributor

containes -> contains?

Status string `json:"status, omitempty"`

//startTime is timestamp indicating strting time for the perticular stage
StartTime metav1.Time `json:"startTime, omitempty"`
Contributor

The space in `startTime, omitempty` is making my Go linter report a warning; can you remove it? The same applies to all the other structs, including the Status and DurationMilliSeconds fields. Thanks.

}

// StageName is the unique identifier for each build stage.
type StageName string
Contributor

Are you using StageName constants anywhere?

StartTime metav1.Time `json:"startTime, omitempty"`

//durationMilliseconds
DurationMilliSeconds int64 `json:"durationMilliSencods, omitempty"`
Contributor

It seems DurationMilliSeconds is not used, maybe we should remove this field?

for iter := range lastTaskRun.Status.Steps {
buildStep := lastTaskRun.Status.Steps[iter]
buildRun.StageInfo.Name = buildStep.Name
buildRun.StageInfo.Status = buildStep.Terminated.Reason
Contributor

I tested this code and ran into panic errors, see:

E0630 10:57:26.791316   41267 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1032 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x212f9a0, 0x30d9b70)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x212f9a0, 0x30d9b70)
	/usr/local/Cellar/go/1.14.2_1/libexec/src/runtime/panic.go:969 +0x166
github.com/redhat-developer/build/pkg/controller/buildrun.(*ReconcileBuildRun).Reconcile(0xc0005c7800, 0xc00086aaa0, 0x7, 0xc0005be400, 0x16, 0x0, 0x0, 0x0, 0x0)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/pkg/controller/buildrun/buildrun_controller.go:181 +0x5ef
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006da540, 0x2189040, 0xc00066c0a0, 0xc00056ce00)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:256 +0x161
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006da540, 0xc00056cf00)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232 +0xae
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0006da540)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00069c560)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5f
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00069c560, 0x3b9aca00, 0x0, 0x1, 0xc000622a20)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00069c560, 0x3b9aca00, 0xc000622a20)
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/encalada/Documents/Bluemix/go/src/github.com/redhat-developer/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:193 +0x305
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1fde08f]

I did the following without any modifications to the sample files:

  • k apply -f samples/build/build_kaniko_cr.yaml
  • k apply -f samples/buildrun/buildrun_kaniko_cr.yaml

This PR requires unit testing.
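The trace above points at a nil pointer dereference: a step's `Terminated` state is nil while the step is still pending or running. A self-contained sketch of a nil-safe version of the loop follows. `StepState` and `TerminatedState` are local stand-ins for the Tekton types (`v1beta1.StepState` / `corev1.ContainerStateTerminated`), and `CollectStages` is a hypothetical helper; the real fix would guard the same pointer in the controller.

```go
package main

import (
	"fmt"
	"time"
)

type TerminatedState struct {
	Reason    string
	StartedAt time.Time
}

type StepState struct {
	Name       string
	Terminated *TerminatedState // nil while the step is pending or running
}

type StageInfo struct {
	Name      string
	Status    string
	StartTime time.Time
}

// CollectStages converts only the steps that have reached a terminal state,
// skipping those whose Terminated pointer is still nil -- dereferencing
// that nil pointer is what caused the panic above.
func CollectStages(steps []StepState) []StageInfo {
	var stages []StageInfo
	for i := range steps {
		step := steps[i]
		if step.Terminated == nil {
			continue
		}
		stages = append(stages, StageInfo{
			Name:      step.Name,
			Status:    step.Terminated.Reason,
			StartTime: step.Terminated.StartedAt,
		})
	}
	return stages
}

func main() {
	steps := []StepState{
		{Name: "build-and-push", Terminated: &TerminatedState{Reason: "Completed"}},
		{Name: "still-running"}, // Terminated is nil: previously panicked here
	}
	fmt.Println(len(CollectStages(steps))) // prints 1
}
```

A unit test feeding in a step with a nil `Terminated` state would have caught this before it hit the reconciler.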

// StageInfo contains details about a build stage.
type StageInfo struct {
// name is a unique identifier for each stage of the builds
Name string `json:"name,omitempty"`
Contributor

If it is omitempty, we should add a +optional marker as well. The same goes for the other fields.

@qu1queee
Contributor

Closing stale PR, please feel free to re-open if needed.

@qu1queee qu1queee closed this Nov 17, 2020