PipelineRun status does not accurately reflect the state of the runtime #2268
Comments
I'm going to assign this a "bug" label for the PVC issue - that looks to me like we should do a better job of detecting the failure. Although I'm actually not that clear whether the pod itself has entered into a completely failed state here or is re-attempting. With the configmap example I'm not really sure what the best approach would be. I guess we could indicate some kind of Initializing state but then again it's quite possible that some TaskRun pods are executing fine while others in the same PipelineRun are still booting up or experiencing issues. In that case it's a little confusing to call the PipelineRun "Initializing".
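For illustration only, here is a minimal sketch of what "detecting the failure" could look like at the pod level (this is not the actual reconciler code, and the helper name is made up): a pod blocked on a missing configmap or pvc surfaces a container Waiting state with a reason like CreateContainerConfigError, which is distinguishable from the normal transient ContainerCreating/PodInitializing reasons seen while a pod is still booting up.

```go
// Hypothetical helper, not Tekton code: report containers stuck in a Waiting
// state for a reason other than normal startup, so a controller could tell
// "blocked on a missing resource" apart from "still coming up".
package status

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func blockedContainers(pod *corev1.Pod) []string {
	var blocked []string
	statuses := append([]corev1.ContainerStatus{}, pod.Status.InitContainerStatuses...)
	statuses = append(statuses, pod.Status.ContainerStatuses...)
	for _, cs := range statuses {
		w := cs.State.Waiting
		if w == nil || w.Reason == "" {
			continue
		}
		// ContainerCreating / PodInitializing are expected while the pod boots;
		// anything else (CreateContainerConfigError, ImagePullBackOff, ...) is
		// a candidate for surfacing on the TaskRun/PipelineRun status.
		if w.Reason == "ContainerCreating" || w.Reason == "PodInitializing" {
			continue
		}
		blocked = append(blocked, fmt.Sprintf("%s: %s (%s)", cs.Name, w.Reason, w.Message))
	}
	return blocked
}
```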
I've been looking into this issue. In both cases a pod is waiting for some external resource (pvc, configmap) to become available. I've been thinking about introducing a new PipelineRun condition Reason for this situation. The problem is that, as @sbwsg points out, some tasks may be running while some may be waiting.
The other question is what to do with the PR condition's message in this case.
(This used to be 'Not all Tasks in the Pipeline have finished executing' as shown in the issue description.) One option is to change it to decompose 'incomplete' into running and waiting.
TaskRuns with ConditionUnknown and Reason Running are running. However, this won't tell you which TaskRuns are waiting; you'd have to scan the TaskRun conditions to find that, and unfortunately there isn't a consistent Reason used in the TaskRuns. So the other option is to use a different message when there is at least one waiting TaskRun,
where it would tell you the TaskRun and its message. If there is more than one waiting TaskRun, you either just get one of them, or I guess we could list them all. It might get lengthy. What do you think @sbwsg? @afrittoli?
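To make the "different message" option concrete, here is a rough sketch (not the actual controller code) of scanning a PipelineRun's TaskRun statuses for waiting steps and building such a message. It assumes the v1beta1 API shape of that era, where PipelineRunStatus.TaskRuns maps TaskRun names to their statuses and StepState embeds the container state; the function name is made up.

```go
// Illustrative only: collect waiting steps across a PipelineRun's TaskRuns and
// build a condition message naming them, e.g.
//   TaskRun task-one has step "step-one" waiting: configmap "environment-properties" not found
package status

import (
	"fmt"
	"strings"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

func waitingTaskRunsMessage(pr *v1beta1.PipelineRun) string {
	var waiting []string
	for name, tr := range pr.Status.TaskRuns {
		if tr.Status == nil {
			continue
		}
		for _, step := range tr.Status.Steps {
			if step.Waiting != nil {
				waiting = append(waiting, fmt.Sprintf(
					"TaskRun %s has step %q waiting: %s", name, step.Name, step.Waiting.Message))
			}
		}
	}
	// An empty string means no TaskRun is waiting and the existing message can stay.
	return strings.Join(waiting, "; ")
}
```

Listing every waiting TaskRun this way would answer the "which one do we show" question, at the cost of a potentially long message.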
My 2c is something like
@dibyom @pritidesai just on #1684 and how it relates to this: Error, or broadly something reflecting this state, seems like something to consider in that design as a terminal state. Although another one here is how do the
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Expected Behavior
PipelineRuns and TaskRuns status should reflect the state of the runtimes that make up a pipeline.
Actual Behavior
There are some cases where a pod will not run, but the pipelineRun and taskRun statuses do not entirely reflect this (although the information is in there). This may be by design, given the pods are recoverable, but it is worth raising as a question at least.
For example:
Sample resources with missing config map
We end up with a pipelineRun which appears to be running, but on inspecting the taskRun we see it is stuck with the message
Message: build step "step-task-one-step-one" is pending with reason "configmap \"environment-properties\" not found"
and reason
Reason: CreateContainerConfigError
(standalone taskRun message and pipelineRun -> taskRun message both show this.) This does mean that, observing the pipelineRun, you don't get an accurate view of its state, given it will never complete. However, if the configmap were to be created, the pod would recover and run.
PipelineRun
The situation is similar if a pvc is missing.
Sample missing pvc resources
The taskRun/pod relationship is the same as above for the config map example.
There are other cases where a pod will fail to schedule and the pipelineRun status will be ambiguous about the state of the run, for example taints or resource limits preventing the pod from scheduling.
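As a purely illustrative aside (hypothetical helper, not Tekton code), the scheduling case is also visible on the pod itself: an unschedulable pod carries a PodScheduled condition with status False and reason Unschedulable, whose message names the taint or resource shortfall, so a controller could in principle surface that too.

```go
// Hypothetical sketch: return the scheduler's message (taints, insufficient
// resources, ...) when a pod cannot be scheduled, or "" otherwise.
package status

import corev1 "k8s.io/api/core/v1"

func unschedulableMessage(pod *corev1.Pod) string {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled &&
			cond.Status == corev1.ConditionFalse &&
			cond.Reason == corev1.PodReasonUnschedulable {
			return cond.Message
		}
	}
	return ""
}
```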
Steps to Reproduce the Problem
Additional Info
Kubernetes version:
v1.16
Tekton Pipeline version:
v0.10.1
I'm not sure this is a legitimate issue, given all these states are recoverable and the recovery is handled at the pod level, but it does mean that at a given point it is not accurate for a pipelineRun to say it is running. It may also tie into #1684, given we are considering creating dependencies on the status of these resources.
EDIT: I should have spotted this: the pvc case looks like a legitimate issue, given there's an event indicating it failed.