TaskRun fails with recoverable mount error #6960
Comments
I can help with the fix...
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
/remove-lifecycle stale
My team has tried to recover from a TaskRun stuck in this state:

```yaml
status:
  conditions:
  - lastTransitionTime: "2024-03-22T18:09:56Z"
    message: Failed to create pod due to config error
    reason: CreateContainerConfigError
    status: "False"
    type: Succeeded
  startTime: "2024-03-22T18:09:40Z"
  steps:
  - container: step-check-step
    name: check-step
    waiting:
      message: secret "oci-store" not found
      reason: CreateContainerConfigError
```

In that waiting (but failed status) state, we tried to provide the correct configuration to pull the image, but the task never recovered. We had a pipeline tied to the task (it spawned the task), and it was in a terminated/failed/non-waiting/non-recoverable state. We also went the other way and waited for the pod to time out while waiting, but that did not recover anything either.

I wonder, @RafaeLeal: you mentioned that the TaskRun fails and the pod recovers, but too late. In that state, is the TaskRun already terminated, with a completionTime, or is it still waiting? I wonder whether your problem is the same as ours, or whether we need to open a separate issue.
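For reference on the completionTime question above: a TaskRun that has fully terminated carries a `completionTime` next to its failed condition, whereas one that is still waiting does not. A minimal sketch of the terminated shape (timestamps and reason are illustrative placeholders, not from a real run):

```yaml
# Illustrative sketch only: the shape of a TaskRun status after termination.
# Timestamps and the failure reason are placeholder values.
status:
  completionTime: "2024-03-22T18:19:56Z"   # present once the TaskRun has terminated
  conditions:
  - lastTransitionTime: "2024-03-22T18:19:56Z"
    message: Failed to create pod due to config error
    reason: CreateContainerConfigError
    status: "False"
    type: Succeeded
  startTime: "2024-03-22T18:09:40Z"
```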
There are a few other similar issues, some closed due to inactivity, but this issue (#6960) seems closest to what my team is seeing.
Expected Behavior
TaskRun's pods should be able to recover from transient mount errors
Actual Behavior
When such an error occurs, the pod enters the `CreateContainerConfigError` state and then the TaskRun fails. Often the pod recovers, but it's too late.
This behavior was introduced in #1907
Steps to Reproduce the Problem
Not sure exactly how to reproduce this, but we have a fairly big Tekton cluster and it happens quite often with a volume that uses AWS EFS.
What happens is that we notice a pod status like the one sketched below.
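A representative sketch of such a pod status, assuming a step container named `step-check-step` and the missing `oci-store` secret from the TaskRun status quoted in the comments above (all values illustrative):

```yaml
# Illustrative sketch only: a pod whose step container cannot start because a
# mounted secret is missing. Names and the message are example values.
status:
  phase: Pending
  containerStatuses:
  - name: step-check-step
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: CreateContainerConfigError
        message: secret "oci-store" not found
```

The step `waiting` state in the TaskRun status quoted earlier mirrors this container state, which is presumably why the TaskRun is marked failed even though the kubelet keeps retrying and the pod can still recover.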
Additional Info
- Kubernetes version:
  - Output of `kubectl version`:
- Tekton Pipeline version:
  - Output of `tkn version` or `kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'`: