[BUG] Flyte tasks may fail if containerStatus is not present for primary container fast enough #4332
Closed
Labels: bug
Describe the bug
The DeterminePrimaryContainerPhase function fails if the primary container name is not found in the containerStatuses field on the k8s pod. However, when the k8s API server is under heavy load, the containerStatuses field may not be fully populated even though the container does exist in the Pod. Before falling back to a permanent failure when the container is not found in the containerStatuses field, we should first check whether it exists in the Pod spec's containers field.
Expected behavior
Flyte should wait until the containerStatuses field is populated with the primary container name.
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?
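The fallback proposed under "Describe the bug" could look roughly like the sketch below. It is only an illustration under stated assumptions: the Pod, Container, and ContainerStatus types are simplified local stand-ins for the Kubernetes types in k8s.io/api/core/v1, the Phase values are made up for this example, and primaryContainerPhase is a hypothetical helper, not Flyte's actual DeterminePrimaryContainerPhase.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes Pod types (assumption: the real
// types live in k8s.io/api/core/v1 and carry many more fields).
type Container struct{ Name string }

type ContainerStatus struct {
	Name       string
	Terminated bool
	ExitCode   int32
}

type Pod struct {
	Containers        []Container       // mirrors pod.spec.containers
	ContainerStatuses []ContainerStatus // mirrors pod.status.containerStatuses
}

// Phase is a simplified, hypothetical phase enum used only in this sketch.
type Phase string

const (
	PhaseWaiting          Phase = "Waiting"
	PhaseRunning          Phase = "Running"
	PhaseSucceeded        Phase = "Succeeded"
	PhaseFailed           Phase = "Failed"
	PhasePermanentFailure Phase = "PermanentFailure"
)

// primaryContainerPhase is a hypothetical helper illustrating the proposal.
func primaryContainerPhase(pod Pod, primary string) Phase {
	// Normal path: the status for the primary container has been reported.
	for _, s := range pod.ContainerStatuses {
		if s.Name != primary {
			continue
		}
		if !s.Terminated {
			return PhaseRunning
		}
		if s.ExitCode == 0 {
			return PhaseSucceeded
		}
		return PhaseFailed
	}
	// Fallback: no status yet. If the container is declared in the spec,
	// the status is probably just late, so keep waiting instead of failing.
	for _, c := range pod.Containers {
		if c.Name == primary {
			return PhaseWaiting
		}
	}
	// The container is in neither list, so it really does not exist.
	return PhasePermanentFailure
}

func main() {
	// A pod whose spec declares the container but whose statuses are empty.
	pod := Pod{Containers: []Container{{Name: "primary"}}}
	fmt.Println(primaryContainerPhase(pod, "primary")) // prints "Waiting"
}
```

With this shape, a permanent failure is reported only when the primary container is absent from both the spec and the statuses; a late containerStatuses update simply results in another wait.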