[BUG] Flyte tasks may fail if containerStatus is not present for primary container fast enough #4332

Closed
2 tasks done
hamersaw opened this issue Oct 30, 2023 · 0 comments · Fixed by #4339
Labels
bug Something isn't working

Comments

@hamersaw
Contributor

Describe the bug

The DeterminePrimaryContainerPhase function fails if the primary container name is not found in the containerStatuses field of the k8s Pod. However, when the k8s API server is under heavy load, the containerStatuses field may not yet be fully populated even though the container does exist in the Pod. Before falling back to a permanent failure when the container is not found in containerStatuses, we should first check whether it exists in the Pod's containers field.

Expected behavior

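If the primary container is declared in the Pod but not yet reported in containerStatuses, Flyte should wait until the containerStatuses field is populated with the primary container name instead of failing the task.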
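A minimal sketch of the proposed lookup order, assuming simplified, hypothetical stand-ins for the Pod fields (Pod, Container, ContainerStatus, Phase and primaryContainerPhase are illustrative names, not Flyte's code or the k8s.io/api types): check containerStatuses first, then fall back to the declared containers before declaring a permanent failure.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Kubernetes Pod fields discussed
// in this issue. These are NOT the k8s.io/api/core/v1 types.
type Container struct{ Name string }       // an entry of pod.spec.containers
type ContainerStatus struct{ Name string } // an entry of pod.status.containerStatuses

type Pod struct {
	Containers        []Container
	ContainerStatuses []ContainerStatus
}

// Phase is a hypothetical result type for this sketch only.
type Phase string

const (
	PhaseStatusReported   Phase = "StatusReported"   // derive the phase from the status
	PhaseWaitingForStatus Phase = "WaitingForStatus" // declared, status not reported yet
	PhasePermanentFailure Phase = "PermanentFailure" // container does not exist at all
)

// primaryContainerPhase is a hypothetical helper illustrating the fallback.
func primaryContainerPhase(pod Pod, primary string) Phase {
	for _, s := range pod.ContainerStatuses {
		if s.Name == primary {
			// A real implementation would inspect the container state here.
			return PhaseStatusReported
		}
	}
	// The status may lag behind under API server load, so check the spec
	// before treating the missing status as a permanent failure.
	for _, c := range pod.Containers {
		if c.Name == primary {
			return PhaseWaitingForStatus
		}
	}
	return PhasePermanentFailure
}

func main() {
	lagging := Pod{Containers: []Container{{Name: "primary"}}}
	fmt.Println(primaryContainerPhase(lagging, "primary")) // prints WaitingForStatus

	ready := Pod{
		Containers:        []Container{{Name: "primary"}},
		ContainerStatuses: []ContainerStatus{{Name: "primary"}},
	}
	fmt.Println(primaryContainerPhase(ready, "primary")) // prints StatusReported

	fmt.Println(primaryContainerPhase(lagging, "missing")) // prints PermanentFailure
}
```

Returning a distinct "waiting" result keeps the task alive so it can be re-evaluated on a later round, while a container missing from both fields still fails fast.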

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@hamersaw hamersaw added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Oct 30, 2023
@hamersaw hamersaw self-assigned this Oct 31, 2023
@hamersaw hamersaw removed the untriaged This issues has not yet been looked at by the Maintainers label Oct 31, 2023