fix: mark node failed if pod absent. Fixes #12993 #13454

Merged
merged 7 commits into from
Aug 15, 2024


@isubasinghe isubasinghe commented Aug 12, 2024

Fixes #12993

Motivation

If the pod is absent and the node has not completed, we should mark the node as failed after a timeout and mark the WorkflowTaskResult as completed.

Modifications

Verification

@isubasinghe isubasinghe changed the title from "fix: mark node failed if pod absent" to "fix: mark node failed if pod absent. Fixes #12993" Aug 12, 2024
@isubasinghe isubasinghe marked this pull request as ready for review August 12, 2024 10:07
@isubasinghe isubasinghe requested a review from Joibel August 12, 2024 10:07
if foundPod {
woc.log.Debugf("Got pod %s with phase %s for task result %s and node id %s with label %s", pod.Name, pod.Status.Phase, resultName, result.Name, label)
woc.log.Debugf("The node phase was %s for node named %s", node.Phase, node.Name)
} else if !foundPod && !node.Completed() {

We could technically hit this case where a pod exists but hasn't made it to the cache yet.

@isubasinghe

Also, @Joibel, note that I haven't been able to reproduce any of the open source issues, so this hasn't been tested against #12993.

@Joibel Joibel self-assigned this Aug 12, 2024
workflow/controller/taskresult.go Outdated
@Joibel Joibel merged commit 36b7a72 into argoproj:main Aug 15, 2024
28 checks passed
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Aug 15, 2024
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Aug 15, 2024
Joibel pushed a commit to pipekit/argo-workflows that referenced this pull request Sep 19, 2024
Joibel pushed a commit that referenced this pull request Sep 20, 2024
@Joibel Joibel deleted the fix-hung-workflows branch October 25, 2024 15:13
Labels
area/controller Controller issues, panics
Development

Successfully merging this pull request may close these issues.

v3.5.3: Workflow processing fails to complete due to incomplete WorkflowTaskResult from interrupted pod
3 participants