Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Only apply execution control to nodes that are not part of exit handler. #13016

Merged
merged 3 commits into from
Aug 11, 2024

Conversation

jswxstw
Copy link
Member

@jswxstw jswxstw commented May 7, 2024

Fixes #13060
Fixes #13252

Motivation

  1. This PR feat: Support argo plugin stop. Fixes #12333 #12441 has support plugin nodes shutdown, but HTTP node still not.

  2. Fixes _ controller sending many pod delete requests that result in 404 response #12659 Split to PR: fix: Only cleanup agent pod if exists. Fixes #12659 #13294

  3. Fixes onExit steps executed after deadline exceeded are stopped if an onExit pod spends >=10 seconds in Pending state #13060 Fixes workflow deadline is not ignored for exit handler #13252

Modifications

Modification 1

Execution control has been applied to the nodes with created pods after pod reconciliation. However, pending and suspended nodes do not have created pods, and taskset nodes use the agent pod. Mark these nodes failed when shutting down or exceeding deadline since pod reconciliation does not take effect on them.

Modification 2

Agent pod will only be created when taskset nodes exist, so do not delete it if it does not exist.
Split to PR: #13294

Modification 3

Only apply execution control for nodes that are not part of exit handler:

  • failNodesWithoutCreatedPodsAfterDeadlineOrShutdown: Pending nodes will not be failed if they are part of exit handler.
  • createWorkflowPod: Do not set ARGO_DEADLINE if it is onExitPod.

Verification

local test and e2e tests

@agilgur5 agilgur5 self-assigned this May 7, 2024
@agilgur5 agilgur5 added the area/agent Argo Agent that runs for HTTP and Plugin templates label May 7, 2024
@jswxstw jswxstw force-pushed the fix-taskset-nodes-with-agent-pod branch from 1399849 to 6cba1c9 Compare May 9, 2024 06:01
@jswxstw jswxstw marked this pull request as draft May 21, 2024 09:19
@jswxstw jswxstw force-pushed the fix-taskset-nodes-with-agent-pod branch 2 times, most recently from 5d5c40c to 21de6e7 Compare May 21, 2024 13:40
@jswxstw jswxstw changed the title fix: Apply execution control to taskset nodes and only delete agent pod if exists. fix: Apply execution control to taskset nodes that are not part of exit handler and only delete agent pod if exists. May 21, 2024
@jswxstw jswxstw closed this May 21, 2024
@jswxstw jswxstw reopened this May 21, 2024
@jswxstw jswxstw marked this pull request as ready for review May 23, 2024 12:22
@tooptoop4

This comment was marked as spam.

@JasonChen86899
Copy link

make some comments in pr(#13185) about IsExitNode function
Or maybe need to be clear about the definition of an ExitNode

  1. The root node of the exit handle
  2. Any one of all the nodes involved in the exit handle

@jswxstw
Copy link
Member Author

jswxstw commented Jun 24, 2024

Or maybe need to be clear about the definition of an ExitNode

I think we can do it this way: keep the existing logic of IsExitNode and add a new function IsPartOfExitHandler to determine if the node is in the Exit Handler.

@jswxstw jswxstw force-pushed the fix-taskset-nodes-with-agent-pod branch from 01f47cc to ccbc50d Compare June 24, 2024 11:50
@JasonChen86899
Copy link

// IsPartOfExitHandler returns whether node is part of exit handler.
func (woc *wfOperationCtx) IsPartOfExitHandler(node wfv1.NodeStatus) bool {
	if node.BoundaryID == "" {
		return node.IsExitNode()
	}
	if boundaryNode, err := woc.wf.Status.Nodes.Get(node.BoundaryID); err == nil {
		return boundaryNode.IsExitNode()
	}
	return false
}

A for loop or recursion is needed here. Here's a modification of my code based on yours

// IsPartOfExitHandler returns whether node is part of exit handler.
func (woc *wfOperationCtx) IsPartOfExitHandler(node wfv1.NodeStatus) (bool, error) {
	tmpNode := node
	for !tmpNode.IsExitNode() {
		if tmpNode.BoundaryID == "" {
			return false, nil
		}
		// get boundary node
		boundaryNode, err := woc.wf.Status.Nodes.Get(tmpNode.BoundaryID)
		if err != nil {
			return false, err
		}

		tmpNode = *boundaryNode
	}
	
	return true, nil
}

@jswxstw jswxstw force-pushed the fix-taskset-nodes-with-agent-pod branch 4 times, most recently from cb2397a to ac8af9a Compare June 27, 2024 03:20
@jswxstw jswxstw changed the title fix: Apply execution control to taskset nodes that are not part of exit handler and only delete agent pod if exists. fix: Only apply execution control to nodes that are not part of exit handler and delete agent pod only if it exists. Jun 27, 2024
@jswxstw jswxstw force-pushed the fix-taskset-nodes-with-agent-pod branch 3 times, most recently from 797cdaf to c86a807 Compare June 27, 2024 08:10
@agilgur5
Copy link

agilgur5 commented Jul 2, 2024

Modification 2

Agent pod will only be created when taskset nodes exist, so do not delete it if it does not exist.

@jswxstw can you split the agent delete into a separate PR? it's independent of the other changes here and supersedes #12679, which we both already reviewed, so that one I can review and merge almost immediately. (It's also a lot simpler/has a lot less edge cases to review than the rest)
Can assign me on that split PR

@jswxstw
Copy link
Member Author

jswxstw commented Jul 2, 2024

New PR: #13294 @agilgur5, I don't have permission to assign.

@jswxstw jswxstw changed the title fix: Only apply execution control to nodes that are not part of exit handler and delete agent pod only if it exists. fix: Only apply execution control to nodes that are not part of exit handler. Jul 2, 2024
@agilgur5
Copy link

agilgur5 commented Jul 2, 2024

New PR: #13294

Thanks! Approved. Should merge once tests succeed.

I don't have permission to assign.

Ah right. For future reference, I think you can "Request a Review" which will at least send me a notification

@jswxstw jswxstw force-pushed the fix-taskset-nodes-with-agent-pod branch from dbf5f8b to 77ddf46 Compare August 9, 2024 07:50
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@terrytangyuan terrytangyuan merged commit 16f0a8e into argoproj:main Aug 11, 2024
28 checks passed
Joibel pushed a commit to pipekit/argo-workflows that referenced this pull request Sep 19, 2024
Joibel pushed a commit that referenced this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/agent Argo Agent that runs for HTTP and Plugin templates area/exit-handler area/templates/http
Projects
None yet
5 participants