Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Set initial progress from pod metadata if exists. Fixes #13057 #13260

Merged
merged 2 commits into from
Aug 12, 2024

Conversation

jswxstw
Copy link
Member

@jswxstw jswxstw commented Jun 28, 2024

Fixes #13057

Motivation

Self reporting progress does not work when workflows.argoproj.io/progress is set.

Modifications

  1. Set initial progress from pod metadata if exists before podReconciliation.
  2. Prevent the node's existing progress from being overwritten with the initial progress in pod metadata: only override it when workflow uses legacy/insecure pod patch.

Verification

Local test and UT.
Workflow demo:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: progress-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: progress
            template: progress
    - name: progress
      metadata:
        annotations:
          workflows.argoproj.io/progress: 0/100
      container:
        image: alpine
        imagePullPolicy: IfNotPresent
        command: [ "/bin/sh", "-c" ]
        args:
          - |
            for i in `seq 1 10`; do sleep 10; echo "$(($i*10))"'/100' > $ARGO_PROGRESS_FILE; done

@juliev0
Copy link
Contributor

juliev0 commented Jun 30, 2024

Thanks for all of the fixes @jswxstw! Any interest in joining the Argo Workflows Sustainability Effort and geting your PRs prioritized for review?

@jswxstw
Copy link
Member Author

jswxstw commented Jul 1, 2024

@juliev0 Thank you for your invitation, but unfortunately, I need to move on to other work later.

@juliev0 juliev0 self-assigned this Aug 6, 2024
@juliev0
Copy link
Contributor

juliev0 commented Aug 6, 2024

Prevent the node's existing progress from being overwritten with the initial progress in pod metadata: only override it when workflow uses legacy/insecure pod patch.

So, does this mean that the "self reporting progress" feature would only work if the RBAC is configured to prevent writing to WorkflowTaskResult per this line?

@juliev0 juliev0 added the problem/more information needed Not enough information has been provide to diagnose this issue. label Aug 6, 2024
@jswxstw
Copy link
Member Author

jswxstw commented Aug 7, 2024

Not really, "self reporting progress" feature also works when WorkflowTaskResult is written, but setting initial progress with pod label workflows.argoproj.io/progress: N/M will render this feature ineffective.

@juliev0
Copy link
Contributor

juliev0 commented Aug 7, 2024

Sorry. Does this mean that we're calling ParseProgress() somewhere else for the case of WorkflowTaskResult written? (since in your code I see it now only being called if AnnotationKeyReportOutputsCompleted is set, which seems to only get set if the RBAC prevents writing to WorkflowTaskResult)

@jswxstw
Copy link
Member Author

jswxstw commented Aug 7, 2024

(since in your code I see it now only being called if AnnotationKeyReportOutputsCompleted is set, which seems to only get set if the RBAC prevents writing to WorkflowTaskResult)

Controller will get node progress both from WorkflowTaskResult and pod annotation and the latter will override the former since podReconciliation is executed after taskResultReconciliation.
Normally, progress will appear in the pod annotation only if the RBAC prevents writing to WorkflowTaskResult. However, users can manually set the initial progress to the pod annotation, which causes the controller always use the initial progress and ignore the real progress in WorkflowTaskResult.
So, I determine if RBAC is preventing writing to WorkflowTaskResult by check if pod annotation AnnotationKeyReportOutputsCompleted is exist.

@juliev0
Copy link
Contributor

juliev0 commented Aug 9, 2024

okay, thanks for the explanation

@juliev0
Copy link
Contributor

juliev0 commented Aug 9, 2024

Did you test it both while having the RBAC permission and while not? (if you can confirm that and add the comment I mentioned (or something like it), then I'll gladly approve and merge. Thanks for fixing this!

@jswxstw
Copy link
Member Author

jswxstw commented Aug 11, 2024

Did you test it both while having the RBAC permission and while not? (if you can confirm that and add the comment I mentioned (or something like it), then I'll gladly approve and merge. Thanks for fixing this!

I removed workflowtaskresult edit permission for executor, wait container logs shows that:

time="2024-08-11T03:43:30.299Z" level=debug msg="Create workflowtaskresults 403"
time="2024-08-11T03:43:30.299Z" level=warning msg="failed to patch task result, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/latest/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:default\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2024-08-11T03:43:30.308Z" level=debug msg="Patch pods 200"
time="2024-08-11T03:43:35.298Z" level=info progress=10/100
time="2024-08-11T03:43:37.300Z" level=debug msg="Create workflowtaskresults 403"
time="2024-08-11T03:43:37.300Z" level=warning msg="failed to patch task result, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/latest/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:default\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2024-08-11T03:43:37.308Z" level=debug msg="Patch pods 200"
time="2024-08-11T03:43:44.299Z" level=debug msg="Create workflowtaskresults 403"
time="2024-08-11T03:43:44.299Z" level=warning msg="failed to patch task result, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/latest/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:default\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2024-08-11T03:43:44.308Z" level=debug msg="Patch pods 200"
time="2024-08-11T03:43:44.309Z" level=info progress=20/100

Self reporting progress works well.

@jswxstw
Copy link
Member Author

jswxstw commented Aug 11, 2024

/retest

@jswxstw jswxstw requested a review from juliev0 August 11, 2024 04:08
Signed-off-by: oninowang <[email protected]>
@juliev0 juliev0 merged commit 2d7e2b5 into argoproj:main Aug 12, 2024
28 checks passed
@agilgur5 agilgur5 added area/controller Controller issues, panics and removed problem/more information needed Not enough information has been provide to diagnose this issue. labels Aug 12, 2024
Joibel pushed a commit to pipekit/argo-workflows that referenced this pull request Sep 19, 2024
Joibel pushed a commit that referenced this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Self reporting progress does not work when workflows.argoproj.io/progress is set
3 participants