[BUG] workflow without execution cannot be failed/aborted/finalized correctly after max retry #2577
Closed
2 tasks done
Labels
bug
Describe the bug
For an unknown reason, we have a workflow execution that never managed to start, failing with the error:
As a result, any attempt to mutate this workflow fails, and eventually `mutableW.Status.FailedAttempts > maxRetries`. When this happens, the abort process cannot handle it correctly.
Tracing back to the very first time retries were exhausted, there was an error when publishing the event, for the same reason: the execution did not exist. The transition to `StatusFailed` then failed because the event could not be published, and that did not match this condition, so everything was retried over and over again, as shown by logs like `RuntimeExecutionError: max number of system retry attempts [38676/50] exhausted`. These lines also seem to be related. This, of course, also prevents GC from cleaning the workflow up.
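To make the loop described above concrete, here is a minimal toy model in Go. It is only a sketch under stated assumptions, not the actual propeller code: `Workflow`, `publishEvent`, `reconcile`, `runRounds`, `ErrExecutionNotFound`, and the `tolerateMissing` flag are all hypothetical names introduced for illustration. The model assumes that every event publish fails because the execution does not exist, and that a failed transition to the failed phase counts as another failed attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrExecutionNotFound is a hypothetical stand-in for the
// "execution does not exist" error returned when publishing an event.
var ErrExecutionNotFound = errors.New("execution not found")

// Phase is a simplified workflow phase (hypothetical).
type Phase int

const (
	PhaseRunning Phase = iota
	PhaseFailed
)

// Workflow is a toy model of the workflow status tracked by the controller.
type Workflow struct {
	Phase          Phase
	FailedAttempts uint32
}

// publishEvent models an event sink for an execution that was never
// created: every publish attempt fails with ErrExecutionNotFound.
func publishEvent(_ Phase) error {
	return ErrExecutionNotFound
}

// reconcile performs one evaluation round. When tolerateMissing is false,
// a failed publish blocks the transition to PhaseFailed, so the workflow
// never becomes terminal. When it is true, a missing execution is treated
// as non-fatal and the workflow is still marked failed.
func reconcile(w *Workflow, maxRetries uint32, tolerateMissing bool) error {
	if w.Phase == PhaseFailed {
		return nil // terminal: nothing left to do, GC could clean up
	}
	if w.FailedAttempts > maxRetries {
		// Retries are exhausted: try to record the failure.
		if err := publishEvent(PhaseFailed); err != nil {
			if !(tolerateMissing && errors.Is(err, ErrExecutionNotFound)) {
				// Assumption: the failed transition is itself counted
				// as a failed attempt, so the counter keeps growing.
				w.FailedAttempts++
				return fmt.Errorf("max number of system retry attempts [%d/%d] exhausted: %w",
					w.FailedAttempts, maxRetries, err)
			}
		}
		w.Phase = PhaseFailed
		return nil
	}
	// Normal path: mutating the workflow fails because the execution is missing.
	w.FailedAttempts++
	return ErrExecutionNotFound
}

// runRounds calls reconcile a fixed number of times, ignoring the errors,
// the way a controller would keep re-enqueueing the workflow.
func runRounds(w *Workflow, maxRetries uint32, tolerateMissing bool, rounds int) {
	for i := 0; i < rounds; i++ {
		_ = reconcile(w, maxRetries, tolerateMissing)
	}
}
```

Calling `runRounds(&Workflow{}, 50, false, 200)` in this model leaves the workflow in `PhaseRunning` with `FailedAttempts` far above `maxRetries`, which mirrors the `[38676/50]` log line. With `tolerateMissing` set to `true`, the workflow reaches `PhaseFailed` on the first round after retries are exhausted. Whether a missing execution should be tolerated this way in the real controller is a design question this sketch does not settle.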
Expected behavior
A workflow can be terminated successfully even if it never started.
Additional context to reproduce
I am not sure of the root cause of the missing execution, so I don't know how to reproduce it.
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?