Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Graceful failure on ExecutionNotFound error #439

Merged
merged 5 commits into from
Jun 6, 2022

Conversation

hamersaw
Copy link
Contributor

@hamersaw hamersaw commented May 16, 2022

TL;DR

Updating the workflow abort pipeline to drop ExecutionNotFound errors. Previously, this solution could handle workflows that have not yet started, but for those that are already executing the events that are sent as part of aborting the workflow triggered a cycle.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

fixes flyteorg/flyte#2275

Follow-up issue

NA

@hamersaw hamersaw requested review from kumare3 and EngHabu as code owners May 16, 2022 22:09
@codecov
Copy link

codecov bot commented May 16, 2022

Codecov Report

Merging #439 (430dac6) into master (25802b0) will increase coverage by 0.02%.
The diff coverage is 33.33%.

pkg/controller/handler.go Outdated Show resolved Hide resolved
@hamersaw hamersaw requested a review from EngHabu May 18, 2022 00:10
@@ -1057,7 +1057,7 @@ func (c *nodeExecutor) AbortHandler(ctx context.Context, execContext executors.E
},
ProducerId: c.clusterID,
})
if err != nil && !eventsErr.IsEventIncompatibleClusterError(err) {
if err != nil && !eventsErr.IsNotFound(err) && !eventsErr.IsEventIncompatibleClusterError(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can extract a function to have a global "ignore list" check? This will start to get less maintainable.

@hamersaw hamersaw merged commit ba197a0 into master Jun 6, 2022
@hamersaw hamersaw deleted the bug/graceful-failure-on-execution-not-found branch June 6, 2022 22:33
eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023
* updated ExecutionNotFound to check failed attempts rather than workflow state

Signed-off-by: Daniel Rammer <[email protected]>

* added unit test to allow continue until maxRetries

Signed-off-by: Daniel Rammer <[email protected]>

* added dropping ExecutionNotFound error when aborting workflow

Signed-off-by: Daniel Rammer <[email protected]>

* fixed unit tests

Signed-off-by: Daniel Rammer <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] flytepropeller gracefully cleanup CRDs missing execution
3 participants