Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Added IsFailurePermanent flag on DynamicTaskStatus #567

Merged
merged 8 commits into from
Jul 6, 2023

Conversation

hamersaw
Copy link
Contributor

@hamersaw hamersaw commented May 25, 2023

TL;DR

Currently dynamic tasks treat all failures and retryable which results in unintended behavior when subtasks are attempted multiple times. This PR adds a IsFailurePermanent field on the DynamicTaskStatus struct to indicate that the failure is permanent, and therefore should not be recovered.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

The choice to use a new flag is entirely out of backwards compatibility. Other options include:
(1) Using the ExecutionError in flyteplugins io rather than flyteidl ExecutionError: This would make retrieval of in progress dynamic task status fail when it expects a different error type on the field.
(2) Naming the flag IsRecoverable vs isPermanent: This is difficult because intuition says that we should use IsRecoverable which adheres more to the Flyte standard. The biggest issue is that dynamics treat all failures as retryable by default now, so this would change the default to be non-recoverable (ie. false for IsRecoverable) which changes the default behavior. It is VERY difficult to ensure there will be no regressions.

Tracking Issue

fixes flyteorg/flyte#3606

Follow-up issue

NA

@codecov
Copy link

codecov bot commented May 25, 2023

Codecov Report

Merging #567 (8e18bfe) into master (206013a) will increase coverage by 0.39%.
The diff coverage is 33.33%.

❗ Current head 8e18bfe differs from pull request most recent head f450131. Consider uploading reports for the commit f450131 to get more accurate results

hamersaw added 2 commits May 25, 2023 12:29
Signed-off-by: Daniel Rammer <[email protected]>
@hamersaw hamersaw marked this pull request as ready for review May 25, 2023 17:51
@hamersaw hamersaw requested review from kumare3 and EngHabu as code owners May 25, 2023 17:51
@hamersaw hamersaw merged commit 0cfb0a0 into master Jul 6, 2023
@hamersaw hamersaw deleted the bug/dynamic-task-retries branch July 6, 2023 14:25
eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023
* added IsFailurePermanent flag on DynamicTaskStatus

Signed-off-by: Daniel Rammer <[email protected]>

* fixed linter

Signed-off-by: Daniel Rammer <[email protected]>

* cleaned up retryable vs permanent reporting

Signed-off-by: Daniel Rammer <[email protected]>

* make generate

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Runtime error causes retries
2 participants