[BUG] Persisting task on PhaseTimeout #4765

Closed
2 tasks done
pvditt opened this issue Jan 23, 2024 · 3 comments

Comments


pvditt commented Jan 23, 2024

Describe the bug

When a task times out against the node-active-deadline, the node goes through NodePhaseTimingOut -> abort -> NodePhaseTimedOut, which is a terminal phase.

Because the abort is not treated as a final transition, the state is not persisted to admin, so the task state shows up as "UNKNOWN" instead of "ABORTED".
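
The sequence described above can be illustrated with a small toy model. This is only a sketch and not propeller's actual code: `FakeAdmin`, `abort_timed_out_node`, and the `persist_on_abort` flag are hypothetical names made up for illustration, and the only assumption is that admin keeps showing "UNKNOWN" until it is told otherwise.

```python
from enum import Enum


class NodePhase(Enum):
    # Simplified, hypothetical mirror of the phases named in this issue.
    RUNNING = "RUNNING"
    TIMING_OUT = "TIMING_OUT"
    TIMED_OUT = "TIMED_OUT"  # terminal


class FakeAdmin:
    """Hypothetical stand-in for admin: remembers the last task state it was sent."""

    def __init__(self):
        self.task_state = "UNKNOWN"

    def record(self, state):
        self.task_state = state


def abort_timed_out_node(admin, persist_on_abort):
    """Walk a node from TIMING_OUT through abort to TIMED_OUT.

    persist_on_abort=False models the reported behavior: the abort is not
    treated as a final transition, so nothing is sent to admin.
    """
    phase = NodePhase.TIMING_OUT
    # The task is aborted here; only a "final" abort is persisted.
    if persist_on_abort:
        admin.record("ABORTED")
    phase = NodePhase.TIMED_OUT
    return phase


buggy_admin = FakeAdmin()
buggy_phase = abort_timed_out_node(buggy_admin, persist_on_abort=False)

fixed_admin = FakeAdmin()
fixed_phase = abort_timed_out_node(fixed_admin, persist_on_abort=True)

print(buggy_phase.name, buggy_admin.task_state)  # TIMED_OUT UNKNOWN
print(fixed_phase.name, fixed_admin.task_state)  # TIMED_OUT ABORTED
```

In both runs the node reaches the terminal phase; the only difference is whether the aborted task state was ever sent to admin.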

Expected behavior

When a task is aborted, the aborted state should get persisted to admin.

Additional context to reproduce

Set node-active-deadline to a small value in the Flyte config:

propeller:
  node-config:
    default-deadlines:
      node-active-deadline: 10s

Run a workflow that runs longer than that deadline, and don't set an execution timeout:

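import time

from flytekit import task, workflow


@task(
    retries=1,
)
def demo_task():
    # Sleep far longer than the 10s node-active-deadline so the node times out.
    time.sleep(3000)


@workflow
def demo_workflow():
    demo_task()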
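
To see why this repro trips the deadline, here is a rough local sketch of a deadline comparison. It does not use Flyte and is not how propeller evaluates deadlines; `parse_duration` and `exceeds_deadline` are hypothetical helpers, and the parser only handles simple Go-style duration strings such as "10s" or "1m30s".

```python
import re
from datetime import timedelta

# Seconds per unit for the small subset of Go-style units handled here.
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text):
    """Parse a simple Go-style duration such as "10s" or "1m30s"."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    # Reject input that is not fully covered by number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(float(n) * _UNITS[u] for n, u in parts))


def exceeds_deadline(active_seconds, deadline):
    """Return True if a node active for active_seconds has passed the deadline."""
    return timedelta(seconds=active_seconds) > parse_duration(deadline)


# demo_task sleeps for 3000 seconds, far beyond the 10s deadline in the config.
print(exceeds_deadline(3000, "10s"))  # True
print(exceeds_deadline(5, "10s"))     # False
```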

Screenshots

image

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

pvditt added the bug and untriaged labels Jan 23, 2024
pvditt self-assigned this Jan 23, 2024
pvditt added the exo label Jan 23, 2024

dosubot bot commented Jan 23, 2024

🤖

Hey @pvditt, good to see you again! Hope you're doing well. Thanks for bringing this to our attention.

Based on the information provided, the task state appears to show up as "UNKNOWN" instead of "ABORTED" in Flyte admin because the task phase is not being set to "ABORTED" in the code when the task times out with respect to the node-active-deadline.

In the provided code, there are different phases defined for a task such as PhaseUndefined, PhaseNotReady, PhaseWaitingForResources, PhaseQueued, PhaseInitializing, PhaseRunning, PhaseSuccess, PhaseRetryableFailure, PhasePermanentFailure, PhaseWaitingForCache, and PhaseAborted.

However, when a task times out, the phase of the task is not being set to PhaseAborted. Instead, it seems to be remaining in the PhaseUndefined state, which is why it shows up as "UNKNOWN" in the Flyte admin.

To fix this issue, you would need to update the code to set the phase of the task to PhaseAborted when the task times out. This would then correctly reflect in the Flyte admin as "ABORTED".

Please note that without the complete context or the rest of the codebase, this is the best assumption that can be made based on the provided code. You can refer to the flyteplugins/go/tasks/pluginmachinery/core/phase.go for more details on the different phases of a task.

If you need further clarification or if there's anything else I can assist you with, please let me know.




pvditt commented Jan 24, 2024

This issue should be closed by #4766.

pvditt closed this as completed Jan 24, 2024