[Core Feature] Allow tasks/config to specify max queue/wait time #1149
Labels: backlogged, enhancement, exo, flytekit, propeller
Motivation: Why do you think this is important?
When the underlying execution engine (AWS Batch, K8s, Spark, Hive, AWS EMR, GCP BigQuery, etc.) has trouble scheduling Flyte workloads, a workload can get stuck. Flyte has a concept of timeout, but it only measures the overall execution time, so users cannot express how long they are willing to let a task wait in a queue before it starts executing.
Goal: What should the final outcome look like, ideally?
Expose an additional queue_timeout flag that can be set at a global scope through configs or at a task scope (ideally also at the project, domain, or workflow level). When flytepropeller detects that a task has not started executing within that period, it should abort the task.
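To make the request concrete, here is a minimal Python sketch of the behaviour described above. It is not existing Flyte or flytekit code: the `QueueTimeoutConfig` class, the `should_abort` helper, and the precedence order (task over workflow over domain over project over global) are all assumptions made for illustration, and flytepropeller would implement the actual check in Go.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class QueueTimeoutConfig:
    """Hypothetical queue_timeout values per scope; None means 'not set'."""

    global_default: Optional[timedelta] = None
    project: Optional[timedelta] = None
    domain: Optional[timedelta] = None
    workflow: Optional[timedelta] = None
    task: Optional[timedelta] = None

    def effective(self) -> Optional[timedelta]:
        # Assumed precedence: the most specific scope that is set wins.
        for value in (self.task, self.workflow, self.domain,
                      self.project, self.global_default):
            if value is not None:
                return value
        return None


def should_abort(
    queued_at: datetime,
    started_at: Optional[datetime],
    now: datetime,
    queue_timeout: Optional[timedelta],
) -> bool:
    """Return True if a task has waited in the queue longer than allowed."""
    # No timeout configured, or the task already started: never abort here.
    if queue_timeout is None or started_at is not None:
        return False
    return now - queued_at > queue_timeout
```

With a 30-minute global default and a 5-minute task-level override, a task that has been queued for 6 minutes without starting would be aborted, while one that started after 1 minute would be left to the existing execution timeout.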