Releases: flyteorg/flytepropeller
Discretization of Statemachine fixes
Fixes a few issues uncovered while investigating last week's state machine inconsistency issues. Specifically:
- Ensure each node can progress at most once per round (IsDirty flag).
- Remove the ParentTaskID and DataDir fields from NodeStatus (they caused the workflow object stored in etcd to bloat).
- Add the parent RetryAttempt to the generated hierarchical name of dynamic sub-nodes so that retries do not reuse an existing sub-node status.
Details: https://docs.google.com/document/d/1ISaxIZeYLcBaeapEmeTqb-g0x04pJbf5t3i30qMfk6U/edit?usp=sharing
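A minimal Go sketch of the first and third fixes, assuming a heavily simplified node status. `NodeStatus`, `TryAdvance`, `ResetRound` and the `SubNodeName` naming scheme are hypothetical illustrations, not flytepropeller's actual types or ID format: the dirty flag rejects a second transition within one round, and embedding the parent's retry attempt gives each retry a fresh set of sub-node names.

```go
package main

import (
	"fmt"
	"strconv"
)

// NodeStatus is a hypothetical, heavily simplified stand-in for the real
// NodeStatus; only the phase and the IsDirty flag are modeled.
type NodeStatus struct {
	Phase   string
	IsDirty bool
}

// TryAdvance moves the node to the next phase unless it has already
// progressed in the current round. It reports whether the move happened.
func (s *NodeStatus) TryAdvance(next string) bool {
	if s.IsDirty {
		return false // already progressed once this round
	}
	s.Phase = next
	s.IsDirty = true
	return true
}

// ResetRound clears the dirty flag at the start of a new evaluation round.
func (s *NodeStatus) ResetRound() { s.IsDirty = false }

// SubNodeName builds a hypothetical hierarchical name for a dynamic
// sub-node that embeds the parent's retry attempt, so a retried parent
// never picks up sub-node status left over from an earlier attempt.
func SubNodeName(parentID string, parentRetryAttempt uint32, subNodeID string) string {
	return parentID + "-" + strconv.FormatUint(uint64(parentRetryAttempt), 10) + "-" + subNodeID
}

func main() {
	s := &NodeStatus{Phase: "Queued"}
	fmt.Println(s.TryAdvance("Running"))   // true
	fmt.Println(s.TryAdvance("Succeeded")) // false: already progressed this round
	s.ResetRound()
	fmt.Println(s.TryAdvance("Succeeded")) // true

	fmt.Println(SubNodeName("n0", 0, "t1")) // n0-0-t1
	fmt.Println(SubNodeName("n0", 1, "t1")) // n0-1-t1
}
```

In this sketch the caller is expected to call `ResetRound` on every node before each evaluation round.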
Adding support for cluster-namespaced resource management
Pulling in the changes from flyteplugins
Prefixing Allocation Tokens in Resource Manager
This release adds prefixes to the allocation tokens used in the resource manager to simplify tracking and lineage.
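The release notes do not specify the prefix format. As a purely hypothetical Go sketch, assuming tokens are prefixed with the namespace and execution ID separated by `:` (Kubernetes namespace names cannot contain `:`), a prefixed token lets you recover which namespace holds an allocation. `PrefixedToken`, `TokenNamespace` and the separator are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSeparator is a hypothetical separator. Kubernetes namespace names
// (DNS-1123 labels) cannot contain ':', so the first ':' ends the namespace.
const tokenSeparator = ":"

// PrefixedToken builds a hypothetical allocation token of the form
// "<namespace>:<executionID>:<rawToken>".
func PrefixedToken(namespace, executionID, rawToken string) string {
	return strings.Join([]string{namespace, executionID, rawToken}, tokenSeparator)
}

// TokenNamespace recovers the namespace prefix from a prefixed token.
func TokenNamespace(token string) string {
	ns, _, _ := strings.Cut(token, tokenSeparator) // requires Go 1.18+
	return ns
}

func main() {
	tok := PrefixedToken("flytesnacks-development", "abc123", "n0-0")
	fmt.Println(tok)                 // flytesnacks-development:abc123:n0-0
	fmt.Println(TokenNamespace(tok)) // flytesnacks-development
}
```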
Node executor abort to call finalize even on error
Node executor abort to call finalize even on error (#51). In cases where the abort call fails, we should still call finalize, as this is the intended behavior of the finalize construct.
Adding Namespaced Resource-aware Exponential Backoff Handler
This release supports a per-namespace resource-aware backoff mechanism to guard pod creation.
Instead of having a separate backoff for each execution or a single backoff for the entire queue, this PR supports per-namespace backoff, which strikes a good balance in granularity given that resource quotas are set per namespace.
This backoff mechanism is also resource-aware. Even while a creation request is blocked and the backoff is still active, the next incoming creation request may be allowed to try if its resource requirement is strictly smaller than those of all previous attempts during the same backoff period. This prevents a single large creation request from unnecessarily blocking everything else from the same namespace.
IDL to 0.16.3
Bump IDL to 0.16.3 (#46): https://github.com/lyft/flyteidl/releases/tag/v0.16.3, which also requires bumping plugins to https://github.com/lyft/flyteplugins/releases/tag/v0.2.4
Use flytestdlib contextutils.RevisionVersionKey instead of the locally defined one
v0.1.19 Update flytestdlib to use proper contextutil key (#45)
Add a log field for resource version
v0.1.18 Add ResourceVersion to log fields (#44)
Implementation for Node timeout
Implementation for node timeout (#42)
* Implementation for node timeout
* adding some tests
* bogus change to retrigger travis
* cr feedback: creating a separate state in the CRD for failure type so we can distinguish between user and system errors. For now, we will use it for timeout failures.
* removing failure type and adding TimingOut as a separate phase
* updated mockery
* finalize to abort on timeout
* fixing NodePhaseTimedout
Removing flytekit version check to fix the array task interface
Remove flytekit version check (#41)
* Remove flytekit version check
* lint