Feature/add pod pending timeout config #4590

Merged
merged 12 commits into from
Dec 13, 2023

@pvditt pvditt commented Dec 13, 2023

Tracking issue

#1149

Why are the changes needed?

Users cannot configure a timeout for pending pods. As a result, pods/tasks stuck in pending fail only once the execution timeout elapses.

PodPendingTimeout can help in situations when pods aren't getting executed due to resources not being available or when downstream execution engines are not able to schedule work.

What changes were proposed in this pull request?

Introduce a "pod-pending-timeout" plugin config that lets users set pending/queued timeouts for tasks. The value defaults to 0, so the current pending behavior does not change unless the config is set.

How was this patch tested?

Added unit test here

Setup process

Configure plugins -> k8s -> pod-pending-timeout in flyte_single_binary.yaml

Screenshots

Screenshot 2023-12-05 at 12 21 26 PM

Check all the applicable boxes

  • I updated the documentation accordingly.
    (should be auto-updated by running script/generate_config_docs.sh)
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Dec 13, 2023
@pvditt pvditt changed the title Feature/add pod pending timeout config 2 Feature/add pod pending timeout config Dec 13, 2023

pvditt commented Dec 13, 2023

I accidentally ruined https://github.com/flyteorg/flyte/pulls?q=is%3Apr+is%3Aclosed with a rebase and figured it'd be easier to just check out a new branch.

@hamersaw:

My only concern with this logic is that if DemystifyPendingHelper returns an error and the podPendingTimeout has elapsed, the latter will enforce a RetryableFailure and mask the error returned. IMO it might make more sense to return the error immediately?

The helper only ever returns nil for the error, as DemystifyPending did previously. For some reason the linter flags this when the function is not exported, so I removed the error return parameter from the helper. I can do the same for DemystifyPending, since it also only ever returns nil errors.

codecov bot commented Dec 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (78efbf8) 58.98% compared to head (d3e0fca) 58.99%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4590   +/-   ##
=======================================
  Coverage   58.98%   58.99%           
=======================================
  Files         621      621           
  Lines       52483    52498   +15     
=======================================
+ Hits        30959    30969   +10     
- Misses      19057    19062    +5     
  Partials     2467     2467           
Flag Coverage Δ
unittests 58.99% <100.00%> (+<0.01%) ⬆️


@hamersaw hamersaw merged commit a5ea1c8 into master Dec 13, 2023
41 checks passed
@hamersaw hamersaw deleted the feature/add-pod-pending-timeout-config-2 branch December 13, 2023 16:49