-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core feature] Workflow level throttling #5125
Comments
Thank you for opening your first issue here! 🛠 |
Hey there, @a4501150! Great to see you bringing up another interesting point for discussion. How have you been? Thank you for your detailed feature request. Workflow level throttling is indeed a valuable feature that can help manage resource utilization and prevent overloading of backend APIs. Based on the current Flyte architecture, it is feasible to implement this feature by extending the
The current state management in Flyte, as represented in the Please note that this is a high-level approach and the actual implementation would require a detailed design and thorough testing to ensure it meets the requirements and does not introduce any regressions. For more details on the current execution management in Flyte, you can refer to the I hope this provides a good starting point for implementing workflow level throttling in Flyte. If you have any further questions or need more details, feel free to ask. Sources
|
I think this is related to - #267 |
Also I think this is a great feature to add, would Spotify folks be willing to collaborate on this? Incase concurrency limits are specified, the execution is created in |
Hey @kumare3 thanks for reply! I think we will be able to collaborate on this, needs some coordination with the team for planning.
This sounds like a great starter solution! |
Wondering if this is fixed or if someone is looking into this @a4501150 ? |
hey sorry for late reply, @sshardool @kumare3 I don't think Spotify has capacity to work on this at the moment. It's great if @sshardool can help on this. Cc @andresgomezfrr @RRap0so to double confirm on this so work didn't get duplicated. |
Thanks folks. We are evaluating this and should be to actively engage on this in the next couple of weeks. |
@sshardool do you know if this is in progress? A few other users are hitting the same limitation. Thanks! |
Hello @davidmirror-ops - unfortunately we have not been able to take this up yet and will likely take us some time to get to it. If some else is willing to contribute I would be happy to review, otherwise I will update here once we start work on this. |
This is something we are thinking of working on with the LinkedIn team @sshardool |
Motivation: Why do you think this is important?
Currently flyte does not support workflow level throttling that limits the
same
concurrent workflows that can be running under a flyte project / domain. (same workflow means the same binary identified by workflow name).In our use case, we use flyte both for workflow scheduling and orchestration. One of our workflow will hit our backend API heavily and we want to limit the resource pressure on our backend API. Although we can implement throttling on backend API but backend API will also use in synchronized manner and we don't want to affect the current synchronized API performance and behavior, thus implement throttling on workflow side is the best choice.
Compare to other workflow engines, this is the major feature that lacks in flyte.
Goal: What should the final outcome look like, ideally?
For instance, when we trigger a workflow by creating a
CreateExecution
with a launch plan (preferred) OR when we register a workflow, we should be able to configure an extra parameter, eg.max_concurrent_execution
as type long which can limit the concurrent workflows that running at the same time.For example, let's say we have a
workflow A
, with aTask A
. We created a launch plan for this workflow withmax_concurrent_execution
as 5.CreateExecution
requests with same launch plan to flyte grpc APIQUEUED
max_concurrent_execution
workflows in status ofRUNNING
at the same timeDescribe alternatives you've considered
We are now using a hacky way by creating a
resource lock
task to achieve the functionalities:for loop
andThread.sleep()
inside the for loop along with querying the flyte endpoint to get the topmax_concurrent_execution
running executions sorted by execution created time.max_concurrent_execution
, and the execution id obtained by the the task via ENVFLYTE_INTERNAL_EXECUTION_ID
is not one of the returned top X executions, then the task will just wait there in the for loop withthread.sleep()
This effectively limit the concurrent executions of a given launchPlan (workflow) by have the throttling task as the first task in the given workflow DAG.
However, this will create many semi-dangling
resource lock
tasks if the amount of concurrent execution requests is large andmax_concurrent_execution
is relatively small.Another alternatives solution
Task level throttling can further enhance the fine control level, however this maybe harder to implement than workflow level throttling.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?
The text was updated successfully, but these errors were encountered: