## Overview

When kicking off fast tasks, we typically have to do a second round of task evaluation before a worker is available, which adds latency to the initial task runs while the worker(s) come up. This change keeps an in-memory cache of tasks waiting on a worker so that when the first one comes up, we can opportunistically enqueue the owning workflow for evaluation and avoid a ~10s delay.

I chose to use a service-wide lock, which trades some lock contention for reduced complexity. This is acceptable since we already grab the service-wide `queuesLock` when discovering a new worker (call to `Heartbeat`).

## Test Plan

~- [ ] Haven't added any unit tests yet. Wanted to get feedback on the approach~
Going to defer unit tests to the broad pass @hamersaw is doing.

- [x] Ran locally and verified that, with the change, tasks do not require a second round of evaluation

Without the enqueue call (2s delay from worker registered -> send task):

```
"2024-05-13T16:42:22-07:00" "adding pending owner flytesnacks-development/feb7da731f60c482db2d for task feb7da731f60c482db2d-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:42:22-07:00" "offering task feb7da731f60c482db2d-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:42:22-07:00" "offering task feb7da731f60c482db2d-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:42:22-07:00" "offering task feb7da731f60c482db2d-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:42:23-07:00" "worker 6b772a8b-7748-4819-99ea-140086ca27af registered with queue 4fc648840f89c02"
"2024-05-13T16:42:25-07:00" "offering task feb7da731f60c482db2d-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:42:25-07:00" "sending task feb7da731f60c482db2d-n0-0 to worker 6b772a8b-7748-4819-99ea-140086ca27af on queue 4fc648840f89c02"
```

With the enqueue call (same second):

```
"2024-05-13T16:48:25-07:00" "adding pending owner flytesnacks-development/f96f3fe69ae744129ab3 for task f96f3fe69ae744129ab3-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:48:25-07:00" "offering task f96f3fe69ae744129ab3-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:48:25-07:00" "offering task f96f3fe69ae744129ab3-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:48:25-07:00" "offering task f96f3fe69ae744129ab3-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:48:26-07:00" "worker abb8da76-b4f2-44a2-8fe9-c577f682c914 registered with queue 4fc648840f89c02"
"2024-05-13T16:48:26-07:00" "offering task f96f3fe69ae744129ab3-n0-0 on queue 4fc648840f89c02"
"2024-05-13T16:48:26-07:00" "sending task f96f3fe69ae744129ab3-n0-0 to worker abb8da76-b4f2-44a2-8fe9-c577f682c914 on queue 4fc648840f89c02"
```

## Rollout Plan (if applicable)

Not planning to put this behind a config (although we could potentially move `maxPendingOwnersPerQueue` to a config and treat 0 as disabled). Will bring to cloud and deploy in the coming days.

## Upstream Changes

Should this change be upstreamed to OSS (flyteorg/flyte)? If so, please check this box for auditing. Note, this is the responsibility of each developer. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).

- [ ] To be upstreamed
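
## Illustrative Sketch

For readers skimming the Overview, the pending-owner cache could look roughly like the sketch below. This is not the code in this commit: the type `pendingOwners`, its methods, the `enqueue` callback, and the cap value of 100 are all hypothetical. Only the name `maxPendingOwnersPerQueue` and the idea of a single service-wide lock come from the description above.

```go
package main

import "sync"

// maxPendingOwnersPerQueue caps how many owners are remembered per queue.
// The name appears in the PR description; the value 100 is an assumption.
const maxPendingOwnersPerQueue = 100

// pendingOwners is a hypothetical in-memory cache of workflow owners that
// have a task waiting on a queue with no worker available yet.
type pendingOwners struct {
	mu      sync.Mutex                     // stands in for the service-wide lock
	byQueue map[string]map[string]struct{} // queue ID -> set of owner IDs
}

func newPendingOwners() *pendingOwners {
	return &pendingOwners{byQueue: make(map[string]map[string]struct{})}
}

// Add records that ownerID is waiting on queueID. It returns false when the
// per-queue cap is reached and the owner was not recorded.
func (p *pendingOwners) Add(queueID, ownerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()

	owners, ok := p.byQueue[queueID]
	if !ok {
		owners = make(map[string]struct{})
		p.byQueue[queueID] = owners
	}
	if _, exists := owners[ownerID]; exists {
		return true // already tracked, nothing to do
	}
	if len(owners) >= maxPendingOwnersPerQueue {
		return false
	}
	owners[ownerID] = struct{}{}
	return true
}

// OnWorkerRegistered drains the owners waiting on queueID and calls enqueue
// for each one, so their workflows are re-evaluated right away. It returns
// the number of owners enqueued.
func (p *pendingOwners) OnWorkerRegistered(queueID string, enqueue func(ownerID string)) int {
	p.mu.Lock()
	owners := p.byQueue[queueID]
	delete(p.byQueue, queueID)
	p.mu.Unlock()

	// Call enqueue outside the lock to keep the critical section short.
	for ownerID := range owners {
		enqueue(ownerID)
	}
	return len(owners)
}
```

In this sketch, the task-offer path would call `Add` when no worker is available, and the heartbeat path would call `OnWorkerRegistered` the first time it sees a worker on a queue. Draining the set under the lock and invoking the callback afterwards is a design choice of the sketch, not something the description specifies.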