
feat(BA-96): Metric-based model service autoscaling #3277

Merged
merged 86 commits into from
Jan 17, 2025

Conversation

kyujin-cho
Member

@kyujin-cho kyujin-cho commented Dec 19, 2024

Resolves #2659 (BA-96)

What's changed

  • This PR adds user-configurable auto-scaling rules associated with each model-service endpoint.
  • The scale_services() function now interprets the configured rules and applies their decisions by adjusting the desired replica count.
    • The original scale_services() logic, which reconciles the current replica count with the desired replica count, remains unchanged.
  • Each rule can either increase or decrease the desired number of replicas via a positive or negative step_size, so a single rule is a one-direction trigger. To auto-scale replicas in both directions, users must define at least two rules.
    • There is no explicit validation of, or warning about, contradictory auto-scaling rules. It is the user's responsibility to configure a consistent set of rules over a single metric or a combination of metrics.
```mermaid
flowchart TD
    A1["Auto-scaling rule 1 (GREATER_THAN...)"] -->|"(+) step_size"| Count(desired replica count)
    A2["Auto-scaling rule 2 (LESS_THAN...)"] -->|"(−) step_size"| Count

    Count -->|apply difference to current replica count| E[Endpoint]

    U[User] -->|manually set| Count
```
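The rule-application step can be sketched as follows. This is a minimal, hypothetical illustration of the behavior described above (the names `Rule` and `desired_replicas` are illustrative, not the actual Backend.AI API): every triggered rule contributes its `step_size` to the desired replica count, and two opposite-signed rules together form a bidirectional policy.

```python
# Hypothetical sketch of how a scale_services() pass could turn triggered
# rules into a new desired replica count. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class Rule:
    step_size: int   # positive: scale out, negative: scale in
    triggered: bool  # whether the live metric crossed the threshold


def desired_replicas(current_desired: int, rules: list[Rule]) -> int:
    """Apply every triggered rule's step to the desired replica count."""
    desired = current_desired
    for rule in rules:
        if rule.triggered:
            desired += rule.step_size
    return max(desired, 0)


# Two single-direction rules form a bidirectional policy:
scale_out = Rule(step_size=+2, triggered=True)
scale_in = Rule(step_size=-1, triggered=False)
print(desired_replicas(3, [scale_out, scale_in]))  # 5
```

The existing reconciliation logic would then apply the difference between this desired count and the current replica count, as before.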

Other changes

  • This PR separates the aiodataloader handler from the bulk-loading logic in both EndpointStatistics and KernelStatistics, so that the bulk loader can be reused outside the dataloader.
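The refactoring pattern behind this change might look roughly like the following stdlib-only sketch (all names here are hypothetical, not the actual EndpointStatistics/KernelStatistics code): the bulk-loading coroutine stands alone, so it can back a DataLoader-style batch handler and also be called directly.

```python
# Sketch of separating a bulk loader from its dataloader handler.
# Stdlib only; names are hypothetical.
import asyncio


async def batch_load_endpoint_stats(endpoint_ids: list[str]) -> list[dict]:
    """Standalone bulk loader: one query for many endpoints at once."""
    # A real implementation would issue a single DB/metric query here.
    return [{"endpoint": eid, "replicas": 1} for eid in endpoint_ids]


class StatsLoader:
    """Minimal DataLoader-like wrapper that delegates to the bulk loader."""

    async def load_many(self, keys: list[str]) -> list[dict]:
        return await batch_load_endpoint_stats(keys)


async def main() -> None:
    # Reuse path 1: through the loader (as GraphQL resolvers would).
    via_loader = await StatsLoader().load_many(["ep-a", "ep-b"])
    # Reuse path 2: calling the bulk loader directly.
    direct = await batch_load_endpoint_stats(["ep-a", "ep-b"])
    assert via_loader == direct


asyncio.run(main())
```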

How it works

  • Every endpoint (model service) can have one or more auto-scaling rules.
  • An auto-scaling rule is defined by:
    • Metric source: inference runtime or kernel
      • inference framework: the average value taken across all replicas. Supported only when the AppProxy reports the inference metrics; check the Backend.AI Enterprise guide for more details.
      • kernel: the average value taken across all kernels backing the endpoint
    • Metric name (e.g. cuda.shares or vllm_avg_prompt_throughput_toks_per_s)
    • Comparator: how the live metric is compared against the threshold value
      • LESS_THAN: the rule triggers when the current metric value goes below the defined threshold
      • LESS_THAN_OR_EQUAL: the rule triggers when the current metric value goes below or equals the defined threshold
      • GREATER_THAN: the rule triggers when the current metric value goes above the defined threshold
      • GREATER_THAN_OR_EQUAL: the rule triggers when the current metric value goes above or equals the defined threshold
    • Step size: the amount by which the replica count changes when the rule triggers. It can be positive or negative; a negative value makes the rule decrease the number of replicas.
    • Cooldown seconds: the duration in seconds during which the rule is not reapplied after it first triggers.
    • Minimum replicas: the lower bound for the endpoint's replica count. The rule does not trigger if the resulting replica count would fall below this value.
    • Maximum replicas: the upper bound for the endpoint's replica count. The rule does not trigger if the resulting replica count would exceed this value.

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue

📚 Documentation preview 📚: https://sorna--3277.org.readthedocs.build/en/3277/


📚 Documentation preview 📚: https://sorna-ko--3277.org.readthedocs.build/ko/3277/

@github-actions github-actions bot added area:docs Documentations comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Dec 19, 2024
@kyujin-cho kyujin-cho added type:feature Add new features and removed area:docs Documentations comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Dec 19, 2024
@github-actions github-actions bot added size:L 100~500 LoC comp:agent Related to Agent component comp:appproxy Related to App Proxy component comp:manager Related to Manager component urgency:5 It is imperative that action be taken right away. labels Dec 19, 2024
@kyujin-cho kyujin-cho added this to the 24.12 milestone Dec 19, 2024
@kyujin-cho kyujin-cho changed the title feature: model service autoscaling feat: model service autoscaling Dec 19, 2024
@kyujin-cho kyujin-cho changed the title feat: model service autoscaling feat: metric based model service autoscaling Dec 19, 2024
@kyujin-cho kyujin-cho marked this pull request as ready for review December 19, 2024 17:08
@kyujin-cho kyujin-cho force-pushed the feature/model-service-autoscale branch from 2e9102e to 9bc0661 Compare December 20, 2024 12:12
@achimnol achimnol added this pull request to the merge queue Jan 17, 2025
@achimnol achimnol modified the milestones: 24.12, 25Q1 Jan 17, 2025
Merged via the queue into main with commit 9be8899 Jan 17, 2025
23 checks passed
@achimnol achimnol deleted the feature/model-service-autoscale branch January 17, 2025 14:38
Labels
area:docs Documentations comp:agent Related to Agent component comp:appproxy Related to App Proxy component comp:cli Related to CLI component comp:client Related to Client component comp:common Related to Common component comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated size:XL 500~ LoC type:feature Add new features urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE! urgency:5 It is imperative that action be taken right away.
Development

Successfully merging this pull request may close these issues.

Support auto scaling on Model Service
3 participants