WIP: layered/cost: move refilling budgets into dispatch #870

Draft
wants to merge 1 commit into
base: main

JakeHillion

Currently budgets are refreshed in layered_stopping as part of calling
record_cpu_cost. That call bills the usage that has already happened and, if
this takes the local budget negative, refills it from the global budgets,
refreshing those if needed to stay above 0. As a result the budget for the
previously scheduled layer is always >=0, but not necessarily as large as one
time slice, which can make it impossible to schedule that layer in the future
(particularly bad for confined layers).
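
To make the problem concrete, here is a toy userspace model, not the scheduler's actual code. It assumes the refill in stopping only covers the deficit and that dispatch needs a full slice of budget; bill_and_top_up, can_schedule, SLICE_NS and GLOBAL_CAP are hypothetical names and values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLICE_NS   1000   /* hypothetical time slice */
#define GLOBAL_CAP 8000   /* hypothetical size of a refreshed global budget */

/* Bill used_ns against the local budget. If that goes negative, cover only
 * the deficit from the global budget, refreshing the global budget first
 * when it cannot cover the deficit (assumed behaviour). */
static void bill_and_top_up(int64_t *local, int64_t *global, int64_t used_ns)
{
    assert(used_ns >= 0);
    *local -= used_ns;
    if (*local < 0) {
        int64_t need = -*local;
        if (*global < need)
            *global = GLOBAL_CAP;   /* refresh the global budget */
        *global -= need;
        *local = 0;                 /* back to >= 0, but no further */
    }
}

/* A layer is only schedulable if its local budget covers a whole slice. */
static bool can_schedule(int64_t local)
{
    return local >= SLICE_NS;
}
```

With a local budget of 1000 and a bill of 1300, the local budget ends at 0: non-negative, but short of a slice, so the layer cannot be picked again until something else refills it.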

While billing in stopping makes sense for accurate attribution, refreshing
here might not. To stay fair we should only refresh our budgets when there is
nothing capable of running without refreshing the budgets.

This change moves the refresh logic into dispatch. We first attempt a full set
of dispatch loops with the existing per-CPU budgets. If that fails (there
was nothing this CPU could run within the constraints of the local budgets), we
query the global budgets and refill local budgets as best we can. If we still
can't schedule anything, we refresh the global budgets and try again.

The new flow in dispatch is as follows:

  • Attempt dispatching with current local budgets.
  • Refill local budgets from global budgets without refreshing them, and attempt
    dispatching with these local budgets.
  • Refresh global budgets, refilling local budgets to capacity at the same time,
    and attempt dispatching with these local budgets.
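
The three steps above can be sketched as a small userspace simulation. This is an illustration under stated assumptions, not the BPF code in this PR: struct budgets, try_dispatch, refill_local, refresh_global and all the constants are hypothetical, and a layer is assumed to be dispatchable only when its local budget covers one slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_LAYERS  2                /* hypothetical layer count */
#define SLICE_NS   1000             /* hypothetical time slice */
#define LOCAL_CAP  (2 * SLICE_NS)   /* hypothetical per-CPU budget capacity */
#define GLOBAL_CAP (8 * SLICE_NS)   /* hypothetical global budget capacity */

struct budgets {
    int64_t local[NR_LAYERS];    /* this CPU's per-layer budget */
    int64_t global[NR_LAYERS];   /* shared per-layer budget */
    bool runnable[NR_LAYERS];    /* does the layer have a queued task? */
};

/* One dispatch pass: first runnable layer whose local budget covers a slice. */
static int try_dispatch(const struct budgets *b)
{
    for (int i = 0; i < NR_LAYERS; i++)
        if (b->runnable[i] && b->local[i] >= SLICE_NS)
            return i;
    return -1;
}

/* Move budget from global to local, up to LOCAL_CAP, without refreshing. */
static void refill_local(struct budgets *b)
{
    for (int i = 0; i < NR_LAYERS; i++) {
        int64_t want = LOCAL_CAP - b->local[i];
        int64_t take = want < b->global[i] ? want : b->global[i];
        if (take > 0) {
            b->local[i] += take;
            b->global[i] -= take;
        }
    }
}

/* Reset every global budget to its capacity. */
static void refresh_global(struct budgets *b)
{
    for (int i = 0; i < NR_LAYERS; i++)
        b->global[i] = GLOBAL_CAP;
}

/* Returns the dispatched layer or -1; *phase records which step succeeded
 * (or 3 if even the last step found nothing to run). */
static int dispatch(struct budgets *b, int *phase)
{
    int layer;

    assert(b && phase);
    *phase = 1;                       /* 1: current local budgets */
    if ((layer = try_dispatch(b)) >= 0)
        return layer;
    *phase = 2;                       /* 2: drain global into local */
    refill_local(b);
    if ((layer = try_dispatch(b)) >= 0)
        return layer;
    *phase = 3;                       /* 3: refresh global, refill to capacity */
    refresh_global(b);
    refill_local(b);
    return try_dispatch(b);
}
```

In this model a CPU whose runnable layer has local budget 500 and global budget 0 falls through to step 3, ending with local budget 2000 and global budget 6500.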

This should defer to the local budgets in the common case, drain from the
global budgets wherever that achieves forward progress, and refill the global
budgets only when necessary (forward progress cannot be made on this CPU). We
may benefit from a spin-lock on refreshing the global budgets to prevent
multiple CPUs doing it at the same time.
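
One way to keep several CPUs from refreshing at once is sketched below in userspace C11. Note that it swaps in a try-lock built on atomic_flag rather than a blocking spin-lock, so a CPU that loses the race skips the refresh instead of waiting. struct global_budget, try_refresh_global and GLOBAL_CAP are hypothetical; in the BPF scheduler the locking primitive would differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GLOBAL_CAP 8000   /* hypothetical size of a refreshed global budget */

struct global_budget {
    atomic_flag refreshing;      /* set while some CPU is refreshing */
    _Atomic int64_t remaining;   /* budget left for all CPUs to draw from */
};

/* Refresh the global budget unless another CPU is already doing so.
 * Returns true if this caller performed the refresh. */
static bool try_refresh_global(struct global_budget *g)
{
    assert(g);
    /* test_and_set returns the previous value: true means already held. */
    if (atomic_flag_test_and_set_explicit(&g->refreshing, memory_order_acquire))
        return false;
    atomic_store_explicit(&g->remaining, GLOBAL_CAP, memory_order_relaxed);
    atomic_flag_clear_explicit(&g->refreshing, memory_order_release);
    return true;
}
```

A caller that gets false back can simply retry its refill from the global budget, since whoever holds the flag is about to publish a fresh one.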

Test plan:

  • TBD
