scx_lavd: overhaul the virtual deadline algorithm #443
Estimating the service time from run time and frequency is not incorrect. However, it reacts slowly to sudden changes since it relies on the moving average. Hence, we directly measure the service time to enforce fairness. Signed-off-by: Changwoo Min <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
This is a prep to add a global ineligible dsq. Signed-off-by: Changwoo Min <[email protected]>
This is a prep for adding an ineligible DSQ. Signed-off-by: Changwoo Min <[email protected]>
We now maintain two run queues, an eligible run queue (DSQ) and an ineligible run queue (rbtree), both sorted by the task's virtual deadline. When the eligible run queue is empty, or the ineligible run queue has not been consumed for too long (e.g., 15 msec), a task in the ineligible run queue is moved to the eligible run queue for execution. With these two queues, we have better admission control. Signed-off-by: Changwoo Min <[email protected]>
Advancing the clock slower when overloaded gives more opportunities for latency-critical tasks to cut in the run queue. Controlling the clock better reflects the actual load than the prior approach of stretching the time-space when overloaded. Signed-off-by: Changwoo Min <[email protected]>
If it inherits the parent's properties, a newly forked task tends to be over-prioritized, because many parent processes, such as `make`, are a bit more latency-critical than average. Signed-off-by: Changwoo Min <[email protected]>
That is okay since the runtime is considered in calculating a virtual deadline. A shorter runtime will result in a tighter deadline linearly. Signed-off-by: Changwoo Min <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
Use p->scx.weight instead. Signed-off-by: Changwoo Min <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
In theory, sys_load_factor should not be necessary since we do not stretch the time space anymore. Signed-off-by: Changwoo Min <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
Signed-off-by: Changwoo Min <[email protected]>
These are no longer necessary after removing the load factor calculation. Signed-off-by: Changwoo Min <[email protected]>
These are no longer necessary after directly using latency criticality. Signed-off-by: Changwoo Min <[email protected]>
LAVD_VDL_LOOSENESS_FT represents how loose the deadline is. A smaller value means a tighter deadline. While it is unlikely to be tuned, let's keep it as a tunable for now. Signed-off-by: Changwoo Min <[email protected]>
Further reduce the penalty for above-average latency-critical tasks, and further increase it for below-average latency-critical tasks, when computing the ineligibility duration. Signed-off-by: Changwoo Min <[email protected]>
With all the other optimizations and tunings, it turns out that maintaining two run queues does more harm than good. Signed-off-by: Changwoo Min <[email protected]>
taskc_run = try_get_task_ctx(p_run);
if (taskc_run && p_run->scx.slice != 0)
	try_yield_current_cpu(p_run, cpuc, taskc_run);
t = bpf_obj_new(typeof(*t));
I think it should be possible to pre-allocate this node on task_init()
and keep it on taskc so that enqueue path doesn't have to do dynamic allocations but at the same time bpf_obj_new()
might be cheap enough for this to not matter.
nvm, this gets removed later.
/*
 * Advance the clock up to the task's deadline. When overloaded,
 * advnace the clock slower so other can jump in the run queue.
typo: advnace
 */
ratio = (LAVD_LC_STARVATION_FT * stat_cur->avg_svc_time) /
	taskc->svc_time;
return ratio + 1;
Shouldn't starvation avoidance be part of the virtual timeline management rather than implemented through boosting interactivity? Is this an artifact of eligible and ineligible timelines being managed separately?
Oh that gets removed later. I'm curious why deadline in itself isn't sufficient for starvation avoidance.
Signed-off-by: Changwoo Min <[email protected]>
Thanks @htejun for the review. I will merge it into main.
This PR contains a major overhaul of the virtual deadline algorithm and the related code cleanup.
Instead of stretching the time-space upon enqueue, we now advance the current virtual clock at a rate inversely proportional to the system load. Under overload, the clock advances more slowly, so latency-critical tasks have more chances to cut into the timeline.
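To make the clock rule concrete, here is a minimal user-space sketch, not the actual scx_lavd BPF code. The function name `advance_vclock`, the per-mille load unit, and the `LOAD_FULL_PERMILLE` constant are all assumptions made for illustration; the PR only states that the clock moves toward the task's deadline and moves more slowly under overload.

```c
#include <stdint.h>

/* Hypothetical unit: 1000 means the system is exactly fully loaded. */
#define LOAD_FULL_PERMILLE 1000ULL

/*
 * Advance the virtual clock toward a task's deadline.
 * When the load exceeds 100%, only a fraction of the distance is covered,
 * so tasks with tighter deadlines get more room to cut in.
 * Assumes the distance is small enough that delta * 1000 does not overflow.
 */
static uint64_t advance_vclock(uint64_t cur_clock, uint64_t task_deadline,
                               uint64_t load_permille)
{
    uint64_t delta;

    /* The clock never moves backwards. */
    if (task_deadline <= cur_clock)
        return cur_clock;

    delta = task_deadline - cur_clock;

    /* Overloaded: scale the step down in inverse proportion to the load. */
    if (load_permille > LOAD_FULL_PERMILLE)
        delta = delta * LOAD_FULL_PERMILLE / load_permille;

    return cur_clock + delta;
}
```

With this sketch, a system at 200% load advances the clock only halfway to the deadline of the task being dispatched, while a system at or below 100% load advances it all the way.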
Instead of estimating service time (vruntime in CFS) from runtime and run frequency, we now directly measure the service time for eligibility enforcement.
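One way to picture direct measurement is the sketch below. It is an assumption-laden illustration rather than the PR's code: the struct, the `on_running`/`on_stopping` hooks, and the weight scaling (modeled on how CFS scales vruntime, with 100 taken as the default weight) are all hypothetical. The PR itself only says that service time is now measured directly instead of being derived from average runtime and run frequency.

```c
#include <stdint.h>

/* Hypothetical default weight, used to normalize the accumulated time. */
#define DEFAULT_WEIGHT 100ULL

/* Hypothetical per-task bookkeeping. */
struct task_ctx_sketch {
    uint64_t last_start_ns; /* when the task last started running */
    uint64_t svc_time;      /* accumulated weight-scaled service time */
};

/* Record the start of an execution interval. */
static void on_running(struct task_ctx_sketch *t, uint64_t now_ns)
{
    t->last_start_ns = now_ns;
}

/*
 * Close the interval and charge it to the task.
 * A heavier task is charged less per nanosecond, as with CFS vruntime.
 */
static void on_stopping(struct task_ctx_sketch *t, uint64_t now_ns,
                        uint64_t weight)
{
    uint64_t ran = now_ns - t->last_start_ns;

    if (weight == 0)
        weight = 1; /* guard against division by zero */

    t->svc_time += ran * DEFAULT_WEIGHT / weight;
}
```

Because the counter is updated at every stop, it reflects a sudden change in behavior immediately, whereas an estimate built from moving averages lags behind.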
We drop the runtime factor in calculating latency criticality since it is already considered in calculating the task's virtual deadline. Instead, we additionally consider the task's starvation factor (i.e., how far the task's service time has fallen behind the average service time) in calculating the latency criticality. By incorporating the starvation factor, we can systematically avoid the watchdog time-out error from the scx framework.
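The starvation factor can be sketched from the snippet quoted in the review above. The formula mirrors that snippet; the value given to `LAVD_LC_STARVATION_FT` here is a placeholder (the real value is not shown in this conversation), and the zero guard is an added assumption. The review also notes that the quoted version is removed later in the series, so treat this purely as an illustration of the idea.

```c
#include <stdint.h>

/* Placeholder value; the actual constant is not shown in this PR text. */
#define LAVD_LC_STARVATION_FT 1024ULL

/*
 * Starvation factor: grows as the task's service time falls below the
 * system-wide average, and shrinks toward 1 as it rises above it.
 */
static uint64_t starvation_factor(uint64_t avg_svc_time, uint64_t svc_time)
{
    uint64_t ratio;

    if (svc_time == 0)
        svc_time = 1; /* assumed guard against division by zero */

    ratio = (LAVD_LC_STARVATION_FT * avg_svc_time) / svc_time;
    return ratio + 1;
}
```

A task that has received half the average service time gets roughly twice the factor of a task sitting at the average, which raises its latency criticality before the watchdog would fire.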
Instead of inheriting the parent's properties for a forked task, a forked task will be treated as a greedy task until the scheduler knows its true properties. This helps to avoid stalling under a fork bomb.
After the overhaul, we cleaned up a lot of unnecessary code and optimizations. Notably, we dropped the sched_prio_to_slice_weight[] table and now use p->scx.weight directly.
Note that we first tried to maintain ineligible runnable tasks separately. However, we later removed this because it became unnecessary after the overhaul.