scx_lavd: replenish time slice at ops.running() only when necessary #250

Merged 1 commit on Apr 29, 2024
3 changes: 2 additions & 1 deletion scheds/rust/scx_lavd/src/bpf/intf.h
@@ -49,15 +49,16 @@ enum consts {
NSEC_PER_USEC = 1000ULL,
NSEC_PER_MSEC = (1000ULL * NSEC_PER_USEC),
LAVD_TIME_ONE_SEC = (1000ULL * NSEC_PER_MSEC),
+ LAVD_TIME_INFINITY_NS = 0xFFFFFFFFFFFFFFFFULL,
Contributor

You should be able to use SCX_SLICE_INF, which is the same u64 max value. Note that if a running task has this slice value, the tick is stopped. I don't think lavd ever actually ends up running tasks with this value, though, so it's not really a concern, just something to note.

Contributor Author

Thanks! I will update it accordingly.

LAVD_MAX_CAS_RETRY = 8,

LAVD_TARGETED_LATENCY_NS = (15 * NSEC_PER_MSEC),
LAVD_SLICE_MIN_NS = (300 * NSEC_PER_USEC),/* min time slice */
LAVD_SLICE_MAX_NS = (3 * NSEC_PER_MSEC), /* max time slice */
+ LAVD_SLICE_UNDECIDED = LAVD_TIME_INFINITY_NS,
LAVD_SLICE_GREEDY_FT = 3,
LAVD_LOAD_FACTOR_ADJ = 6,
LAVD_LOAD_FACTOR_MAX = (10 * 1000),
- LAVD_TIME_INFINITY_NS = 0xFFFFFFFFFFFFFFFFULL,

LAVD_LC_FREQ_MAX = 1000000,
LAVD_LC_RUNTIME_MAX = LAVD_TARGETED_LATENCY_NS,
30 changes: 26 additions & 4 deletions scheds/rust/scx_lavd/src/bpf/main.bpf.c
@@ -1644,7 +1644,7 @@ static void put_global_rq(struct task_struct *p, struct task_ctx *taskc,
* Enqueue the task to the global runqueue based on its virtual
* deadline.
*/
- scx_bpf_dispatch_vtime(p, LAVD_GLOBAL_DSQ, LAVD_SLICE_MAX_NS,
+ scx_bpf_dispatch_vtime(p, LAVD_GLOBAL_DSQ, LAVD_SLICE_UNDECIDED,
vdeadline, enq_flags);

}
@@ -1679,7 +1679,7 @@ static bool put_local_rq(struct task_struct *p, struct task_ctx *taskc,
taskc->vdeadline_delta_ns = 0;
taskc->eligible_delta_ns = 0;
taskc->victim_cpu = (s16)LAVD_CPU_ID_NONE;
- scx_bpf_dispatch(p, SCX_DSQ_LOCAL, LAVD_SLICE_MAX_NS, enq_flags);
+ scx_bpf_dispatch(p, SCX_DSQ_LOCAL, LAVD_SLICE_UNDECIDED, enq_flags);
return true;
}

@@ -1804,6 +1804,27 @@ void BPF_STRUCT_OPS(lavd_runnable, struct task_struct *p, u64 enq_flags)
waker_taskc->last_runnable_clk = now;
}

+ static bool need_to_calc_time_slice(struct task_struct *p)
+ {
+ /*
+ * We need to calculate the task @p's time slice in two cases: 1) if it
+ * hasn't been calculated (i.e., LAVD_SLICE_UNDECIDED) after the
+ * enqueue or 2) if the sched_ext kernel assigns the default time slice
+ * (i.e., SCX_SLICE_DFL).
+ *
+ * Calculating and assigning a time slice without checking these two
+ * conditions could entail pathological behaviors, notably watchdog
+ * time out. One condition that could trigger a watchdog time-out is as
+ * follows. 1) a task is preempted by another task, which runs in a
+ * higher scheduler class (e.g., RT or DL). 2) when the task is
+ * re-running after, for example, the RT task preempted out, its time
+ * slice will be replenished again. 3) If these two steps are repeated,
+ * the task can run forever.
+ */
+ return p->scx.slice == LAVD_SLICE_UNDECIDED ||
+ p->scx.slice == SCX_SLICE_DFL;
Contributor

So, the only time the kernel assigns SCX_SLICE_DFL is when the currently running task is the only runnable task on the CPU. When the task's slice expires, the kernel sets the slice to the default value and keeps running the task. This is a convenience feature which can be disabled by setting SCX_OPS_ENQ_LAST in ops.flags. When the flag is set, the task will always be enqueued when the slice expires, whether it's the last runnable task on the CPU or not. When the last task is enqueued, ops.enqueue() is called with the SCX_ENQ_LAST flag:

        /*
         * The task being enqueued is the only task available for the cpu. By
         * default, ext core keeps executing such tasks but when
         * %SCX_OPS_ENQ_LAST is specified, they're ops.enqueue()'d with the
         * %SCX_ENQ_LAST flag set.
         *
         * If the BPF scheduler wants to continue executing the task,
         * ops.enqueue() should dispatch the task to %SCX_DSQ_LOCAL immediately.
         * If the task gets queued on a different dsq or the BPF side, the BPF
         * scheduler is responsible for triggering a follow-up scheduling event.
         * Otherwise, Execution may stall.
         */
        SCX_ENQ_LAST            = 1LLU << 41,

Contributor Author

For now, I will keep the code as it is, but later, when the per-tick preemption code is ready, I will change it.
Thank you!

+ }
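
For readers following the review thread above, the slice-expiry behavior it describes can be modeled as a small decision function. This is only a toy sketch in plain C, not kernel or BPF code: `toy_slice_expired()`, `struct toy_result`, and the `TOY_*` names are hypothetical, the 20 ms value for the default slice is an assumption, and only the `1LLU << 41` flag value is taken from the quoted comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the sched_ext kernel. */
#define TOY_SLICE_DFL	20000000ULL	/* assumed default slice (20 ms) */
#define TOY_ENQ_LAST	(1ULL << 41)	/* value quoted in the review comment */

enum toy_action {
	TOY_KEEP_RUNNING,	/* kernel refills the slice and keeps the task */
	TOY_CALL_ENQUEUE,	/* task is handed back to ops.enqueue() */
};

struct toy_result {
	enum toy_action action;
	uint64_t slice;		/* slice assigned by the kernel, if any */
	uint64_t enq_flags;	/* flags passed to ops.enqueue(), if called */
};

/*
 * Model of what happens when a task's slice expires.
 * @only_runnable: the task is the only runnable task on its CPU.
 * @ops_enq_last:  the scheduler set SCX_OPS_ENQ_LAST in ops.flags.
 */
static struct toy_result toy_slice_expired(bool only_runnable, bool ops_enq_last)
{
	struct toy_result r = { TOY_CALL_ENQUEUE, 0, 0 };

	if (only_runnable && !ops_enq_last) {
		/* Convenience path: default slice, task keeps running. */
		r.action = TOY_KEEP_RUNNING;
		r.slice = TOY_SLICE_DFL;
	} else if (only_runnable) {
		/* Flag set: the last task is enqueued with SCX_ENQ_LAST. */
		r.enq_flags = TOY_ENQ_LAST;
	}
	return r;
}
```

In this model, the `SCX_SLICE_DFL` branch of `need_to_calc_time_slice()` only matters on the `TOY_KEEP_RUNNING` path, which seems to be the point of the review comment.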

void BPF_STRUCT_OPS(lavd_running, struct task_struct *p)
{
struct cpu_ctx *cpuc;
@@ -1826,9 +1847,10 @@ void BPF_STRUCT_OPS(lavd_running, struct task_struct *p)
cpuc->stopping_tm_est_ns = get_est_stopping_time(taskc);

/*
- * Calcualte task's time slice based on updated load.
+ * Calculate the task's time slice based on updated load if necessary.
*/
- p->scx.slice = calc_time_slice(p, taskc);
+ if (need_to_calc_time_slice(p))
+ p->scx.slice = calc_time_slice(p, taskc);

/*
* If there is a relevant introspection command with @p, process it.
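
The watchdog scenario described in the `need_to_calc_time_slice()` comment can be reproduced with a small user-space model. The sketch below is illustrative only and makes several assumptions: `toy_rounds_until_expiry()` and the `TOY_*` constants are hypothetical, `calc_time_slice()` is replaced by a fixed 3 ms value, and each "round" stands for one run of the task that ends either in slice expiry or in preemption by a higher-class (e.g., RT) task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, in nanoseconds; not the real kernel values. */
#define TOY_SLICE_UNDECIDED	UINT64_MAX	/* mirrors LAVD_SLICE_UNDECIDED */
#define TOY_SLICE_DFL		20000000ULL	/* assumed SCX_SLICE_DFL (20 ms) */
#define TOY_SLICE_CALC		3000000ULL	/* pretend calc_time_slice() result */

/* Same two-case test as need_to_calc_time_slice() in the diff. */
static bool toy_need_to_calc(uint64_t slice)
{
	return slice == TOY_SLICE_UNDECIDED || slice == TOY_SLICE_DFL;
}

/*
 * Each round models one pass through ops.running(): the task runs for at
 * most @burst_ns and is then preempted by an RT task unless its slice ran
 * out first. Returns the round in which the slice expires, or -1 if it
 * never expires within @max_rounds.
 */
static int toy_rounds_until_expiry(bool guarded, uint64_t burst_ns, int max_rounds)
{
	uint64_t slice = TOY_SLICE_UNDECIDED;	/* set at enqueue time */

	assert(burst_ns > 0 && max_rounds > 0);
	for (int round = 1; round <= max_rounds; round++) {
		/* Old code replenishes always; new code only when needed. */
		if (!guarded || toy_need_to_calc(slice))
			slice = TOY_SLICE_CALC;

		/* Consume the burst, but never more than the slice. */
		slice -= burst_ns < slice ? burst_ns : slice;
		if (slice == 0)
			return round;
		/* Otherwise the RT task preempts us and we come back later. */
	}
	return -1;
}
```

With 1 ms bursts, the guarded variant exhausts its 3 ms slice in the third round, while the unguarded variant is refilled on every round and never expires, which matches the "can run forever" case in the comment.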