scx_rustland_core: prevent CI failures #662

Merged
merged 2 commits into main from rustland-prevent-ci-failures
Sep 19, 2024


@arighi arighi commented Sep 19, 2024

A couple of changes that should finally prevent all the CI failures.

With this applied, the schedulers based on scx_rustland_core will always dispatch kthreads directly from the BPF component. This should be a safer option for now, since it helps prevent potential stalls even under heavy stress-test conditions.

In the future, we will provide a better API to enforce stricter selection criteria for tasks permitted to bypass user-space scheduling.

Updating nr_queued non-atomically when a queued task is consumed can
lead to underflows. We don't really care about being 100% accurate here,
since nr_queued should be considered more of a statistic than an
accurate value.

Therefore, just accept the fact that nr_queued can be inaccurate and
handle potential underflows.

Signed-off-by: Andrea Righi <[email protected]>
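
A minimal sketch of the underflow guard described in the commit message above, written as plain user-space C rather than BPF. The helper name `dec_counter` is hypothetical and not taken from the patch; the point is only that the decrement is skipped when the counter is already zero, so a racy update can make the value drift but cannot wrap it around.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decrement a statistics counter without letting it
 * wrap below zero. The read-modify-write is deliberately not atomic, so
 * concurrent updates may be lost; that is acceptable for a counter that
 * is treated as an approximate statistic.
 */
static void dec_counter(volatile uint64_t *counter)
{
	uint64_t val = *counter;

	if (val > 0)
		*counter = val - 1;
}
```

Calling `dec_counter()` on a counter that is already zero leaves it at zero instead of producing a huge unsigned value.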
Dispatching kthreads via user-space can still lead to deadlocks in
certain cases (for example we can still trigger stalls by running the
fork stressor via stress-ng).

To prevent such stalls, simply dispatch kthreads directly from BPF for
now.

In the future we may consider providing an API to restrict the
selection of tasks directly dispatched (for example passing a mask of PF_*
flags to "whitelist" the tasks that are allowed to bypass the user-space
scheduler).

Signed-off-by: Andrea Righi <[email protected]>
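
The PF_* mask idea mentioned in the commit message above could look roughly like the following sketch. It is plain C, not the BPF code from this PR, and the helper `can_bypass_user_space` and its `bypass_mask` parameter are assumptions made for illustration; no such API exists in scx_rustland_core as of this change. `PF_KTHREAD` is defined locally with the value the Linux kernel uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value the Linux kernel uses for PF_KTHREAD in include/linux/sched.h. */
#define PF_KTHREAD 0x00200000u

/*
 * Hypothetical helper: returns true if a task whose flags are `task_flags`
 * may be dispatched directly, skipping the user-space scheduler.
 * `bypass_mask` is the set of PF_* flags that grant the bypass.
 */
static bool can_bypass_user_space(uint32_t task_flags, uint32_t bypass_mask)
{
	return (task_flags & bypass_mask) != 0;
}
```

With `bypass_mask` set to `PF_KTHREAD`, the check reduces to the behavior described in this commit: every kthread is dispatched directly and all other tasks go through the user-space scheduler.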

arighi commented Sep 19, 2024

Merging to see if we can reduce the number of CI failures.

@arighi arighi merged commit 488f209 into main Sep 19, 2024
20 checks passed
@arighi arighi deleted the rustland-prevent-ci-failures branch September 19, 2024 12:37
@likewhatevs

I think we want those failures.

I haven't looked a lot at how stress-ng is being called in the tests, but stress-ng is stalling, which doesn't make the tests fail the way kernel threads stalling does. The tests should still fail, though (they're passing with stalls atm):

https://github.com/sched-ext/scx/actions/runs/10936241301/job/30359538896#step:16:363


arighi commented Sep 19, 2024

meh ok... it's still failing then, I really don't understand what's going on with the stress-ng fork stressor, it really breaks rustland badly... I will investigate more.
