-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce scx_rustland_core: a generic layer to implement user-space schedulers in Rust #161
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thanks for working on this! Left a few comments
rust/scx_user/src/bpf.rs
Outdated
|
||
// Copy one item from the ring buffer. | ||
fn callback(data: &[u8]) -> i32 { | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore, but it's probably good for us to get into the habit of adding safety comments: https://std-dev-guide.rust-lang.org/policy/safety-comments.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely! Considering that it's not trivial to understand what's happening under the hood here, the more comments we add the better (especially about memory safety). I'll add a proper comment.
rust/scx_user/src/bpf.rs
Outdated
// Return an unsupported error to stop early and consume only one item. | ||
// | ||
// NOTE: this is quite a hack. I wish libbpf would honor stopping after the first item | ||
// is consumed, upon returnin a non-zero positive value here, but it doesn't seem to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/returnin/returning
rust/scx_user/src/bpf.rs
Outdated
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/ringbuf.c?h=v6.8-rc5#n260 | ||
// | ||
// Maybe we should fix this to stop processing items from the ring buffer also when a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed. This is what we do for the kernel -> user ringbuffer: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/ringbuf.c#n764.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, and IIRC the loop stops if the callback returns a value > 0, you don't necessarily need to pass a value that < 0, I think the same should be done in libbpf (and libbpf-rs as well). I'll try to propose a fix for that.
I already sent a PR for libbpf-rs (libbpf/libbpf-rs#680) to add new consume_raw()
and poll_raw()
functions (now merged).
With all these pieces in place we would be able to remove this hack and also be able to read multiple items at once if we want, using the retcode of consume_raw()
to determine the amount of tasks received.
rust/scx_user/src/bpf.rs
Outdated
} | ||
|
||
// Override the default scheduler time slice (in us). | ||
#[allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required to use #[allow(dead_code)]
for pub
functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I build a scheduler that doesn't use a pub method, the compiler complains that the method is unused. But if I build scx_user
as a crate, then it doesn't complain. I'm not sure if there's a better way to tell the compiler (or our build scripts) that this is always a crate, therefore it shouldn't complain about unused pub functions... I'll investigate a bit.
scheds/rust/scx_fifo/src/main.rs
Outdated
// Notify the BPF component that all tasks have been dispatched. | ||
*self.bpf.nr_queued_mut() = 0; | ||
*self.bpf.nr_scheduled_mut() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's any way to keep this internal to the library? It feels like a user shouldn't have to manually set these skeleton fields in order for the scheduling framework to work correctly, but I'd need to think about it more.
Not sure if we want to do something similar here, but in rhone we use a callback mechanism whereby the core framework calls into the caller-specified callback when a task is enqueued: https://github.com/Decave/rhone/blob/main/rhone_sched.c#L423, and we have a callback that we invoke for when a core is going idle: https://github.com/Decave/rhone/blob/main/rhone_sched.c#L438.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's preferable to allow end users to determine the program structure if possible. Would just adding clear methods be enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I like the callback idea! And I agree that setting these vars to 0 is quite ugly, we should have at least an API for that. I'll work on that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's preferable to allow end users to determine the program structure if possible. Would just adding clear methods be enough here?
Maybe we can do both, provide a clear method (that's easy) and later on evaluate the possibility to have also a callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can evolve the crate in tree, so code-wise, it all looks fine to me. We need to be a bit more careful about names tho. e.g. I don't think we want to allocate the scx_fifo
name to the rustland implementation and it's difficult to reclaim names once published to creates.io
. Code looks fine to me but please reconsider names before merging.
rust/scx_user/build.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... it's bothersome that this create is replicating parts of scx_utils
. It'd be better if this comes from scx_utils
instead. Given that the only thing this crate needs to add is the bpf.rs
file, wouldn't something like the following work?
- Pack the BPF source files to be included by the scheduler implementations as strings.
- Implement
BpfBuilder
wrapper, say,BpfUserBuilder
which also unpacks the above source files in the same directories as other header files. This may need some updates toBpfBuilder
so thatscx_user
can do it in multiple steps and query the directories and so on. That hopefully should be relatively straight-forward. - Users can simply use
BpfUserBuilder
instead ofBpfBuilder
and the only difference is that the extra source files are available in the build directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I was also skeptical about naming it scx_fifo
, how about scx_userfifo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I like the idea of a BpfUserBuilder
wrapper! I'll work on that.
rust/meson.build
Outdated
@@ -1 +1,2 @@ | |||
subdir('scx_utils') | |||
subdir('scx_user') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether scx_rustland
name is better suited for this crate so that the framework has a distinctive name. Then, the schedulers which use this framework can be named like e.g. scxrl_fifo
or sth like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad that you're saying this. In my initial version I actually used scx_rustland
for this crate, but I wasn't 100% convinced about reusing the name for something else... Maybe scx_rustland
should just evolve into this meta-scheduler crate that allows to write schedulers in Rust?
And then we can unlock a new namespace that would even allow to potentially re-implement the same scheduling policies that we have in C / BPF, without conflicting with the naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine to morph scx_rustland
into the framework crate and separate out actual scheduler implementations and given them a distinct namespace. That was the intention of the scheduler after all. But yeah, as long as we give the framework a sufficiently identifying name and give the example scheduler implementations on top accordingly scoped names, we should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this PR (only locally for now) to rename scx_user
-> scx_rustland
, scx_fifo
-> scxrl_fifo
and scx_rustland
-> scxrl_vtime
.
However, I still have conflicting opinions about this renaming.
I don't see any problem with user and fifo, but about rustland, we have docs and press material (blog posts, articles, etc.) that link the scx_rustland
directory in this repo directly. We are going to break the existing scx_rustland crate (or at least change its nature, from being a scheduler to a framework). And we also have the scx
systemd service that has the name scx_rustland
.
If we rename the scheduler we're going to break all of the above potentially.
So, I was considering the following options:
- rename everything as planned, keep
scheds/rust/scx_rustland/README.md
to explain the renaming, add some logic to the scx service script to print a warning if someone is trying to use scx_rustland as a scheduler (not sure how to handle the change in the crate in this case) - rename
scx_user
->scx_rustland_core
, keepscx_rustland
as it is, renamescx_fifo
->scx_rlfifo
I'm more inclined to go with option 2 (at the end scx_rustland_core
is not a bad name for a user-space Rust scheduling framework) and we have the benefits of not breaking anything in the existing docs, articles, links, tools, etc.
Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either sounds fine to me. Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, in the meantime I've pushed an update to go with option 2. In this way we're not breaking anything and we're not allocating scx_fifo
(the "rustland FIFO" example is now called scx_rlfifo
).
I'd like to see if I can reduce the code duplication between scx_utils
and scx_rustland_core
and then I think we may be ready to merge this and keep improving this stuff in tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and with the last push I think I've addressed/fixed all the other comments, so the only thing that I'd like to look at before merging this PR is the BpfUserBuilder
wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, with 3b4df81 on top I think we have a fairly reasonable first implementation of the scx_rustland_core
framework.
Thinking more about shpping the pre-compiled BPF component of scx_rustland_core
in the crate, I think I actually prefer to embed main.bpf.c
as source code in the crate (as it is right now), because in this way it'll be always compiled with the same toolchain that is used for the scheduler, that seems more robust.
scheds/rust/scx_fifo/src/main.rs
Outdated
// Notify the BPF component that all tasks have been dispatched. | ||
*self.bpf.nr_queued_mut() = 0; | ||
*self.bpf.nr_scheduled_mut() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's preferable to allow end users to determine the program structure if possible. Would just adding clear methods be enough here?
LGTM but is rust/scx_rustland_core/build.rs still needed? |
meh! I forgot about that part 😅 At the moment yes, if we want to build Thanks for noticing it! |
bf657ed
to
85d453c
Compare
Added 3 more commits on top to prevent code duplication and generate all the required source files for the rustland-based schedulers in-tree via the new RustLandBuilder() helper:
I think I'm pretty happy with this PR for now, even if it seems a big change, excluding the new So, if there's no objection I'll go ahead and merge this. |
rust/scx_rustland_core/Cargo.toml
Outdated
authors = ["Andrea Righi <[email protected]>"] | ||
license = "GPL-2.0-only" | ||
repository = "https://github.com/sched-ext/scx" | ||
description = "Framework to implement sched_ext schedulers running in user-space" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI / completely optional nit: if we want to be grammatically precise, user-space should be written as "user space" here, and in other places where it's not being used as a compound adjective. A hyphen is only used like this when part of a compound adjective. In other words, to differentiate the meaning of the adjective from the words used separately, such as fire proof suit -> fire-proof suit, left handed boxer -> left-handed boxer, user space scheduler -> user-space scheduler, etc. Just user space on its own is a noun, not a compound adjective describing something like a type of scheduler.
(Fun fact -- I learned this when LWN editors were reviewing my articles and pointed it out to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is really interesting! Thanks for sharing it (will fix it).
rust/scx_rustland_core/Cargo.toml
Outdated
scx_utils = { path = "../scx_utils", version = "0.6" } | ||
|
||
[build-dependencies] | ||
#bindgen = ">=0.68, <0.70" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a comment explaining why this is commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one can be removed, I left it commented while testing to determine which crates could be removed (will fix as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'm still conflicted about the update_tasks()
part of the API, but it's also works perfectly fine, and it's something we can iterate on later if we choose to.
rust/scx_rustland_core/README.md
Outdated
Ok(None) => { | ||
// Notify the BPF component that all tasks have been dispatched. | ||
*self.bpf.nr_queued_mut() = 0; | ||
*self.bpf.nr_scheduled_mut() = 0; | ||
|
||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be updated to say update_tasks(Some(0), Some(0))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely (will fix also this one).
Introduce a separate crate (scx_rustland_core) that can be used to implement sched-ext schedulers in Rust that run in user-space. This commit only provides the basic layout for the new crate and the abstraction to the custom allocator. In general, any scheduler that has a user-space component needs to use the custom allocator to prevent potential deadlock conditions, caused by page faults (a kthread needs to run to resolve the page fault, but the scheduler is blocked waiting for the user-space page fault to be resolved => deadlock). However, we don't want to necessarily enforce this constraint to all the existing Rust schedulers, some of them may do all user-space allocations in safe paths, hence the separate scx_rustland_core crate. Merging this code in scx_utils would force all the Rust schedulers to use the custom allocator. In a future commit the scx_rustland backend will be moved to scx_rustland_core, making it a totally generic BPF scheduler framework that can be used to implement user-space schedulers in Rust. Signed-off-by: Andrea Righi <[email protected]>
Move the BPF component of scx_rustland to scx_rustland_core and make it available to other user-space schedulers. NOTE: main.bpf.c and bpf.rs are not pre-compiled in the scx_rustland_core crate, they need to be included in the user-space scheduler's source code in order to be compiled/linked properly. Signed-off-by: Andrea Righi <[email protected]>
scx_rustland has significantly evolved since its original design. With the introduction of scx_rustland_core and the inclusion of the scx_rlfifo example, scx_rustland's focus can be shifted from solely being an "easy-to-read Rust scheduler template" to a fully functional scheduler. For this reason, update the README and documentation to reflect its revised design, objectives, and intended use cases. Signed-off-by: Andrea Righi <[email protected]>
Implement a FIFO scheduler as an example usage of scx_rustland_core. Signed-off-by: Andrea Righi <[email protected]>
Introduce a helper function to update the counter of queued and scheduled tasks (used to notify the BPF component if the user-space scheduler has still some pending work to do). Signed-off-by: Andrea Righi <[email protected]>
Introduce a wrapper to scx_utils::BpfBuilder that can be used to build the BPF component provided by scx_rustland_core. The source of the BPF components (main.bpf.c) is included in the crate as an array of bytes, the content is then unpacked in a temporary file to perform the build. The RustLandBuilder() helper is also used to generate bpf.rs (that implements the low-level user-space Rust connector to the BPF commponent). Schedulers based on scx_rustland_core can simply use RustLandBuilder(), to build the backend provided by scx_rustland_core. Signed-off-by: Andrea Righi <[email protected]>
Introduce a Builder() class in scx_utils that can be used by other scx crates (such as scx_rustland_core) to prevent code duplication. Signed-off-by: Andrea Righi <[email protected]>
There is no need to generate source code in a temporary directory with RustLandBuilder(), we can simply generate code in-tree and exclude the generated source files from .gitignore. Having the generated source files in-tree can help to debug potential build issues (and it also allows to drop the the tempfile crate dependency). Signed-off-by: Andrea Righi <[email protected]>
@Decave thanks tons for the review, I think I've fixed all the issues that you found. About |
The idea is to decouple the generic scheduling-agnostic backend of
scx_rustland
from the scheduler itself, move the backend to a separate crate (scx_rustland_core
), and make it available to implement other user-space schedulers as pure Rust front-ends.Of course this process will require multiple iterations, we need to define and document a proper API, etc., but it doesn't need to be perfect right away and I think the "rustland backend" is mature enough now to (try to) become a generic subsystem on its own. Note that all of this is purely code refactoring, it doesn't change any functionality in
scx_rustland
.Moreover, a new example scheduler,
scx_rlfifo
is provided as a simple template to show how to usescx_rustland_core
.Why not merge
scx_rustland_core
intoscx_utils
?In general any scheduler that is made of components running in user-space may encounter deadlocks if they perform allocations in critical non-blocking paths (user-space may hit a page fault, blocking the scheduler, a kthread needs to run to resolve a page fault, but the scheduler is blocked => deadlock).
For this reason importing
scx_rustland_core
transparently enables a custom global allocator that replaces the default Rust allocator, operating on a pre-allocated mlock-ed memory area.We don't want to enforce this constraint to all existing Rust schedulers, so the separate crate gives the possibility to enable this behavior selectively to specific user-space Rust schedulers (only
scx_rustland_core
andscx_rlfifo
for now).Is
scx_rustland_core
a complete standalone crate that allows to implement user-space schedulers in Rust?Almost. The
main.bpf.c
andbpf.rs
components are not pre-compiled in the crate. They still need to be imported from the scheduler's source code (at least I haven't found yet a way to pre-compile them and I don't even know if it's possible). So, for now these two files have been moved to thescx_rustland_core
dir and they are just included as source from the user-space Rust schedulers.TODOs
scx_utils
andscx_rustland_core
(implement aBpfUserBuilder
wrapper)scx_rustland_core
(BPF arena can probably help on this)BPF_MAP_TYPE_RINGBUF
, we can't do the same in the dispatch path, since the producer is user-space and consumer is BPF; BPF arena can probably be helpful also for this part)scx_userland
to reuse the samemain.bpf.c
fromscx_rustland_core
(since the BPF part is not strictly Rust-dependent it can be used to implement also user-space schedulers in C)Edit: renamed scx_user -> scx_rustland_core, rename scx_fifo -> scx_rlfifo, misc doc updates.