Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce scx_rustland_core: a generic layer to implement user-space schedulers in Rust #161

Merged
merged 8 commits into from
Feb 28, 2024

Conversation

arighi
Copy link
Contributor

@arighi arighi commented Feb 25, 2024

The idea is to decouple the generic scheduling-agnostic backend of scx_rustland from the scheduler itself, move the backend to a separate crate (scx_rustland_core), and make it available to implement other user-space schedulers as pure Rust front-ends.

Of course this process will require multiple iterations, we need to define and document a proper API, etc., but it doesn't need to be perfect right away and I think the "rustland backend" is mature enough now to (try to) become a generic subsystem on its own. Note that all of this is purely code refactoring, it doesn't change any functionality in scx_rustland.

Moreover, a new example scheduler, scx_rlfifo is provided as a simple template to show how to use scx_rustland_core.

Why not merge scx_rustland_core into scx_utils?

In general any scheduler that is made of components running in user-space may encounter deadlocks if they perform allocations in critical non-blocking paths (user-space may hit a page fault, blocking the scheduler, a kthread needs to run to resolve a page fault, but the scheduler is blocked => deadlock).

For this reason importing scx_rustland_core transparently enables a custom global allocator that replaces the default Rust allocator, operating on a pre-allocated mlock-ed memory area.

We don't want to enforce this constraint to all existing Rust schedulers, so the separate crate gives the possibility to enable this behavior selectively to specific user-space Rust schedulers (only scx_rustland_core and scx_rlfifo for now).

Is scx_rustland_core a complete standalone crate that allows to implement user-space schedulers in Rust?

Almost. The main.bpf.c and bpf.rs components are not pre-compiled in the crate. They still need to be imported from the scheduler's source code (at least I haven't found yet a way to pre-compile them and I don't even know if it's possible). So, for now these two files have been moved to the scx_rustland_core dir and they are just included as source from the user-space Rust schedulers.

TODOs

  • avoid code duplication between scx_utils and scx_rustland_core (implement a BpfUserBuilder wrapper)
  • improve documentation, examples, in particular document the API properly
  • provide an API to expose the task's cpumask to the user-space via scx_rustland_core (BPF arena can probably help on this)
  • reduce dispatch overhead (the enqueue path has been improved already using a BPF_MAP_TYPE_RINGBUF, we can't do the same in the dispatch path, since the producer is user-space and consumer is BPF; BPF arena can probably be helpful also for this part)
  • I would also like to change scx_userland to reuse the same main.bpf.c from scx_rustland_core (since the BPF part is not strictly Rust-dependent it can be used to implement also user-space schedulers in C)

Edit: renamed scx_user -> scx_rustland_core, rename scx_fifo -> scx_rlfifo, misc doc updates.

Copy link
Contributor

@Byte-Lab Byte-Lab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, thanks for working on this! Left a few comments


// Copy one item from the ring buffer.
fn callback(data: &[u8]) -> i32 {
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore, but it's probably good for us to get into the habit of adding safety comments: https://std-dev-guide.rust-lang.org/policy/safety-comments.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! Considering that it's not trivial to understand what's happening under the hood here, the more comments we add the better (especially about memory safety). I'll add a proper comment.

// Return an unsupported error to stop early and consume only one item.
//
// NOTE: this is quite a hack. I wish libbpf would honor stopping after the first item
// is consumed, upon returnin a non-zero positive value here, but it doesn't seem to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/returnin/returning

Comment on lines 177 to 179
// https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/ringbuf.c?h=v6.8-rc5#n260
//
// Maybe we should fix this to stop processing items from the ring buffer also when a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. This is what we do for the kernel -> user ringbuffer: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/ringbuf.c#n764.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, and IIRC the loop stops if the callback returns a value > 0, you don't necessarily need to pass a value that < 0, I think the same should be done in libbpf (and libbpf-rs as well). I'll try to propose a fix for that.

I already sent a PR for libbpf-rs (libbpf/libbpf-rs#680) to add new consume_raw() and poll_raw() functions (now merged).

With all these pieces in place we would be able to remove this hack and also be able to read multiple items at once if we want, using the retcode of consume_raw() to determine the amount of tasks received.

}

// Override the default scheduler time slice (in us).
#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to use #[allow(dead_code)] for pub functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I build a scheduler that doesn't use a pub method, the compiler complains that the method is unused. But if I build scx_user as a crate, then it doesn't complain. I'm not sure if there's a better way to tell the compiler (or our build scripts) that this is always a crate, therefore it shouldn't complain about unused pub functions... I'll investigate a bit.

Comment on lines 59 to 61
// Notify the BPF component that all tasks have been dispatched.
*self.bpf.nr_queued_mut() = 0;
*self.bpf.nr_scheduled_mut() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's any way to keep this internal to the library? It feels like a user shouldn't have to manually set these skeleton fields in order for the scheduling framework to work correctly, but I'd need to think about it more.

Not sure if we want to do something similar here, but in rhone we use a callback mechanism whereby the core framework calls into the caller-specified callback when a task is enqueued: https://github.com/Decave/rhone/blob/main/rhone_sched.c#L423, and we have a callback that we invoke for when a core is going idle: https://github.com/Decave/rhone/blob/main/rhone_sched.c#L438.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable to allow end users to determine the program structure if possible. Would just adding clear methods be enough here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I like the callback idea! And I agree that setting these vars to 0 is quite ugly, we should have at least an API for that. I'll work on that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable to allow end users to determine the program structure if possible. Would just adding clear methods be enough here?

Maybe we can do both, provide a clear method (that's easy) and later on evaluate the possibility to have also a callback.

Copy link
Contributor

@htejun htejun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can evolve the crate in tree, so code-wise, it all looks fine to me. We need to be a bit more careful about names tho. e.g. I don't think we want to allocate the scx_fifo name to the rustland implementation and it's difficult to reclaim names once published to creates.io. Code looks fine to me but please reconsider names before merging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... it's bothersome that this create is replicating parts of scx_utils. It'd be better if this comes from scx_utils instead. Given that the only thing this crate needs to add is the bpf.rs file, wouldn't something like the following work?

  • Pack the BPF source files to be included by the scheduler implementations as strings.
  • Implement BpfBuilder wrapper, say, BpfUserBuilder which also unpacks the above source files in the same directories as other header files. This may need some updates to BpfBuilder so that scx_user can do it in multiple steps and query the directories and so on. That hopefully should be relatively straight-forward.
  • Users can simply use BpfUserBuilder instead of BpfBuilder and the only difference is that the extra source files are available in the build directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I was also skeptical about naming it scx_fifo, how about scx_userfifo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I like the idea of a BpfUserBuilder wrapper! I'll work on that.

rust/meson.build Outdated
@@ -1 +1,2 @@
subdir('scx_utils')
subdir('scx_user')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether scx_rustland name is better suited for this crate so that the framework has a distinctive name. Then, the schedulers which use this framework can be named like e.g. scxrl_fifo or sth like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad that you're saying this. In my initial version I actually used scx_rustland for this crate, but I wasn't 100% convinced about reusing the name for something else... Maybe scx_rustland should just evolve into this meta-scheduler crate that allows to write schedulers in Rust?

And then we can unlock a new namespace that would even allow to potentially re-implement the same scheduling policies that we have in C / BPF, without conflicting with the naming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine to morph scx_rustland into the framework crate and separate out actual scheduler implementations and given them a distinct namespace. That was the intention of the scheduler after all. But yeah, as long as we give the framework a sufficiently identifying name and give the example scheduler implementations on top accordingly scoped names, we should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this PR (only locally for now) to rename scx_user -> scx_rustland, scx_fifo -> scxrl_fifo and scx_rustland -> scxrl_vtime.

However, I still have conflicting opinions about this renaming.

I don't see any problem with user and fifo, but about rustland, we have docs and press material (blog posts, articles, etc.) that link the scx_rustland directory in this repo directly. We are going to break the existing scx_rustland crate (or at least change its nature, from being a scheduler to a framework). And we also have the scx systemd service that has the name scx_rustland.

If we rename the scheduler we're going to break all of the above potentially.

So, I was considering the following options:

  1. rename everything as planned, keep scheds/rust/scx_rustland/README.md to explain the renaming, add some logic to the scx service script to print a warning if someone is trying to use scx_rustland as a scheduler (not sure how to handle the change in the crate in this case)
  2. rename scx_user -> scx_rustland_core, keep scx_rustland as it is, rename scx_fifo -> scx_rlfifo

I'm more inclined to go with option 2 (at the end scx_rustland_core is not a bad name for a user-space Rust scheduling framework) and we have the benefits of not breaking anything in the existing docs, articles, links, tools, etc.

Opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either sounds fine to me. Your call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, in the meantime I've pushed an update to go with option 2. In this way we're not breaking anything and we're not allocating scx_fifo (the "rustland FIFO" example is now called scx_rlfifo).

I'd like to see if I can reduce the code duplication between scx_utils and scx_rustland_core and then I think we may be ready to merge this and keep improving this stuff in tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and with the last push I think I've addressed/fixed all the other comments, so the only thing that I'd like to look at before merging this PR is the BpfUserBuilder wrapper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, with 3b4df81 on top I think we have a fairly reasonable first implementation of the scx_rustland_core framework.

Thinking more about shpping the pre-compiled BPF component of scx_rustland_core in the crate, I think I actually prefer to embed main.bpf.c as source code in the crate (as it is right now), because in this way it'll be always compiled with the same toolchain that is used for the scheduler, that seems more robust.

Comment on lines 59 to 61
// Notify the BPF component that all tasks have been dispatched.
*self.bpf.nr_queued_mut() = 0;
*self.bpf.nr_scheduled_mut() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable to allow end users to determine the program structure if possible. Would just adding clear methods be enough here?

@arighi arighi changed the title RFC: introduce scx_user: a generic layer to implement user-space schedulers in Rust Introduce scx_rustland_core: a generic layer to implement user-space schedulers in Rust Feb 27, 2024
@htejun
Copy link
Contributor

htejun commented Feb 27, 2024

LGTM but is rust/scx_rustland_core/build.rs still needed?

@arighi
Copy link
Contributor Author

arighi commented Feb 27, 2024

LGTM but is rust/scx_rustland_core/build.rs still needed?

meh! I forgot about that part 😅

At the moment yes, if we want to build scx_rustland_core as a standalone crate. I'll take a look and see if we can reuse somehow the build from scx_utils. I also don't know if the standalone crate would include the sources properly... hm.. I'll double check.

Thanks for noticing it!

@arighi arighi force-pushed the scx-user branch 2 times, most recently from bf657ed to 85d453c Compare February 28, 2024 10:11
@arighi
Copy link
Contributor Author

arighi commented Feb 28, 2024

Added 3 more commits on top to prevent code duplication and generate all the required source files for the rustland-based schedulers in-tree via the new RustLandBuilder() helper:

  • scx_rustland_core: generate source files in-tree
  • scx_utils: introduce Builder()
  • scx_rustland_core: introduce RustLandBuilder()

I think I'm pretty happy with this PR for now, even if it seems a big change, excluding the new scx_rlfifo scheduler, docs, comments, etc. it only add ~100 lines of code, that is not bad at all for the functionality it provides.

So, if there's no objection I'll go ahead and merge this.

authors = ["Andrea Righi <[email protected]>"]
license = "GPL-2.0-only"
repository = "https://github.com/sched-ext/scx"
description = "Framework to implement sched_ext schedulers running in user-space"
Copy link
Contributor

@Byte-Lab Byte-Lab Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI / completely optional nit: if we want to be grammatically precise, user-space should be written as "user space" here, and in other places where it's not being used as a compound adjective. A hyphen is only used like this when part of a compound adjective. In other words, to differentiate the meaning of the adjective from the words used separately, such as fire proof suit -> fire-proof suit, left handed boxer -> left-handed boxer, user space scheduler -> user-space scheduler, etc. Just user space on its own is a noun, not a compound adjective describing something like a type of scheduler.

(Fun fact -- I learned this when LWN editors were reviewing my articles and pointed it out to me)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is really interesting! Thanks for sharing it (will fix it).

scx_utils = { path = "../scx_utils", version = "0.6" }

[build-dependencies]
#bindgen = ">=0.68, <0.70"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a comment explaining why this is commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one can be removed, I left it commented while testing to determine which crates could be removed (will fix as well).

Copy link
Contributor

@Byte-Lab Byte-Lab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'm still conflicted about the update_tasks() part of the API, but it's also works perfectly fine, and it's something we can iterate on later if we choose to.

Comment on lines 53 to 57
Ok(None) => {
// Notify the BPF component that all tasks have been dispatched.
*self.bpf.nr_queued_mut() = 0;
*self.bpf.nr_scheduled_mut() = 0;

break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be updated to say update_tasks(Some(0), Some(0))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely (will fix also this one).

Andrea Righi added 8 commits February 28, 2024 17:49
Introduce a separate crate (scx_rustland_core) that can be used to
implement sched-ext schedulers in Rust that run in user-space.

This commit only provides the basic layout for the new crate and the
abstraction to the custom allocator.

In general, any scheduler that has a user-space component needs to use
the custom allocator to prevent potential deadlock conditions, caused by
page faults (a kthread needs to run to resolve the page fault, but the
scheduler is blocked waiting for the user-space page fault to be
resolved => deadlock).

However, we don't want to necessarily enforce this constraint to all the
existing Rust schedulers, some of them may do all user-space allocations
in safe paths, hence the separate scx_rustland_core crate.

Merging this code in scx_utils would force all the Rust schedulers to
use the custom allocator.

In a future commit the scx_rustland backend will be moved to
scx_rustland_core, making it a totally generic BPF scheduler framework
that can be used to implement user-space schedulers in Rust.

Signed-off-by: Andrea Righi <[email protected]>
Move the BPF component of scx_rustland to scx_rustland_core and make it
available to other user-space schedulers.

NOTE: main.bpf.c and bpf.rs are not pre-compiled in the
scx_rustland_core crate, they need to be included in the user-space
scheduler's source code in order to be compiled/linked properly.

Signed-off-by: Andrea Righi <[email protected]>
scx_rustland has significantly evolved since its original design.

With the introduction of scx_rustland_core and the inclusion of the
scx_rlfifo example, scx_rustland's focus can be shifted from solely
being an "easy-to-read Rust scheduler template" to a fully functional
scheduler.

For this reason, update the README and documentation to reflect its
revised design, objectives, and intended use cases.

Signed-off-by: Andrea Righi <[email protected]>
Implement a FIFO scheduler as an example usage of scx_rustland_core.

Signed-off-by: Andrea Righi <[email protected]>
Introduce a helper function to update the counter of queued and
scheduled tasks (used to notify the BPF component if the user-space
scheduler has still some pending work to do).

Signed-off-by: Andrea Righi <[email protected]>
Introduce a wrapper to scx_utils::BpfBuilder that can be used to build
the BPF component provided by scx_rustland_core.

The source of the BPF components (main.bpf.c) is included in the crate
as an array of bytes, the content is then unpacked in a temporary file
to perform the build.

The RustLandBuilder() helper is also used to generate bpf.rs (that
implements the low-level user-space Rust connector to the BPF
commponent).

Schedulers based on scx_rustland_core can simply use RustLandBuilder(),
to build the backend provided by scx_rustland_core.

Signed-off-by: Andrea Righi <[email protected]>
Introduce a Builder() class in scx_utils that can be used by other scx
crates (such as scx_rustland_core) to prevent code duplication.

Signed-off-by: Andrea Righi <[email protected]>
There is no need to generate source code in a temporary directory with
RustLandBuilder(), we can simply generate code in-tree and exclude the
generated source files from .gitignore.

Having the generated source files in-tree can help to debug potential
build issues (and it also allows to drop the the tempfile crate
dependency).

Signed-off-by: Andrea Righi <[email protected]>
@arighi
Copy link
Contributor Author

arighi commented Feb 28, 2024

@Decave thanks tons for the review, I think I've fixed all the issues that you found. About update_tasks() I'm also not 100% happy about it, but we can refine it later and if we break the API at this stage it's not too critical.

@Byte-Lab Byte-Lab merged commit 7278d88 into main Feb 28, 2024
1 check passed
@Byte-Lab Byte-Lab deleted the scx-user branch March 14, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants