rusty: Integrate stats with the metrics framework #377
Conversation
This looks very similar to what I added to scx_layered. I'm mostly happy with the approach, but there's still a fair bit of boilerplate. One idea I kicked around was making a procedural macro to define stats from a struct, which would generate a lot of this boilerplate.
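As a rough illustration of that direction (not something proposed in the PR itself), even a plain `macro_rules!` macro can fold the repeated `counter!(...).increment(...)` calls quoted below into a single declaration. The macro and function names here are made up, and the sketch assumes a `metrics` version whose `counter!` macro returns a usable handle:

```rust
use metrics::counter;

// Hypothetical sketch: one declaration expands into one
// counter!(...).increment(...) call per stat, with stringify!($name)
// used as the "type" label value.
macro_rules! report_dispatch_counters {
    ($($name:ident),* $(,)?) => {
        $(
            counter!("dispatched_tasks_count", "type" => stringify!($name))
                .increment($name);
        )*
    };
}

// Illustrative call site; in scx_rusty these values come from the BPF stats.
fn report_dispatches(direct_greedy_far: u64, dsq: u64, greedy_local: u64, greedy_xnuma: u64) {
    report_dispatch_counters!(direct_greedy_far, dsq, greedy_local, greedy_xnuma);
}
```

A procedural macro over a stats struct could go further and also generate the field definitions and registration, but the sketch above captures the basic boilerplate-reduction idea.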
scheds/rust/scx_rusty/src/main.rs
counter!("dispatched_tasks_count", "type" => "direct_greedy_far").increment(direct_greedy_far); | ||
counter!("dispatched_tasks_count", "type" => "dsq").increment(dsq); | ||
counter!("dispatched_tasks_count", "type" => "greedy_local").increment(greedy_local); | ||
counter!("dispatched_tasks_count", "type" => "greedy_xnuma").increment(greedy_xnuma); |
This registers a new counter on each `report()`. I would think we want to register the counters ahead of time and only set their absolute value here, no?
Let me look into that and see if we'd benefit from instantiating them once and reusing them. I was under the impression that the library authors want this pattern and have optimized the implementation to avoid the perceived overhead of registering new counters on each call.
@dschatzberg I created a `Metrics` struct to hold the metrics and made it a member of the `Scheduler` struct. We now only register the metrics once.
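For readers following along, here is a minimal sketch of what pre-registering might look like, assuming a `metrics` crate version where `counter!` returns a reusable `Counter` handle; the field and label names are illustrative rather than the exact ones in the PR:

```rust
use metrics::{counter, Counter};

// Register each counter once at startup; report() then only bumps the
// already-registered handles instead of calling counter!() every time.
struct Metrics {
    direct_dispatches: Counter,
    dsq_dispatches: Counter,
}

impl Metrics {
    fn new() -> Self {
        Self {
            direct_dispatches: counter!("dispatched_tasks_count", "type" => "direct"),
            dsq_dispatches: counter!("dispatched_tasks_count", "type" => "dsq"),
        }
    }
}
```

The `Scheduler` would then hold a `Metrics` instance and do something like `self.metrics.dsq_dispatches.increment(dsq)` inside `report()`.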
scheds/rust/scx_rusty/src/main.rs
);

let kick_greedy = stat(bpf_intf::stat_idx_RUSTY_STAT_KICK_GREEDY);
let repatriate = stat(bpf_intf::stat_idx_RUSTY_STAT_REPATRIATE);
You're not exposing these to the metrics backend?
I can add them in this PR as well.
Added
stat_pct(bpf_intf::stat_idx_RUSTY_STAT_DL_CLAMP),
stat_pct(bpf_intf::stat_idx_RUSTY_STAT_DL_PRESET),
stat_pct(dl_clamped),
stat_pct(dl_preset),
);

info!("slice_length={}us", self.tuner.slice_ns / 1000);
I wonder if it's possible to make all this logging of stats into a metrics backend itself. That would reduce some of the boilerplate needed to add more stats.
Yes, absolutely. My plan for the next PR was to make a terminal log exporter and have that own this logic.
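To sketch the idea (this is not the PR's code, and it assumes the `Recorder`/`CounterFn` trait shapes from recent `metrics` releases, roughly 0.22+), a terminal log exporter could be just a `Recorder` whose counters print on every update:

```rust
use std::sync::Arc;

use metrics::{
    Counter, CounterFn, Gauge, Histogram, Key, KeyName, Metadata, Recorder, SharedString, Unit,
};

// Hypothetical log-to-terminal backend: every counter update is printed
// instead of being shipped to Prometheus/StatsD.
struct LogCounter(Key);

impl CounterFn for LogCounter {
    fn increment(&self, value: u64) {
        println!("{} += {}", self.0.name(), value);
    }

    fn absolute(&self, value: u64) {
        println!("{} = {}", self.0.name(), value);
    }
}

struct LogRecorder;

impl Recorder for LogRecorder {
    fn describe_counter(&self, _: KeyName, _: Option<Unit>, _: SharedString) {}
    fn describe_gauge(&self, _: KeyName, _: Option<Unit>, _: SharedString) {}
    fn describe_histogram(&self, _: KeyName, _: Option<Unit>, _: SharedString) {}

    fn register_counter(&self, key: &Key, _: &Metadata<'_>) -> Counter {
        Counter::from_arc(Arc::new(LogCounter(key.clone())))
    }

    fn register_gauge(&self, _: &Key, _: &Metadata<'_>) -> Gauge {
        Gauge::noop()
    }

    fn register_histogram(&self, _: &Key, _: &Metadata<'_>) -> Histogram {
        Histogram::noop()
    }
}
```

Installing such a recorder (e.g. via `metrics::set_global_recorder`, in versions that expose that function) would route the same `counter!` calls through the log backend, which is what would let the existing `info!` stat dump move behind the metrics API.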
@dschatzberg can you give an idea of what that macro would look like? In general, I prefer not to add additional abstractions on top of metric clients and to stay as close to the API as possible, but I'd be open to adding a macro if you can give me some ideas of what it would look like.
This looks excellent, thanks again for working on it. Once we've addressed the few things that @dschatzberg pointed out, I think we can land this and iterate in tree.

Something I do think we'll want to consider addressing longer-term (which we already discussed offline, but just recording here for others' benefit): I do think that we could benefit from having a metrics crate that abstracts some of this stuff and avoids boilerplate, as I expect that a lot of the schedulers will be exporting stats in nearly the same way. In general it seems like schedulers do the following: …
If we could somehow make that declarative (sort of similar to https://github.com/sched-ext/scx/blob/main/scheds/c/scx_nest_stats_table.h and https://github.com/sched-ext/scx/blob/main/scheds/c/scx_nest.c#L210-L219 in scx_nest) … In any case, this LG for now! Once Dan's changes are addressed, I'll stamp and merge.
This looks better than OM and no objection from me. Some things to consider for the future:
@Byte-Lab @htejun agreed and aligned on the long-term vision. My plan is to use …

@dschatzberg I applied your feedback and I also created a struct to hold the metrics. If you are not opposed, I'd like to first create a terminal logging backend so that we can clean up the …
@jfernandez Yeah, that's totally fine by me. The idea I had regarding a macro was to make it more like the …

BTW, do you need `.absolute()` and not `.increment()` now that the counters are pre-registered?
Ah yeah, I'm familiar with this pattern and I like it as well. I will explore this when I focus on removing boilerplate.
@dschatzberg no, `increment()` is still what we want here. How these monotonic counters are exported is up to the target backend. For Prometheus, since it's a poll-based system, the counter increases the value in memory until there is a scrape event, then it gets reset back to 0. The Prometheus backend simply adds to the value, and you need to chart it with a `rate()` query. For something like Netflix's …
I pushed one more change; I forgot to append …
We need a layer of indirection between the stats collection and their output destinations. Currently, stats are only printed to stdout. Our goal is to integrate with various telemetry systems such as Prometheus, StatsD, and custom metric backends like those used by Meta and Netflix. Importantly, adding a new backend should not require changes to the existing stats code.

This patch introduces the `metrics` [1] crate, which provides a framework for defining metrics and publishing them to different backends.

The initial implementation includes the `dispatched_tasks_count` metric, tagged with `type`. This metric increments every time a task is dispatched, emitting the raw count instead of a percentage. A monotonic counter is the most suitable metric type for this use case, as percentages can be calculated at query time if needed. Existing logged metrics continue to print percentages and remain unchanged.

A new flag, `--enable-prometheus`, has been added. When enabled, it starts a Prometheus endpoint on port 9000 (default is false). This endpoint allows metrics to be charted in Prometheus or Grafana dashboards.

Future changes will migrate additional stats to this framework and add support for other backends.

[1] https://metrics.rs/

Signed-off-by: Jose Fernandez <[email protected]>
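As a rough sketch of the `--enable-prometheus` behavior described above (assuming the `metrics-exporter-prometheus` crate; the function name and error handling here are illustrative, not the PR's exact wiring):

```rust
use metrics_exporter_prometheus::PrometheusBuilder;

// Hypothetical sketch: when --enable-prometheus is passed, expose a scrape
// endpoint on port 9000 that Prometheus/Grafana can chart from.
fn maybe_start_prometheus(enable_prometheus: bool) -> anyhow::Result<()> {
    if enable_prometheus {
        PrometheusBuilder::new()
            .with_http_listener(([0, 0, 0, 0], 9000))
            .install()?;
    }
    Ok(())
}
```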
One more change: I got the convention wrong. It's …
@jfernandez I'm a bit confused here. These …

Edit: I see why we'd want to increment here instead of setting absolute if it's scraped and reset by the target backend, but if we do that, I think we'd need to track what the actual increment was relative to the last time we collected the stats.
Hmm, I'm still not quite following how we won't need state to track this (meaning, we record what the value was last time, and then increment based on that). Even if we emit the metric on every call to `report()` …
@Byte-Lab ah! I assumed that the BPF stats were being reset for each loop. If they are always incrementing, then I need to rework this. Let me get back to you on this.
Sorry @jfernandez, as we discussed on Slack you were totally correct. The values are reset on each read here: https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_rusty/src/main.rs#L433-L435. What you have now LG.
We need a layer of indirection between the stats collection and their output destinations. Currently, stats are only printed to stdout. Our goal is to integrate with various telemetry systems such as Prometheus, StatsD, and custom metric backends like those used by Meta and Netflix. Importantly, adding a new backend should not require changes to the existing stats code.
This patch introduces the `metrics` [1] crate, which provides a framework for defining metrics and publishing them to different backends.

The initial implementation includes the `dispatched_tasks_count` metric, tagged with `type`. This metric increments every time a task is dispatched, emitting the raw count instead of a percentage. A monotonic counter is the most suitable metric type for this use case, as percentages can be calculated at query time if needed. Existing logged metrics continue to print percentages and remain unchanged.

A new flag, `--enable-prometheus`, has been added. When enabled, it starts a Prometheus endpoint on port 9000 (default is false). This endpoint allows metrics to be charted in Prometheus or Grafana dashboards.

Example of charting 1s rate of dispatched tasks by type:
Future changes will migrate additional stats to this framework and add support for other backends.
[1] https://metrics.rs/