-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scx_stats, scx_layered: Implement independent stats client sessions #514
Conversation
This allows stats reader to maintain persistent states per connection.
…mmunication channel This will make implementing connection sessions easier where each stats client connection maintains a set of states.
… with Arc<Mutex<>>
Instead of keeping one copy of sched_stats, each stats server session carries their own so that stats can be generated independently by each client at any interval. CPU allocation min/max tracking is broken for now.
open_ops tracks which ops have been opened by the client session; however, it was being created on each handle_request() making every request to open each time. Fix it by moving it to the caller.
So that new stats session doesn't start with an inflated utilization number.
With this, every client sees the correct nr_layer_cpus_ranges without interfering with each other.
let (req_pair, res_pair) = ChannelPair::<Req, Res>::bidi(); | ||
match add_req.send(res_pair) { | ||
Ok(()) => debug!("sent new channel to proxy"), | ||
Err(e) => warn!("ScxStatsServer::proxy() failed ({})", &e), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this is kind of meta, but I wonder if it is worth adding a set of stats on failures on the stats server itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, some statistics can be useful. I'll put that on the todo list.
- If --monitor is specified with layer specs, the scheduler also starts stats monitoring on a thread. - Standalone monitoring mode no longer exits when the scheduler isn't there.
}; | ||
retry_cnt = 0; | ||
|
||
while !shutdown.load(Ordering::Relaxed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this inner one needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Otherwise, the inner loop will never exit on ctrl-c.
info!("Stats server not avaliable, retrying..."); | ||
} | ||
retry_cnt += 1; | ||
sleep(Duration::from_secs(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe give up after a max number of retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. After how many? I think the mental model is like following a log file. Wouldn't it make sense to just let it sit there until told to exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see the flow of this now... Yeah, this makes sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems legit. I think the unbounded channel is a little scary from a resource leaking perspective, but imo is easier to handle/debug something that could block.
We should be able to switch to bounded channels with 0 size without noticeable behavior changes. There's only supposed to be one message in flight in each connection pair at any given time and all the operations are fully synchronous (one send followed by one receive or vice-versa). Yeah, using bounded is probably better here. Will do that later. |
Currently, scx_layered generates stats at a fixed interval and every stats client reads out the content of the latest update. This means that a client whose poll interval is longer than the scheduler's would see sampled stats (e.g. the average and nr_cpus ranges wouldn't represent the whole period) and a client with shorter poll interval may see the same outputs repeated.
This PR implements
ScxStatsServer::add_stats_ops()
which enables the user to specifyopen()
andclose()
closures. Theopen()
closure creates aread()
closure per client session which can carry mutable states. In addition, it provides multiplexed communication channels between the clients and the server user which can be useful when implementing stateful stats generation.scx_layered
is updated to make use of the new feature. After this PR, every client always gets accurate stats, including the time averages and period nr_layer_cpus min/max tracking, independent of each other.