add resource journal #6586

Open
wants to merge 7 commits into
base: master

Member

garlick commented Jan 28, 2025

This adds a resource journal streaming RPC similar to the one offered by the job manager. Just a first cut at this point.

This doesn't change what's posted to the persistent resource.eventlog in the KVS but does add one new event called restart that's only for journal consumption. It provides a baseline for mapping execution targets to hostnames in the current instance, and sets the initial online set after a restart.

Unlike the job manager journal, this doesn't have as much volume to deal with so no options for event filtering or skipping historical data are provided as yet.

flux resource eventlog can be used to dump and optionally follow this log. This is not currently polished at all - it just dumps the events in JSON form, one per line.
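
Since the output is one JSON event per line, it can be consumed with a few lines of Python. This is a minimal sketch, assuming each line is an eventlog-style object with `timestamp`, `name`, and `context` keys; the sample lines and their values are made up for illustration and are not real output from this PR.

```python
import json


def parse_eventlog(text):
    """Parse one-JSON-object-per-line text into a list of event dicts."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        events.append(json.loads(line))
    return events


# Hypothetical sample; the key layout is an assumption, not captured output.
sample = """\
{"timestamp": 1738000000.0, "name": "restart", "context": {"ranks": "0-3", "online": "", "nodelist": "node[0-3]"}}
{"timestamp": 1738000001.5, "name": "online", "context": {"idset": "0-1"}}
"""

events = parse_eventlog(sample)
names = [event["name"] for event in events]
print(names)
```

In practice the text would come from the standard output of `flux resource eventlog` rather than a literal string.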

For more detail on what's in this log and how the journal is formatted, see the proposed RFC:

Problem: the reslog class has no way to access the resource inventory,
but it will be useful to send journal consumers a copy of R when
the resource-define event is emitted.

Pass the resource_ctx to reslog_create() instead of just the flux_t
handle.  Adjust internal uses of the flux_t handle to get it via
reslog->ctx->h instead of reslog->h.

Problem: the full resource eventlog, including online/offline
events that are not committed to the KVS, may need to be monitored.

Keep events in a json array in memory, including the events that
were read from the KVS at startup, if any.

Filter out any historical resource-define events.  These are meant for
synchronization on the availability of R and that only pertains to the
current instance.
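
The in-memory log described above can be illustrated roughly as follows. This is a Python sketch of the idea only, not the C code in reslog.c; the `EventBuffer` class and its method names are hypothetical.

```python
class EventBuffer:
    """Keep eventlog entries in memory, dropping stale resource-define events."""

    def __init__(self, historical=()):
        # Historical resource-define events only mattered to the instance
        # that posted them, so they are filtered out when loading from the KVS.
        self.events = [
            event for event in historical
            if event.get("name") != "resource-define"
        ]

    def append(self, name, context=None, timestamp=0.0):
        """Record a new event posted by the current instance."""
        event = {"timestamp": timestamp, "name": name, "context": context or {}}
        self.events.append(event)
        return event


# Hypothetical history, as if read back from the KVS at startup.
history = [
    {"timestamp": 1.0, "name": "resource-define", "context": {"method": "kvs"}},
    {"timestamp": 2.0, "name": "drain", "context": {"idset": "1"}},
]
buffer = EventBuffer(history)
buffer.append("resource-define", {"method": "kvs"}, timestamp=3.0)
buffer_names = [event["name"] for event in buffer.events]
print(buffer_names)
```

Only the resource-define event posted by the current instance survives, so a consumer can still use it to synchronize on the availability of R.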
garlick force-pushed the resource_journal branch 2 times, most recently from 46bf4a3 to 5e12001 on January 30, 2025 22:04

Problem: there is no way to observe the journal in real time, with
non-persistent online/offline events included.

Add a resource.journal RPC with protocol similar to the job manager
journal.

Problem: a resource journal consumer will get online/offline events
before knowing the size of the instance or the hostname mapping.

Post a 'restart' event when the resource module is loaded with
the following keys:

ranks
  An idset containing all valid ranks: 0 to size-1

online
  An idset containing any ranks that are initially online.
  This is normally empty except when starting with monitor-force-up in test.

nodelist
  Contents of the hostlist broker attribute

This event is not made persistent in the KVS resource.eventlog.
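
A consumer might use the restart event as its baseline along these lines. This is a sketch under assumptions: it supposes that online/offline events carry an `idset` key in their context, and `decode_idset` and `OnlineTracker` are hypothetical helpers written for this example rather than the Flux idset API. Decoding `nodelist` into hostnames would need a hostlist decoder and is left out.

```python
def decode_idset(text):
    """Decode a simple idset string such as "0-3,7" into a set of ints."""
    ids = set()
    text = text.strip().strip("[]")
    if not text:
        return ids  # empty idset
    for part in text.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            ids.update(range(int(low), int(high) + 1))
        else:
            ids.add(int(part))
    return ids


class OnlineTracker:
    """Track which ranks are online, starting from a restart event."""

    def __init__(self):
        self.ranks = set()
        self.online = set()

    def process(self, event):
        name = event["name"]
        context = event.get("context", {})
        if name == "restart":
            # The restart event resets the baseline for this instance.
            self.ranks = decode_idset(context["ranks"])
            self.online = decode_idset(context["online"])
        elif name == "online":
            self.online |= decode_idset(context["idset"])  # assumed key
        elif name == "offline":
            self.online -= decode_idset(context["idset"])  # assumed key


tracker = OnlineTracker()
for event in [
    {"name": "restart", "context": {"ranks": "0-3", "online": "", "nodelist": "node[0-3]"}},
    {"name": "online", "context": {"idset": "0-2"}},
    {"name": "offline", "context": {"idset": "1"}},
]:
    tracker.process(event)
print(sorted(tracker.online))
```

Because the restart event arrives before any online/offline events, the tracker always knows the full rank set before it has to interpret a membership change.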
Member Author

garlick commented Jan 31, 2025

I added a quick --wait=EVENT option to flux resource eventlog and a sharness test. Will remove WIP - maybe good for a start that we can experiment with a bit?

garlick changed the title from "WIP: add resource journal" to "add resource journal" on Jan 31, 2025
Contributor

grondo commented Jan 31, 2025

Yeah, let me see if I can generalize the JournalConsumer class to also work with the resource eventlog.

Problem: there is no convenient tool for accessing the resource journal.

Add flux resource eventlog [--follow] [--wait=EVENT].

Problem: there are no tests for the resource journal or the
flux resource eventlog command.

Add a sharness test for this purpose.

Problem: flux resource eventlog has no documentation.

Add an entry to the man page.

codecov bot commented Jan 31, 2025

Codecov Report

Attention: Patch coverage is 67.51592% with 51 lines in your changes missing coverage. Please review.

Project coverage is 79.45%. Comparing base (fb2f0ac) to head (e047195).
Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/modules/resource/reslog.c | 59.43% | 43 Missing ⚠️ |
| src/cmd/flux-resource.py | 81.48% | 5 Missing ⚠️ |
| src/modules/resource/monitor.c | 90.00% | 2 Missing ⚠️ |
| src/modules/resource/resource.c | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6586      +/-   ##
==========================================
- Coverage   79.47%   79.45%   -0.02%     
==========================================
  Files         531      531              
  Lines       88433    88574     +141     
==========================================
+ Hits        70282    70379      +97     
- Misses      18151    18195      +44     

| Files with missing lines | Coverage Δ |
|---|---|
| src/modules/resource/upgrade.c | 67.44% <ø> (ø) |
| src/modules/resource/resource.c | 86.66% <75.00%> (+0.13%) ⬆️ |
| src/modules/resource/monitor.c | 70.00% <90.00%> (+2.50%) ⬆️ |
| src/cmd/flux-resource.py | 94.26% <81.48%> (-0.78%) ⬇️ |
| src/modules/resource/reslog.c | 69.45% <59.43%> (-5.38%) ⬇️ |

... and 4 files with indirect coverage changes

2 participants