Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HT-3211: push job metrics to pushgateway #145

Merged
merged 2 commits into from
Nov 19, 2021
Merged

HT-3211: push job metrics to pushgateway #145

merged 2 commits into from
Nov 19, 2021

Conversation

aelkiss
Copy link
Member

@aelkiss aelkiss commented Nov 11, 2021

  • uses pushgateway via waypoint - this is probably not the final shape
    we want, but it's a convenient entry point for now since both are
    concerned with tracking progress. A better idea might be a progress
    tracker object that delegates to waypoint but also reports to the
    pushgateway.

  • no specs for the pushgateway integration yet

  • we probably want to label the job completion time with the expected
    job frequency

@aelkiss aelkiss force-pushed the HT-3211-pushgateway branch 2 times, most recently from bd7ef97 to 5e38380 Compare November 16, 2021 22:55
@aelkiss aelkiss marked this pull request as ready for review November 16, 2021 22:55
@aelkiss aelkiss requested review from jsteverman and mwarin and removed request for jsteverman November 16, 2021 22:55
@aelkiss
Copy link
Member Author

aelkiss commented Nov 16, 2021

I'd suggest looking at the two commits individually, since the second one is a bit noisy and perhaps some may be hard to follow. We can discuss if needed.

@aelkiss aelkiss force-pushed the HT-3211-pushgateway branch from 5e38380 to a8f201a Compare November 17, 2021 16:44
This uses the pushgateway via decorator for waypoint, which means
individual jobs don't need to change.

This includes a 'success interval' metric, which can be used to help
make a generic prometheus alert - we can alert by looking for jobs where
current time - job_last_success > job_expected_success_interval
This fixes issues with the logger getting set to STDERR (thus polluting
test output) as well as with unexpected external HTTP calls.

- Consistently wrap stuff in bin scripts in "main" method to avoid side
effects when loading in specs (in particular to avoid changing stuff in
Services). This then leads to rubocop complaints about stuff in main,
which is probably reasonable, but also not really caused by this commit
- hence added to .rubocop_todo.yml.

- Avoid use of leaky constant BATCH_SIZE

- Avoid things that happened when autoscrub classes or specs were loaded
rather than when a specific spec is run or a class is instantiated -- in
particular fetching the max OCN

- Use webmock to stub logging calls to Slack and to the pushgateway
@aelkiss aelkiss force-pushed the HT-3211-pushgateway branch from a8f201a to a0834d8 Compare November 17, 2021 16:50
@aelkiss
Copy link
Member Author

aelkiss commented Nov 17, 2021

This is ready for review now, including tracking the expected interval between completions. See description on the individual commits.

@aelkiss aelkiss changed the title HT-3211: WIP: push job metrics to pushgateway HT-3211: push job metrics to pushgateway Nov 17, 2021
Copy link
Contributor

@jsteverman jsteverman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual changes are pretty limited. No objections.

@aelkiss
Copy link
Member Author

aelkiss commented Nov 17, 2021

Great. @mwarin if this looks good to you too (especially see the cleanup in autoscrub) I'll go ahead and merge, and (hopefully) we can see those metrics start showing up in prometheus. We'll need to set the expected completion interval via environment variable for each job (over in ht_tanka) before we would start getting alerts. Later down the road we can look as necessary at routing alerts for holdings jobs.

@aelkiss aelkiss merged commit a6d9e13 into main Nov 19, 2021
@aelkiss aelkiss deleted the HT-3211-pushgateway branch November 19, 2021 17:41
@aelkiss
Copy link
Member Author

aelkiss commented Nov 19, 2021

@mwarin I went ahead and merged this, but let us know if you have any feedback on the autoscrub stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants