HT-3211: push job metrics to pushgateway #145

aelkiss · 2021-11-11T15:55:03Z

uses pushgateway via waypoint - this is probably not the final shape
we want, but it's a convenient entry point for now since both are
concerned with tracking progress. A better idea might be a progress
tracker object that delegates to waypoint but also reports to the
pushgateway.
no specs for the pushgateway integration yet
we probably want to label the job completion time with the expected
job frequency

aelkiss · 2021-11-16T22:56:13Z

I'd suggest looking at the two commits individually, since the second one is a bit noisy and perhaps some may be hard to follow. We can discuss if needed.

This uses the pushgateway via decorator for waypoint, which means individual jobs don't need to change. This includes a 'success interval' metric, which can be used to help make a generic prometheus alert - we can alert by looking for jobs where current time - job_last_success > job_expected_success_interval

This fixes issues with the logger getting set to STDERR (thus polluting test output) as well as with unexpected external HTTP calls. - Consistently wrap stuff in bin scripts in "main" method to avoid side effects when loading in specs (in particular to avoid changing stuff in Services). This then leads to rubocop complaints about stuff in main, which is probably reasonable, but also not really caused by this commit - hence added to .rubocop_todo.yml. - Avoid use of leaky constant BATCH_SIZE - Avoid things that happened when autoscrub classes or specs were loaded rather than when a specific spec is run or a class is instantiated -- in particular fetching the max OCN - Use webmock to stub logging calls to Slack and to the pushgateway

aelkiss · 2021-11-17T19:19:26Z

This is ready for review now, including tracking the expected interval between completions. See description on the individual commits.

jsteverman

Actual changes are pretty limited. No objections.

aelkiss · 2021-11-17T20:51:33Z

Great. @mwarin if this looks good to you too (especially see the cleanup in autoscrub) I'll go ahead and merge, and (hopefully) we can see those metrics start showing up in prometheus. We'll need to set the expected completion interval via environment variable for each job (over in ht_tanka) before we would start getting alerts. Later down the road we can look as necessary at routing alerts for holdings jobs.

aelkiss · 2021-11-19T17:41:44Z

@mwarin I went ahead and merged this, but let us know if you have any feedback on the autoscrub stuff.

aelkiss force-pushed the HT-3211-pushgateway branch 2 times, most recently from bd7ef97 to 5e38380 Compare November 16, 2021 22:55

aelkiss marked this pull request as ready for review November 16, 2021 22:55

aelkiss requested review from jsteverman and mwarin and removed request for jsteverman November 16, 2021 22:55

aelkiss force-pushed the HT-3211-pushgateway branch from 5e38380 to a8f201a Compare November 17, 2021 16:44

aelkiss added 2 commits November 17, 2021 11:50

aelkiss force-pushed the HT-3211-pushgateway branch from a8f201a to a0834d8 Compare November 17, 2021 16:50

aelkiss changed the title ~~HT-3211: WIP: push job metrics to pushgateway~~ HT-3211: push job metrics to pushgateway Nov 17, 2021

jsteverman approved these changes Nov 17, 2021

View reviewed changes

aelkiss merged commit a6d9e13 into main Nov 19, 2021

aelkiss deleted the HT-3211-pushgateway branch November 19, 2021 17:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HT-3211: push job metrics to pushgateway #145

HT-3211: push job metrics to pushgateway #145

aelkiss commented Nov 11, 2021

aelkiss commented Nov 16, 2021

aelkiss commented Nov 17, 2021

jsteverman left a comment

aelkiss commented Nov 17, 2021

aelkiss commented Nov 19, 2021

HT-3211: push job metrics to pushgateway #145

HT-3211: push job metrics to pushgateway #145

Conversation

aelkiss commented Nov 11, 2021

aelkiss commented Nov 16, 2021

aelkiss commented Nov 17, 2021

jsteverman left a comment

Choose a reason for hiding this comment

aelkiss commented Nov 17, 2021

aelkiss commented Nov 19, 2021