feat(api): add status to /mon/metrics #3500

Merged: 18 commits, Dec 2, 2018

yesnault
Member

```
status{component="Global/Status",instance="cdsinstance",status="AL"} 1
status{component="Global/Version",instance="cdsinstance",status="OK"} 1
status{component="Global/api",instance="cdsinstance",status="AL"} 3
status{component="Global/dbmigrate",instance="cdsinstance",status="OK"} 0
status{component="Global/doc",instance="cdsinstance",status="WARN"} 2
status{component="Global/elasticsearch",instance="cdsinstance",status="OK"} 0
status{component="Global/hatchery",instance="cdsinstance",status="AL"} 0
status{component="Global/hooks",instance="cdsinstance",status="OK"} 1
status{component="Global/repositories",instance="cdsinstance",status="OK"} 0
status{component="Global/vcs",instance="cdsinstance",status="AL"} 1
status{component="api_0/CDSName",instance="cdsinstance",status="OK"} 1
status{component="api_0/Cache",instance="cdsinstance",status="OK"} 1
status{component="api_0/Database",instance="cdsinstance",status="OK"} 1
status{component="api_0/Event",instance="cdsinstance",status="OK"} 1
status{component="api_0/Hostname",instance="cdsinstance",status="OK"} 1
status{component="api_0/Internal Events Queue",instance="cdsinstance",status="OK"} 0
status{component="api_0/Nb of Panics",instance="cdsinstance",status="OK"} 0
status{component="api_0/Object-Store",instance="cdsinstance",status="OK"} 1
status{component="api_0/SMTP",instance="cdsinstance",status="AL"} 1
status{component="api_0/Scheduler",instance="cdsinstance",status="WARN"} 1
status{component="api_0/Sessions-Store",instance="cdsinstance",status="OK"} 1
status{component="api_0/Time",instance="cdsinstance",status="OK"} 1
status{component="api_0/Uptime",instance="cdsinstance",status="OK"} 1
status{component="api_0/Version",instance="cdsinstance",status="OK"} 1
status{component="api_0/Worker Model Errors",instance="cdsinstance",status="OK"} 0
status{component="api_1/CDSName",instance="cdsinstance",status="OK"} 1
status{component="api_1/Cache",instance="cdsinstance",status="OK"} 1
status{component="api_1/Database",instance="cdsinstance",status="OK"} 1
status{component="api_1/Event",instance="cdsinstance",status="OK"} 1
status{component="api_1/Hostname",instance="cdsinstance",status="OK"} 1
status{component="api_1/Internal Events Queue",instance="cdsinstance",status="OK"} 0
status{component="api_1/Nb of Panics",instance="cdsinstance",status="OK"} 0
status{component="api_1/Object-Store",instance="cdsinstance",status="OK"} 1
status{component="api_1/SMTP",instance="cdsinstance",status="AL"} 1
status{component="api_1/Scheduler",instance="cdsinstance",status="OK"} 1
status{component="api_1/Sessions-Store",instance="cdsinstance",status="OK"} 1
status{component="api_1/Time",instance="cdsinstance",status="OK"} 1
status{component="api_1/Uptime",instance="cdsinstance",status="OK"} 1
status{component="api_1/Version",instance="cdsinstance",status="OK"} 1
status{component="api_1/Worker Model Errors",instance="cdsinstance",status="OK"} 0
status{component="api_2/CDSName",instance="cdsinstance",status="OK"} 1
status{component="api_2/Cache",instance="cdsinstance",status="OK"} 1
status{component="api_2/Database",instance="cdsinstance",status="OK"} 1
status{component="api_2/Event",instance="cdsinstance",status="OK"} 1
status{component="api_2/Hostname",instance="cdsinstance",status="OK"} 1
status{component="api_2/Internal Events Queue",instance="cdsinstance",status="OK"} 0
status{component="api_2/Nb of Panics",instance="cdsinstance",status="OK"} 0
status{component="api_2/Object-Store",instance="cdsinstance",status="OK"} 1
status{component="api_2/SMTP",instance="cdsinstance",status="AL"} 1
status{component="api_2/Scheduler",instance="cdsinstance",status="OK"} 1
status{component="api_2/Sessions-Store",instance="cdsinstance",status="OK"} 1
status{component="api_2/Time",instance="cdsinstance",status="OK"} 1
status{component="api_2/Uptime",instance="cdsinstance",status="OK"} 1
status{component="api_2/Version",instance="cdsinstance",status="OK"} 1
status{component="api_2/Worker Model Errors",instance="cdsinstance",status="OK"} 0
status{component="hooksLocal/Time",instance="cdsinstance",status="OK"} 1
status{component="hooksLocal/Uptime",instance="cdsinstance",status="OK"} 1
status{component="hooksLocal/Version",instance="cdsinstance",status="OK"} 1
status{component="sample-service/sample-service/sample-service",instance="cdsinstance",status="WARN"} 1
status{component="sample-service2/sample-service2/sample-service2",instance="cdsinstance",status="WARN"} 1
status{component="vcsLocal/Github-RateLimit",instance="cdsinstance",status="AL"} 0
status{component="vcsLocal/Github-RateLimitRemaining",instance="cdsinstance",status="OK"} 5000
status{component="vcsLocal/Github-RateLimitReset",instance="cdsinstance",status="AL"} 1
status{component="vcsLocal/Time",instance="cdsinstance",status="OK"} 1
status{component="vcsLocal/Uptime",instance="cdsinstance",status="OK"} 1
status{component="vcsLocal/Version",instance="cdsinstance",status="OK"} 1
```

Signed-off-by: Yvonnick Esnault <[email protected]>
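For illustration, here is a minimal Go sketch of how one line in the shape above could be rendered without any metrics library: the service and check names are joined into a single `component` label, and labels appear in sorted order, as in the sample output. `StatusLine`, `escapeLabel` and `formatStatus` are hypothetical names; this is not the PR's actual code in `engine/api`.

```go
package main

import (
	"fmt"
	"strings"
)

// StatusLine is a hypothetical representation of one check reported by a
// CDS service, e.g. Service "api_0" and Check "SMTP".
type StatusLine struct {
	Service string
	Check   string
	Status  string // "OK", "WARN" or "AL"
	Value   int
}

// escapeLabel escapes a label value as the Prometheus text exposition
// format requires: backslash, double quote and line feed.
func escapeLabel(v string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return r.Replace(v)
}

// formatStatus renders one "status" sample; service and check are joined
// with "/" into the component label, and labels are emitted in sorted order.
func formatStatus(instance string, l StatusLine) string {
	component := l.Service + "/" + l.Check
	return fmt.Sprintf(`status{component="%s",instance="%s",status="%s"} %d`,
		escapeLabel(component), escapeLabel(instance), escapeLabel(l.Status), l.Value)
}

func main() {
	line := StatusLine{Service: "api_0", Check: "SMTP", Status: "AL", Value: 1}
	fmt.Println(formatStatus("cdsinstance", line))
}
```

Packing two dimensions into one `component` label keeps a single metric name, but consumers then have to split the label to filter by service or by check, which is what the later refactor in this thread addresses.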

Review thread on engine/api/metrics/metrics.go (outdated)
Review thread on engine/api/metrics/metrics.go (outdated)
Review thread on engine/api/metrics/metrics.go (outdated)
@yesnault force-pushed the ye-metrics-status branch 3 times, most recently from bf77cf5 to a133f52 on November 15, 2018 10:31
Review thread on engine/api/api.go (outdated)
Review thread on engine/api/observability/status.go (outdated)
Review thread on engine/api/observability/metrics.go (outdated)
Review thread on engine/api/observability/metrics.go (outdated)
Review thread on engine/main.go (outdated)
Review thread on engine/api/api.go (outdated)
Review thread on engine/api/status.go (outdated)
@yesnault
Member Author

@fsamin, after the series refactor:

```
nb_applications{cds="cdsinstance"} 93
nb_artifacts{cds="cdsinstance"} 0
nb_groups{cds="cdsinstance"} 703
nb_max_workers_building{cds="cdsinstance"} 0
nb_pipelines{cds="cdsinstance"} 266
nb_projects{cds="cdsinstance"} 228
nb_users{cds="cdsinstance"} 664
nb_worker_models{cds="cdsinstance"} 1
nb_workflow_node_runs{cds="cdsinstance"} 240354
nb_workflow_runs{cds="cdsinstance"} 36228
nb_workflows{cds="cdsinstance"} 102
queue{cds="cdsinstance",range="10_less_10s",status="waiting"} 0
queue{cds="cdsinstance",range="20_more_10s_less_30s",status="waiting"} 0
queue{cds="cdsinstance",range="30_more_30s_less_1min",status="waiting"} 0
queue{cds="cdsinstance",range="40_more_1min_less_2min",status="waiting"} 0
queue{cds="cdsinstance",range="50_more_2min_less_5min",status="waiting"} 0
queue{cds="cdsinstance",range="60_more_5min_less_10min",status="waiting"} 0
queue{cds="cdsinstance",range="70_more_10min",status="waiting"} 4
queue{cds="cdsinstance",range="all",status="building"} 0
status{cds="cdsinstance",name="Global",status="AL",type="global"} 1
status_api{cds="cdsinstance",name="Global",status="AL",type="global"} 2
status_cache{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 1
status_cache{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 1
status_database{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 1
status_database{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 1
status_dbmigrate{cds="cdsinstance",name="Global",status="OK",type="global"} 0
status_doc{cds="cdsinstance",name="Global",status="WARN",type="global"} 2
status_elasticsearch{cds="cdsinstance",name="Global",status="OK",type="global"} 0
status_event{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 1
status_event{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 1
status_github_ratelimit{cds="cdsinstance",name="vcsLocal",status="AL",type="vcs"} 0
status_github_ratelimitremaining{cds="cdsinstance",name="vcsLocal",status="OK",type="vcs"} 5000
status_github_ratelimitreset{cds="cdsinstance",name="vcsLocal",status="AL",type="vcs"} 1
status_hatchery{cds="cdsinstance",name="Global",status="AL",type="global"} 0
status_hooks{cds="cdsinstance",name="Global",status="OK",type="global"} 1
status_internal_events_queue{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 0
status_internal_events_queue{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 0
status_nb_of_panics{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 0
status_nb_of_panics{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 0
status_object_store{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 1
status_object_store{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 1
status_repositories{cds="cdsinstance",name="Global",status="OK",type="global"} 0
status_sample_service2_sample_service2{cds="cdsinstance",name="sample-service2",status="WARN",type="doc"} 1
status_sample_service_sample_service{cds="cdsinstance",name="sample-service",status="WARN",type="doc"} 1
status_scheduler{cds="cdsinstance",name="api_craandprajen",status="WARN",type="api"} 1
status_scheduler{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 1
status_sessions_store{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 1
status_sessions_store{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 1
status_smtp{cds="cdsinstance",name="api_craandprajen",status="AL",type="api"} 1
status_smtp{cds="cdsinstance",name="api_muskir",status="AL",type="api"} 1
status_vcs{cds="cdsinstance",name="Global",status="AL",type="global"} 1
status_worker_model_errors{cds="cdsinstance",name="api_craandprajen",status="OK",type="api"} 0
status_worker_model_errors{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 0
```

I can remove the `status_` prefix if needed.
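In the refactored series, the metric names seem to come from the check names: lower-cased, with spaces, dashes and slashes turned into underscores (for example `Nb of Panics` becomes `status_nb_of_panics`). Below is a hedged Go sketch of such a rule. It is inferred from the output above, not taken from the PR, and `metricName` is a hypothetical helper. Passing an empty prefix yields the names without `status_`.

```go
package main

import (
	"fmt"
	"strings"
)

// metricName is a hypothetical helper that turns a check name into a
// Prometheus-compatible metric name: it lower-cases the name and replaces
// every character outside [a-z0-9_] with an underscore. A non-empty prefix
// is joined with "_". With an empty prefix, the caller must make sure the
// result does not start with a digit, which metric names do not allow.
func metricName(prefix, check string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(check) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	name := b.String()
	if prefix != "" {
		name = prefix + "_" + name
	}
	return name
}

func main() {
	for _, check := range []string{"Nb of Panics", "Object-Store", "Github-RateLimitRemaining"} {
		fmt.Println(metricName("status", check))
	}
}
```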

@fsamin
Member

fsamin commented Nov 19, 2018

About statuses: as far as I understand, the numbers of "AL" and "OK" statuses are the important measures. Why not have something like this, instead of a `status="AL"` label:

```
service_alarm{cds="cdsinstance",name="api_muskir",type="api"} 1
service_alarm{cds="cdsinstance",name="api_craandprajen",type="api"} 0
service_alarm{cds="cdsinstance",name="elasticsearch_foo",type="elasticsearch"} 1
service_alarm{cds="cdsinstance",name="vcs_blabla",type="vcs"} 0
```

That way, we can aggregate the number of alarms per type, per instance, or along any other label.
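A minimal Go sketch of the producer side of this proposal. It assumes that `service_alarm` holds the number of checks in the "AL" state for each service; the comment does not define the value precisely. `Check` and `serviceAlarms` are hypothetical names, and label values are assumed not to need escaping.

```go
package main

import (
	"fmt"
	"sort"
)

// Check is a hypothetical (service, type, status) triple for one check.
type Check struct {
	Service string // e.g. "api_muskir"
	Type    string // e.g. "api"
	Status  string // "OK", "WARN" or "AL"
}

// serviceAlarms counts, for every service, how many of its checks are in
// the "AL" state. Services without any alarm are still reported with 0,
// so the series does not vanish when everything is healthy.
func serviceAlarms(cds string, checks []Check) []string {
	type key struct{ name, typ string }
	counts := map[key]int{}
	for _, c := range checks {
		k := key{c.Service, c.Type}
		n := counts[k]
		if c.Status == "AL" {
			n++
		}
		counts[k] = n
	}
	lines := make([]string, 0, len(counts))
	for k, n := range counts {
		lines = append(lines, fmt.Sprintf(`service_alarm{cds="%s",name="%s",type="%s"} %d`, cds, k.name, k.typ, n))
	}
	sort.Strings(lines) // map iteration order is random; sort for stable output
	return lines
}

func main() {
	checks := []Check{
		{Service: "api_muskir", Type: "api", Status: "AL"},
		{Service: "api_muskir", Type: "api", Status: "OK"},
		{Service: "api_craandprajen", Type: "api", Status: "OK"},
		{Service: "vcs_blabla", Type: "vcs", Status: "WARN"},
	}
	for _, l := range serviceAlarms("cdsinstance", checks) {
		fmt.Println(l)
	}
}
```

The label set stays stable (`cds`, `name`, `type`) whatever the health of the service, so a consumer can sum or filter on any of them.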

Likewise, something like

```
status_nb_of_panics{cds="cdsinstance",name="api_muskir",status="OK",type="api"} 0
```

should not have a `status` label and should simply be named `nb_of_panics`.
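A minimal Go sketch of rendering such a measure as a plain gauge: labels are sorted by name and any `status` label is dropped, so only the measured value remains. `formatGauge` is a hypothetical helper, and label values are assumed not to need escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatGauge is a hypothetical helper that renders one sample in the
// Prometheus text format with labels sorted by name. Any "status" label is
// dropped: the health of a check is not part of the measured value.
func formatGauge(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		if k != "status" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = fmt.Sprintf(`%s="%s"`, k, labels[k])
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	labels := map[string]string{"cds": "cdsinstance", "name": "api_muskir", "status": "OK", "type": "api"}
	fmt.Println(formatGauge("nb_of_panics", labels, 0))
}
```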

Series like

```
status{cds="cdsinstance",name="Global",status="AL",type="global"} 1
status_api{cds="cdsinstance",name="Global",status="AL",type="global"} 2
...
```

and all the measures with `type="global"` look like aggregates and make no sense here, because aggregation must be done by the consumer of the metrics, not by the producer.
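To illustrate aggregation on the consumer side: in PromQL, per-type totals of the proposed metric would be `sum by (type) (service_alarm)`. The Go sketch below performs the same grouping in memory over already-parsed samples. `Sample` and `sumBy` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical, already-parsed sample of service_alarm.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// sumBy mimics PromQL's "sum by (label) (...)": it groups samples by the
// value of one label and adds their values. Samples without that label are
// grouped under the empty string.
func sumBy(label string, samples []Sample) map[string]float64 {
	out := map[string]float64{}
	for _, s := range samples {
		out[s.Labels[label]] += s.Value
	}
	return out
}

func main() {
	samples := []Sample{
		{Labels: map[string]string{"name": "api_muskir", "type": "api"}, Value: 1},
		{Labels: map[string]string{"name": "api_craandprajen", "type": "api"}, Value: 0},
		{Labels: map[string]string{"name": "elasticsearch_foo", "type": "elasticsearch"}, Value: 1},
		{Labels: map[string]string{"name": "vcs_blabla", "type": "vcs"}, Value: 0},
	}
	totals := sumBy("type", samples)
	types := make([]string, 0, len(totals))
	for t := range totals {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Printf("%s %g\n", t, totals[t])
	}
}
```

Because the producer only exposes per-service values, a "Global" figure becomes one possible query among many instead of a series that has to be maintained in the API.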

Review thread on engine/api/status.go (outdated)
@yesnault merged commit 9ca2976 into master on Dec 2, 2018
@yesnault deleted the ye-metrics-status branch on December 15, 2018 22:28
6 participants