I see dead people
Cole is a dead man switch listener. In Prometheus it is common to create a dead man switch: an alert that fires constantly to test your entire alerting pipeline. A question that comes up often is what you have watching those dead man switch alerts. Who watches the watchers, effectively.
This is a basic implementation of something that watches for those dead man switch alerts and sends an alert itself if it does not receive a notification from the dead man switch within the assigned time interval.
This project is in its very early stages and should not be used in production yet. It is a work in progress (WIP): it works, but some planned features still need to be added, and things like configuration are still evolving.
Cole listens for HTTP requests from Prometheus Alertmanager carrying the dead man switch alert. When a message is received, a timer is started for the specified duration. If another message is not received from the dead man switch alert within that duration, Cole fires off an alert of its own.
There is a forthcoming blog post on jpweber.io on how to leverage a dead man switch alert in your Prometheus monitoring and where something like Cole fits in. It will provide more detail on the thinking behind a tool like this.
- Slack
- PagerDuty
- MsTeams
- Generic Webhook
- Start the cole server by any of the means defined below (bare binary, Docker, etc.).
- For each DeadManSwitch that you want to check in, you must generate an ID for that alert. Perform an HTTP GET request to /id of the cole server, for example: curl http://yourcoleaddress/id. This returns the JSON payload below. The timerid will be part of the URL you hit to check in.
{ "timerid":"bg8obqel0s1fdr02gtvg" }
- Create a receiver in your Alertmanager config that calls a webhook when it receives a DeadManSwitch alert. The wait, group, and repeat intervals may need to be changed based on your needs.
global:
  ...
route:
  ...
  routes:
  - match:
      alertname: DeadMansSwitch
    receiver: 'cole'
    group_wait: 0s
    group_interval: 1m
    repeat_interval: 50s
receivers:
- name: 'cole'
  webhook_configs:
  - url: 'http://192.168.2.66:8080/ping/bg8obqel0s1fdr02gtvg'
    send_resolved: false
# Example Cole configuration file
# Slack
# SenderType = "slack"
# Interval = 10
# HTTPEndpoint = "https://hooks.slack.com/services/..."
# HTTPMethod = "POST"
# SlackChannel = "#general"
# SlackUsername = "Cole - DeadManSwitch Monitor"
# SlackIcon = ":monkey_face:"
# PagerDuty
SenderType = "pagerduty"
Interval = 10
PDAPIKey = "noiD8-khbpNpgAAAAAAAAAA"
PDIntegrationKey = "5353fb993888441811111111111"
# Ms Teams (commented out: enable only one sender block at a time)
# SenderType = "teams"
# Interval = 10
# HTTPEndpoint = "https://hooks.teams.com/services/..."
SENDER_TYPE
INTERVAL
HTTP_ENDPOINT
HTTP_METHOD
EMAIL_ADDR
PD_KEY
SLACK_CHANNEL
SLACK_USERNAME
SLACK_ICON
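As an illustration of how these variables could be consumed, here is a small Go sketch that reads them into a struct. It is not Cole's actual loader: the `Config` type and `loadConfig` function are hypothetical, and the validation rules (SENDER_TYPE required, INTERVAL a positive integer) are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config holds one field per environment variable listed above.
// The struct and its field names are hypothetical.
type Config struct {
	SenderType    string
	Interval      int
	HTTPEndpoint  string
	HTTPMethod    string
	EmailAddr     string
	PDKey         string
	SlackChannel  string
	SlackUsername string
	SlackIcon     string
}

// loadConfig reads settings through getenv, which makes it easy to
// substitute a map-backed lookup in tests.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		SenderType:    getenv("SENDER_TYPE"),
		HTTPEndpoint:  getenv("HTTP_ENDPOINT"),
		HTTPMethod:    getenv("HTTP_METHOD"),
		EmailAddr:     getenv("EMAIL_ADDR"),
		PDKey:         getenv("PD_KEY"),
		SlackChannel:  getenv("SLACK_CHANNEL"),
		SlackUsername: getenv("SLACK_USERNAME"),
		SlackIcon:     getenv("SLACK_ICON"),
	}
	// Assumption: a sender type must always be provided.
	if cfg.SenderType == "" {
		return Config{}, errors.New("SENDER_TYPE is required")
	}
	// Assumption: INTERVAL must be a positive integer.
	interval, err := strconv.Atoi(getenv("INTERVAL"))
	if err != nil {
		return Config{}, fmt.Errorf("INTERVAL must be an integer: %w", err)
	}
	if interval <= 0 {
		return Config{}, errors.New("INTERVAL must be greater than zero")
	}
	cfg.Interval = interval
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("sender=%s interval=%d\n", cfg.SenderType, cfg.Interval)
}
```

Passing the lookup function as a parameter keeps the parsing logic independent of the real process environment.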
docker run -d \
-e SENDER_TYPE="slack" \
-e INTERVAL="10" \
-e HTTP_ENDPOINT="https://hooks.slack.com/services/..." \
-p 8080:8080 \
cole:0.2.0
./cole
- POST /ping/<timerid>
- GET /id
- GET /version
- Clone the repo
- dep ensure -v
- go build

That is it.