diff --git a/Dockerfile b/Dockerfile index eb0cb01..26faeff 100644 --- a/Dockerfile +++ b/Dockerfile @@ -3,7 +3,7 @@ ADD . /src WORKDIR /src RUN go get -t github.com/stretchr/testify/suite RUN go get -d -v -t -RUN go test --cover ./... --run UnitTest +RUN go test --cover ./... --run UnitTest -p 1 RUN CGO_ENABLED=0 GOOS=linux go build -v -o docker-flow-monitor diff --git a/docs/config.md b/docs/config.md index 3e083c4..fb37eb9 100644 --- a/docs/config.md +++ b/docs/config.md @@ -103,8 +103,13 @@ curl `[IP_OF_ONE_OF_SWARM_NODES]:8080/v1/docker-flow-monitor/reconfigure?scrapeP Please consult [Prometheus Configuration](https://prometheus.io/docs/operating/configuration/) for more information about the available options. -## Scrapes +## Scrape Secret Configuration Additional scrapes can be added through files prefixed with `scrape_`. By default, all such files located in `/run/secrets` are automatically added to the `scrape_configs` section of the configuration. The directory can be changed by setting the environment variable `CONFIGS_DIR` to a different value. The simplest way to add scrape configs is to use Docker [secrets](https://docs.docker.com/engine/swarm/secrets/) or [configs](https://docs.docker.com/engine/swarm/configs/). + + +## Scrape Label Configuration + +When using a version of [Docker Flow Swarm Listener](https://github.com/vfarcic/docker-flow-swarm-listener), DFSL, newer than `18.02.06-31`, you can configure DFSL to send node hostnames to `Docker Flow Monitor`, DFM. This can be done by setting `DF_INCLUDE_NODE_IP_INFO` to `true` in the DFSL environment. DFM will automatically display the node hostname as a label for each Prometheus target. The `DF_SCRAPE_TARGET_LABELS` environment variable allows additional labels to be displayed. For example, if a service has the environment variables `com.df.env=prod` and `com.df.domain=frontend`, you can set `DF_SCRAPE_TARGET_LABELS=env,domain` in DFM to display the `prod` and `frontend` labels in Prometheus.
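The label selection described above can be sketched as a small Go program. This is a simplified restatement of the filtering this PR adds in `server/server.go` (the `scrapeLabels` helper name is made up for illustration): only label names listed in `DF_SCRAPE_TARGET_LABELS` are copied from the reconfigure request's parameters into the scrape's labels.

```go
package main

import (
	"fmt"
	"strings"
)

// scrapeLabels copies only the label names listed in targetLabels
// (comma-separated, as in DF_SCRAPE_TARGET_LABELS) from the request
// parameters into the scrape's label map. Unlisted parameters are dropped.
func scrapeLabels(targetLabels string, params map[string]string) map[string]string {
	labels := map[string]string{}
	if len(targetLabels) == 0 {
		return labels
	}
	for _, name := range strings.Split(targetLabels, ",") {
		if value, ok := params[name]; ok && len(value) > 0 {
			labels[name] = value
		}
	}
	return labels
}

func main() {
	// Parameters as they would arrive for a service with
	// com.df.env=prod and com.df.domain=frontend.
	params := map[string]string{"env": "prod", "domain": "frontend", "extra": "ignored"}
	fmt.Println(scrapeLabels("env,domain", params))
}
```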
diff --git a/docs/img/flexiable-labeling-targets-page.png b/docs/img/flexiable-labeling-targets-page.png new file mode 100644 index 0000000..742f464 Binary files /dev/null and b/docs/img/flexiable-labeling-targets-page.png differ diff --git a/docs/tutorial-flexible-labeling.md b/docs/tutorial-flexible-labeling.md new file mode 100644 index 0000000..ef1a035 --- /dev/null +++ b/docs/tutorial-flexible-labeling.md @@ -0,0 +1,110 @@ +# Flexible Labeling with Docker Flow Monitor + +*Docker Flow Monitor* and *Docker Flow Swarm Listener* can be configured to allow for more flexible labeling of exporters. Please read the [Running Docker Flow Monitor](tutorial.md) tutorial before reading this one. This tutorial focuses on configuring the stacks to allow for flexible labeling. + +## Setting Up A Cluster + +!!! info + Feel free to skip this section if you already have a Swarm cluster that can be used for this tutorial + +We'll create a Swarm cluster consisting of three nodes created with Docker Machine. + +```bash +git clone https://github.com/vfarcic/docker-flow-monitor.git + +cd docker-flow-monitor + +./scripts/dm-swarm.sh + +eval $(docker-machine env swarm-1) +``` + +## Deploying Docker Flow Monitor + +We will deploy the [stacks/docker-flow-monitor-flexible-labels.yml](https://github.com/vfarcic/docker-flow-monitor/blob/master/stacks/docker-flow-monitor-flexible-labels.yml) stack that contains three services: `monitor`, `alert-manager` and `swarm-listener`. The `swarm-listener` service includes an additional environment variable: `DF_INCLUDE_NODE_IP_INFO=true`. This configures `swarm-listener` to send node and IP information to `monitor`. + +The `monitor` service includes the environment variable: `DF_SCRAPE_TARGET_LABELS=env,metricType`. This sets up flexible labeling for exporters. If an exporter defines a deploy label `com.df.env` or `com.df.metricType`, that label will be used by `monitor`.
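Behind the scenes, the `CreateFileStaticConfig` function this PR adds in `prometheus/config.go` writes one file-based service-discovery JSON file per service under `/etc/prometheus/file_sd`, attaching the configured labels plus `node` and `service`. A file for `cadvisor` would look roughly like this (the IP, port, and stack-prefixed service name are illustrative, not taken from the tutorial):

```json
[
  {
    "targets": ["10.0.0.5:8080"],
    "labels": {
      "env": "prod",
      "metricType": "system",
      "node": "swarm-1",
      "service": "exporter_cadvisor"
    }
  }
]
```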
+ +Let's deploy the `monitor` stack: + +```bash +docker network create -d overlay monitor + +docker stack deploy \ + -c stacks/docker-flow-monitor-flexible-labels.yml \ + monitor +``` + +## Collecting Metrics and Defining Alerts + +We will deploy the exporters stack defined in [stacks/exporters-tutorial-flexible-labels.yml](https://github.com/vfarcic/docker-flow-monitor/blob/master/stacks/exporters-tutorial-flexible-labels.yml), containing two services: `cadvisor` and `node-exporter`. + +The definition of the `cadvisor` service contains additional deploy labels: + +```yaml + cadvisor: + image: google/cadvisor + networks: + - monitor + ... + deploy: + mode: global + labels: + ... + - com.df.scrapeNetwork=monitor + - com.df.env=prod + - com.df.metricType=system +``` + +The `com.df.scrapeNetwork` deploy label tells `swarm-listener` to use `cadvisor`'s IP on the `monitor` network. This is important because the `monitor` service is using the `monitor` network to scrape `cadvisor`. The `com.df.env=prod` and `com.df.metricType=system` deploy labels configure flexible labeling for `cadvisor`. + +The second service, `node-exporter`, is also configured with flexible labels: + +```yaml + node-exporter: + image: basi/node-exporter + networks: + - monitor + ... + deploy: + mode: global + labels: + ... + - com.df.scrapeNetwork=monitor + - com.df.env=dev + - com.df.metricType=system +``` + +Let's deploy the `exporter` stack: + +```bash +docker stack deploy \ + -c stacks/exporters-tutorial-flexible-labels.yml \ + exporter +``` + +Please wait until the services in the stack are up and running. You can check their status by executing `docker stack ps exporter`. + +Now we can open the *Prometheus* targets page from a browser. + +> If you're a Windows user, Git Bash might not be able to use the `open` command. If that's the case, replace the `open` command with `echo`. As a result, you'll get the full address that should be opened directly in your browser of choice.
+ +```bash +open "http://$(docker-machine ip swarm-1):9090/targets" +``` + +You should see a targets page similar to the following: + +![Flexible Labeling Targets Page](img/flexiable-labeling-targets-page.png) + +Each service is labeled with its associated `com.df.env` or `com.df.metricType` deploy label. In addition, the `node` label is the hostname the service is running on. + +## What Now? + +*Docker Flow Monitor*'s flexible labeling feature provides more information about your services. Please consult the documentation for any additional information you might need. Feel free to open [an issue](https://github.com/vfarcic/docker-flow-monitor/issues) if you require additional info, if you find a bug, or if you have a feature request. + +Before you go, please remove the cluster we created and free those resources for something else. + +```bash +docker-machine rm -f swarm-1 swarm-2 swarm-3 +``` diff --git a/docs/usage.md b/docs/usage.md index f9d8e25..708baad 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -64,6 +64,12 @@ Please visit [Alerting Overview](https://prometheus.io/docs/alerting/overview/) !!! note I hope that the number of shortcuts will grow with time thanks to community contributions. Please create [an issue](https://github.com/vfarcic/docker-flow-monitor/issues) with the `alertIf` statement and the suggested shortcut and I'll add it to the code as soon as possible. +### AlertIf Logical Operators + +The logical operators `and`, `unless`, and `or` can be used in combination with AlertIf Parameter Shortcuts. For example, to create an alert that triggers when response time is low unless response time is high, set `alertIf=@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99`. This alert prevents `@resp_time_below` from triggering while `@resp_time_above` is triggering.
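Internally, a compound expression is parsed by repeatedly splitting at the first `_and_`, `_unless_`, or `_or_` delimiter. The splitting logic this PR adds in `server/server.go` can be restated as a standalone program:

```go
package main

import (
	"fmt"
	"strings"
)

// splitCompoundOp splits a string into three pieces at the first
// _and_, _unless_, or _or_ delimiter: the head, the operator, and the
// unparsed remainder. Strings without a delimiter are returned unchanged.
func splitCompoundOp(s string) (string, string, string) {
	binaryOps := []string{"unless", "and", "or"}
	minIdx := len(s)
	minOp := ""
	for _, bOp := range binaryOps {
		idx := strings.Index(s, fmt.Sprintf("_%s_", bOp))
		if idx != -1 && idx < minIdx {
			minIdx = idx
			minOp = bOp
		}
	}
	if len(minOp) > 0 {
		// Skip past the operator and its two surrounding underscores.
		return s[:minIdx], minOp, s[minIdx+len(minOp)+2:]
	}
	return s, "", ""
}

func main() {
	head, op, rest := splitCompoundOp("@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99")
	fmt.Println(head)
	fmt.Println(op)
	fmt.Println(rest)
}
```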
The `summary` annotation for this alert will be merged with the `unless` operator: "Response time of the service my-service is below 0.025 unless Response time of the service my-service is above 0.1". When using logical operators, there are no default alert labels. The alert labels will have to be manually set by using the `alertLabels` query parameter. + + More information on the logical operators can be found in the Prometheus querying [documentation](https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators). + ## Remove !!! tip diff --git a/mkdocs.yml b/mkdocs.yml index c3aeb53..a0fccc8 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -4,6 +4,7 @@ pages: - Tutorial: - Running Docker Flow Monitor: tutorial.md - Auto-Scaling Services Using Instrumented Metrics: auto-scaling.md + - Flexible Labeling with Docker Flow Monitor: tutorial-flexible-labeling.md - Configuration: config.md - Usage: usage.md - Migration Guide: migration.md diff --git a/prometheus/config.go b/prometheus/config.go index 20eedf1..6f9d077 100644 --- a/prometheus/config.go +++ b/prometheus/config.go @@ -2,6 +2,7 @@ package prometheus import ( "bytes" + "encoding/json" "fmt" "net/url" "os" @@ -18,14 +19,17 @@ import ( // WriteConfig creates Prometheus configuration at configPath and writes alerts into /etc/prometheus/alert.rules func WriteConfig(configPath string, scrapes map[string]Scrape, alerts map[string]Alert) { c := &Config{} + fileSDDir := "/etc/prometheus/file_sd" + alertRulesPath := "/etc/prometheus/alert.rules" configDir := filepath.Dir(configPath) FS.MkdirAll(configDir, 0755) + FS.MkdirAll(fileSDDir, 0755) c.InsertScrapes(scrapes) if len(alerts) > 0 { logPrintf("Writing to alert.rules") - afero.WriteFile(FS, "/etc/prometheus/alert.rules", []byte(GetAlertConfig(alerts)), 0644) + afero.WriteFile(FS, alertRulesPath, []byte(GetAlertConfig(alerts)), 0644) c.RuleFiles = []string{"alert.rules"} } @@ -35,6 +39,7 @@ func WriteConfig(configPath string, scrapes
map[string]Scrape, alerts map[string logPrintf("Unable to insert alertmanager url %s into prometheus config", alertmanagerURL) } } + c.CreateFileStaticConfig(scrapes, fileSDDir) for _, e := range os.Environ() { envSplit := strings.SplitN(e, "=", 2) @@ -98,6 +103,9 @@ func (c *Config) InsertScrapes(scrapes map[string]Scrape) { if len(metricsPath) == 0 { metricsPath = "/metrics" } + if s.NodeInfo != nil && len(*s.NodeInfo) > 0 { + continue + } if s.ScrapeType == "static_configs" { newScrape = &ScrapeConfig{ ServiceDiscoveryConfig: ServiceDiscoveryConfig{ @@ -152,6 +160,63 @@ func (c *Config) InsertScrapesFromDir(dir string) { } +// CreateFileStaticConfig creates static config files +func (c *Config) CreateFileStaticConfig(scrapes map[string]Scrape, fileSDDir string) { + + staticFiles := map[string]struct{}{} + for _, s := range scrapes { + fsc := FileStaticConfig{} + if s.NodeInfo == nil { + continue + } + for n := range *s.NodeInfo { + tg := TargetGroup{} + tg.Targets = []string{fmt.Sprintf("%s:%d", n.Addr, s.ScrapePort)} + tg.Labels = map[string]string{} + if s.ScrapeLabels != nil { + for k, v := range *s.ScrapeLabels { + tg.Labels[k] = v + } + } + tg.Labels["node"] = n.Name + tg.Labels["service"] = s.ServiceName + fsc = append(fsc, &tg) + } + + if len(fsc) == 0 { + continue + } + + fscBytes, err := json.Marshal(fsc) + if err != nil { + continue + } + filePath := fmt.Sprintf("%s/%s.json", fileSDDir, s.ServiceName) + afero.WriteFile(FS, filePath, fscBytes, 0644) + newScrape := &ScrapeConfig{ + ServiceDiscoveryConfig: ServiceDiscoveryConfig{ + FileSDConfigs: []*SDConfig{{ + Files: []string{filePath}, + }}, + }, + JobName: s.ServiceName, + } + c.ScrapeConfigs = append(c.ScrapeConfigs, newScrape) + staticFiles[filePath] = struct{}{} + } + + // Remove scrapes that are not in fileStaticServices + currentStaticFiles, err := afero.Glob(FS, fmt.Sprintf("%s/*.json", fileSDDir)) + if err != nil { + return + } + for _, file := range currentStaticFiles { + if _, ok := 
staticFiles[file]; !ok { + FS.Remove(file) + } + } +} + func normalizeScrapeFile(content []byte) []byte { spaceCnt := 0 for i, c := range content { diff --git a/prometheus/config_test.go b/prometheus/config_test.go index 8aa5eb3..cb84da7 100644 --- a/prometheus/config_test.go +++ b/prometheus/config_test.go @@ -1,6 +1,7 @@ package prometheus import ( + "encoding/json" "fmt" "os" "testing" @@ -552,6 +553,152 @@ func (s *ConfigTestSuite) Test_Writeconfig_WritesConfig() { s.Contains(actualConfig.ScrapeConfigs, c.ScrapeConfigs[1]) } +func (s *ConfigTestSuite) Test_Writeconfig_WithNodeInfo_WritesConfig() { + fsOrg := FS + defer func() { + FS = fsOrg + }() + FS = afero.NewMemMapFs() + + nodeInfo1 := NodeIPSet{} + nodeInfo1.Add("node-1", "1.0.1.1") + nodeInfo1.Add("node-2", "1.0.1.2") + serviceLabels1 := map[string]string{ + "env": "prod", + "domain": "frontend", + } + + nodeInfo2 := NodeIPSet{} + nodeInfo2.Add("node-1", "1.0.2.1") + nodeInfo2.Add("node-1", "1.0.2.2") + serviceLabels2 := map[string]string{ + "env": "dev", + "domain": "backend", + } + + scrapes := map[string]Scrape{ + "service-1": { + ServiceName: "service-1", + ScrapePort: 1234, + ScrapeLabels: &serviceLabels1, + NodeInfo: &nodeInfo1, + }, + "service-2": { + ServiceName: "service-2", + ScrapePort: 5678, + ScrapeLabels: &serviceLabels2, + NodeInfo: &nodeInfo2, + }, + "service-3": { + ServiceName: "service-3", + ScrapePort: 5432, + }, + } + alerts := map[string]Alert{} + + WriteConfig("/etc/prometheus/prometheus.yml", scrapes, alerts) + actual, err := afero.ReadFile(FS, "/etc/prometheus/prometheus.yml") + s.Require().NoError(err) + + actualConfig := Config{} + err = yaml.Unmarshal(actual, &actualConfig) + s.Require().NoError(err) + + s.Require().Len(actualConfig.ScrapeConfigs, 3) + + var service1ScrapeConfig *ScrapeConfig + var service2ScrapeConfig *ScrapeConfig + var service3ScrapeConfig *ScrapeConfig + + for _, sc := range actualConfig.ScrapeConfigs { + if sc.JobName == "service-1" { + 
service1ScrapeConfig = sc + } else if sc.JobName == "service-2" { + service2ScrapeConfig = sc + } else if sc.JobName == "service-3" { + service3ScrapeConfig = sc + } + } + s.Require().NotNil(service1ScrapeConfig) + s.Require().NotNil(service2ScrapeConfig) + s.Require().NotNil(service3ScrapeConfig) + + s.Require().Len(service1ScrapeConfig.ServiceDiscoveryConfig.FileSDConfigs, 1) + s.Require().Len(service2ScrapeConfig.ServiceDiscoveryConfig.FileSDConfigs, 1) + s.Require().Len(service3ScrapeConfig.ServiceDiscoveryConfig.DNSSDConfigs, 1) + + service1FileScrape := service1ScrapeConfig.ServiceDiscoveryConfig.FileSDConfigs[0] + service2FileScrape := service2ScrapeConfig.ServiceDiscoveryConfig.FileSDConfigs[0] + service3DNSScrape := service3ScrapeConfig.ServiceDiscoveryConfig.DNSSDConfigs[0] + + s.Equal("/etc/prometheus/file_sd/service-1.json", service1FileScrape.Files[0]) + s.Equal("/etc/prometheus/file_sd/service-2.json", service2FileScrape.Files[0]) + + s.Require().Len(service3DNSScrape.Names, 1) + s.Equal("tasks.service-3", service3DNSScrape.Names[0]) + s.Equal(5432, service3DNSScrape.Port) + s.Equal("A", service3DNSScrape.Type) + + actualSDService1Bytes, err := afero.ReadFile(FS, "/etc/prometheus/file_sd/service-1.json") + s.Require().NoError(err) + fsc1 := FileStaticConfig{} + err = json.Unmarshal(actualSDService1Bytes, &fsc1) + s.Require().NoError(err) + + actualSDService2Bytes, err := afero.ReadFile(FS, "/etc/prometheus/file_sd/service-2.json") + s.Require().NoError(err) + fsc2 := FileStaticConfig{} + err = json.Unmarshal(actualSDService2Bytes, &fsc2) + s.Require().NoError(err) + + var tgService1Node1 *TargetGroup + var tgService1Node2 *TargetGroup + var tgService2Node1 *TargetGroup + var tgService2Node2 *TargetGroup + + for _, tg := range fsc1 { + for _, target := range tg.Targets { + if target == "1.0.1.1:1234" { + tgService1Node1 = tg + break + } else if target == "1.0.1.2:1234" { + tgService1Node2 = tg + break + } + } + } + for _, tg := range fsc2 { + for _, 
target := range tg.Targets { + if target == "1.0.2.1:5678" { + tgService2Node1 = tg + break + } else if target == "1.0.2.2:5678" { + tgService2Node2 = tg + } + } + } + s.Require().NotNil(tgService1Node1) + s.Require().NotNil(tgService1Node2) + s.Require().NotNil(tgService2Node1) + s.Require().NotNil(tgService2Node2) + + s.Equal("prod", tgService1Node1.Labels["env"]) + s.Equal("frontend", tgService1Node1.Labels["domain"]) + s.Equal("service-1", tgService1Node1.Labels["service"]) + + s.Equal("prod", tgService1Node2.Labels["env"]) + s.Equal("frontend", tgService1Node2.Labels["domain"]) + s.Equal("service-1", tgService1Node2.Labels["service"]) + + s.Equal("dev", tgService2Node1.Labels["env"]) + s.Equal("backend", tgService2Node1.Labels["domain"]) + s.Equal("service-2", tgService2Node1.Labels["service"]) + + s.Equal("dev", tgService2Node2.Labels["env"]) + s.Equal("backend", tgService2Node2.Labels["domain"]) + s.Equal("service-2", tgService2Node2.Labels["service"]) +} + func (s *ConfigTestSuite) Test_WriteConfig_WriteAlerts() { fsOrig := FS defer func() { FS = fsOrig }() diff --git a/prometheus/types.go b/prometheus/types.go index 26a928a..1b39476 100644 --- a/prometheus/types.go +++ b/prometheus/types.go @@ -1,5 +1,7 @@ package prometheus +import "encoding/json" + // ScrapeConfig configures a scraping unit for Prometheus. type ScrapeConfig struct { // The job name to which the job label is set by default. @@ -78,12 +80,12 @@ type RemoteWriteConfig struct { type TargetGroup struct { // Targets is a list of targets identified by a label set. Each target is // uniquely identifiable in the group by its address label. - Targets []string `yaml:"targets,omitempty"` + Targets []string `yaml:"targets,omitempty" json:"targets,omitempty"` // Labels is a set of labels that is common across all targets in the group. 
- Labels map[string]string `yaml:"labels,omitempty"` + Labels map[string]string `yaml:"labels,omitempty" json:"labels,omitempty"` // Source is an identifier that describes a group of targets. - Source string `yaml:"source,omitempty"` + Source string `yaml:"source,omitempty" json:"source,omitempty"` } // DNSSDConfig is the configuration for DNS based service discovery. @@ -94,12 +96,23 @@ type DNSSDConfig struct { Port int `yaml:"port"` // Ignored for SRV records } +// SDConfig is the configuration for file based discovery. +type SDConfig struct { + Files []string `yaml:"files"` + RefreshInterval string `yaml:"refresh_interval,omitempty"` +} + +// FileStaticConfig configures File-based service discovery +type FileStaticConfig []*TargetGroup + // ServiceDiscoveryConfig configures lists of different service discovery mechanisms. type ServiceDiscoveryConfig struct { // List of labeled target groups for this job. StaticConfigs []*TargetGroup `yaml:"static_configs,omitempty"` // List of DNS service discovery configurations. DNSSDConfigs []*DNSSDConfig `yaml:"dns_sd_configs,omitempty"` + // List of file service discovery configurations. + FileSDConfigs []*SDConfig `yaml:"file_sd_configs,omitempty"` } // BasicAuth contains basic HTTP authentication credentials. 
@@ -214,12 +227,74 @@ type Alert struct { Replicas int `json:"replicas"` } +// NodeIP defines a node/addr pair +type NodeIP struct { + Name string `json:"name"` + Addr string `json:"addr"` +} + +// NodeIPSet is a set of NodeIPs +type NodeIPSet map[NodeIP]struct{} + +// Add node to set +func (ns *NodeIPSet) Add(name, addr string) { + (*ns)[NodeIP{Name: name, Addr: addr}] = struct{}{} +} + +// Equal returns true when NodeIPSets contain the same elements +func (ns NodeIPSet) Equal(other NodeIPSet) bool { + + if ns.Cardinality() != other.Cardinality() { + return false + } + + for ip := range ns { + if _, ok := other[ip]; !ok { + return false + } + } + return true +} + +// Cardinality returns the size of set +func (ns NodeIPSet) Cardinality() int { + return len(ns) +} + +// MarshalJSON creates JSON array from NodeIPSet +func (ns NodeIPSet) MarshalJSON() ([]byte, error) { + items := make([][2]string, 0, ns.Cardinality()) + + for elem := range ns { + items = append(items, [2]string{elem.Name, elem.Addr}) + } + return json.Marshal(items) +} + +// UnmarshalJSON recreates NodeIPSet from a JSON array +func (ns *NodeIPSet) UnmarshalJSON(b []byte) error { + + items := [][2]string{} + err := json.Unmarshal(b, &items) + if err != nil { + return err + } + + for _, item := range items { + (*ns)[NodeIP{Name: item[0], Addr: item[1]}] = struct{}{} + } + + return nil +} + // Scrape defines data used to create scraping configuration snippet type Scrape struct { - MetricsPath string `json:"metricsPath,string,omitempty"` - ScrapeInterval string `json:"scrapeInterval,string,omitempty"` - ScrapeTimeout string `json:"scrapeTimeout,string,omitempty"` - ScrapePort int `json:"scrapePort,string,omitempty"` - ServiceName string `json:"serviceName"` - ScrapeType string `json:"scrapeType"` + MetricsPath string `json:"metricsPath,string,omitempty"` + ScrapeInterval string `json:"scrapeInterval,string,omitempty"` + ScrapeLabels *map[string]string `json:"scrapeLabels,omitempty"` + ScrapePort int 
`json:"scrapePort,string,omitempty"` + ScrapeTimeout string `json:"scrapeTimeout,string,omitempty"` + ScrapeType string `json:"scrapeType"` + ServiceName string `json:"serviceName"` + NodeInfo *NodeIPSet `json:"nodeInfo,omitempty"` } diff --git a/server/server.go b/server/server.go index 4cd39e7..5bbbdb2 100644 --- a/server/server.go +++ b/server/server.go @@ -325,40 +325,136 @@ var alertIfShortcutData = map[string]alertIfShortcut{ func (s *serve) formatAlert(alert *prometheus.Alert) { alert.AlertNameFormatted = s.getNameFormatted(fmt.Sprintf("%s_%s", alert.ServiceName, alert.AlertName)) - if strings.HasPrefix(alert.AlertIf, "@") { + if !strings.HasPrefix(alert.AlertIf, "@") { + return + } + + _, bOp, _ := splitCompoundOp(alert.AlertIf) + if len(bOp) > 0 { + formatCompoundAlert(alert) + } else { + formatSingleAlert(alert) + } + +} + +func formatSingleAlert(alert *prometheus.Alert) { + + value := "" + alertSplit := strings.Split(alert.AlertIf, ":") + shortcut := alertSplit[0] + + if len(alertSplit) > 1 { + value = alertSplit[1] + } + + data, ok := alertIfShortcutData[shortcut] + if !ok { + return + } + + alert.AlertIf = replaceTags(data.expanded, alert, value) + + if alert.AlertAnnotations == nil { + alert.AlertAnnotations = map[string]string{} + } + for k, v := range data.annotations { + if _, ok := alert.AlertAnnotations[k]; !ok { + alert.AlertAnnotations[k] = replaceTags(v, alert, value) + } + } + + if alert.AlertLabels == nil { + alert.AlertLabels = map[string]string{} + } + for k, v := range data.labels { + if _, ok := alert.AlertLabels[k]; !ok { + alert.AlertLabels[k] = replaceTags(v, alert, value) + } + } +} + +func formatCompoundAlert(alert *prometheus.Alert) { + alertIfStr := alert.AlertIf + alertAnnotations := map[string]string{} + immutableAnnotations := map[string]struct{}{} + + // copy alert annotations and alert labels + if alert.AlertAnnotations != nil { + for k := range alert.AlertAnnotations { + immutableAnnotations[k] = struct{}{} + } + } + + var 
alertIfFormattedBuffer bytes.Buffer + + currentAlert, bOp, alertIfStr := splitCompoundOp(alertIfStr) + + for len(currentAlert) > 0 { value := "" - alertSplit := strings.Split(alert.AlertIf, ":") + alertSplit := strings.Split(currentAlert, ":") shortcut := alertSplit[0] if len(alertSplit) > 1 { value = alertSplit[1] } - data, ok := alertIfShortcutData[shortcut] if !ok { return } - alert.AlertIf = replaceTags(data.expanded, alert, value) - - if alert.AlertAnnotations == nil { - alert.AlertAnnotations = map[string]string{} + alertIfFormattedBuffer.WriteString(replaceTags(data.expanded, alert, value)) + if len(bOp) > 0 { + alertIfFormattedBuffer.WriteString(fmt.Sprintf(" %s ", bOp)) } + for k, v := range data.annotations { - if _, ok := alert.AlertAnnotations[k]; !ok { - alert.AlertAnnotations[k] = replaceTags(v, alert, value) + if _, ok := immutableAnnotations[k]; ok { + continue + } + alertAnnotations[k] += replaceTags(v, alert, value) + if len(bOp) > 0 { + alertAnnotations[k] += fmt.Sprintf(" %s ", bOp) } } + currentAlert, bOp, alertIfStr = splitCompoundOp(alertIfStr) + } + + alert.AlertIf = alertIfFormattedBuffer.String() - if alert.AlertLabels == nil { - alert.AlertLabels = map[string]string{} + if alert.AlertAnnotations == nil { + alert.AlertAnnotations = map[string]string{} + } + + for k, v := range alertAnnotations { + if _, ok := immutableAnnotations[k]; ok { + continue } - for k, v := range data.labels { - if _, ok := alert.AlertLabels[k]; !ok { - alert.AlertLabels[k] = replaceTags(v, alert, value) - } + alert.AlertAnnotations[k] = v + } + +} + +// splitCompoundOp splits a string into three pieces if it includes _unless_, +// _and_, or _or_.
For example, hello_and_world_or_earth will return [hello, and, world_or_earth] +func splitCompoundOp(s string) (string, string, string) { + binaryOps := []string{"unless", "and", "or"} + + minIdx := len(s) + minOp := "" + for _, bOp := range binaryOps { + idx := strings.Index(s, fmt.Sprintf("_%s_", bOp)) + if idx != -1 && idx < minIdx { + minIdx = idx + minOp = bOp } } + + if len(minOp) > 0 { + return s[:minIdx], minOp, s[minIdx+len(minOp)+2:] + } + return s, "", "" + } func replaceTags(tag string, alert *prometheus.Alert, value string) string { @@ -397,10 +493,32 @@ func (s *serve) getNameFormatted(name string) string { func (s *serve) getScrape(req *http.Request) prometheus.Scrape { scrape := prometheus.Scrape{} decoder.Decode(&scrape, req.Form) - if s.isValidScrape(&scrape) { - s.scrapes[scrape.ServiceName] = scrape - logPrintf("Adding scrape %s\n%v", scrape.ServiceName, scrape) + if !s.isValidScrape(&scrape) { + return scrape } + + if nodeInfoStr := req.Form.Get("nodeInfo"); len(nodeInfoStr) > 0 { + nodeInfo := prometheus.NodeIPSet{} + json.Unmarshal([]byte(nodeInfoStr), &nodeInfo) + scrape.NodeInfo = &nodeInfo + } + + if scrape.NodeInfo != nil && len(*scrape.NodeInfo) > 0 { + scrape.ScrapeLabels = &map[string]string{} + if targetLabels := os.Getenv("DF_SCRAPE_TARGET_LABELS"); len(targetLabels) > 0 { + labels := strings.Split(targetLabels, ",") + for _, label := range labels { + value := req.Form.Get(label) + if len(value) > 0 { + (*scrape.ScrapeLabels)[label] = value + } + } + } + } + + s.scrapes[scrape.ServiceName] = scrape + logPrintf("Adding scrape %s\n%v", scrape.ServiceName, scrape) + return scrape } diff --git a/server/server_test.go b/server/server_test.go index 4613b77..1ada921 100644 --- a/server/server_test.go +++ b/server/server_test.go @@ -7,12 +7,14 @@ import ( "net/http/httptest" "net/url" "os" + "strings" "testing" "time" "../prometheus" "github.com/spf13/afero" "github.com/stretchr/testify/suite" + yaml "gopkg.in/yaml.v2" ) type ServerTestSuite 
struct { @@ -309,6 +311,119 @@ func (s *ServerTestSuite) Test_ReconfigureHandler_ExpandsShortcuts() { } } +func (s *ServerTestSuite) Test_ReconfigureHandler_ExpandsShortcuts_CompoundOps() { + testData := []struct { + expected string + shortcut string + annotations map[string]string + labels map[string]string + }{ + { + `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.025"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.75 unless sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.99`, + `@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99`, + map[string]string{"summary": "Response time of the service my-service is below 0.025 unless Response time of the service my-service is above 0.1"}, + map[string]string{}, + }, + { + `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.025"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.75 unless sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.99`, + `@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99`, + map[string]string{"summary": "Response time of the service my-service is below 0.025 unless Response time of the service my-service is above 0.1"}, + map[string]string{"receiver": "system", "service": "my-service", "type": "service"}, + }, + { + `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.99 and container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8`, + `@resp_time_above:0.1,5m,0.99_and_@service_mem_limit:0.8`, + map[string]string{"summary": "Response time of the service my-service is above 0.1 and Memory of the service 
my-service is over 0.8"}, + map[string]string{"receiver": "system", "service": "my-service"}, + }, + { + `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.99 or container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8`, + `@resp_time_above:0.1,5m,0.99_or_@service_mem_limit:0.8`, + map[string]string{"summary": "Response time of the service my-service is above 0.1 or Memory of the service my-service is over 0.8"}, + map[string]string{"receiver": "system"}, + }, + { + `container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8 and sum(rate(http_server_resp_time_bucket{job="my-service", le="0.025"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.75 unless sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.99`, + `@service_mem_limit:0.8_and_@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99`, + map[string]string{"summary": "Memory of the service my-service is over 0.8 and Response time of the service my-service is below 0.025 unless Response time of the service my-service is above 0.1"}, + map[string]string{"receiver": "system"}, + }, + } + + for _, data := range testData { + expected := prometheus.Alert{ + AlertAnnotations: data.annotations, + AlertFor: "my-for", + AlertIf: data.expected, + AlertLabels: data.labels, + AlertName: "my-alert", + AlertNameFormatted: "myservice_myalert", + ServiceName: "my-service", + Replicas: 3, + } + rwMock := ResponseWriterMock{} + alertQueries := []string{} + for k, v := range data.labels { + alertQueries = append(alertQueries, fmt.Sprintf("%s=%s", k, v)) + } + 
alertQueryStr := strings.Join(alertQueries, ",") + addr := fmt.Sprintf( + "/v1/docker-flow-monitor?serviceName=%s&alertName=%s&alertIf=%s&alertFor=%s&replicas=3", + expected.ServiceName, + expected.AlertName, + data.shortcut, + expected.AlertFor, + ) + if len(alertQueries) > 0 { + addr += fmt.Sprintf("&alertLabels=%s", alertQueryStr) + } + req, _ := http.NewRequest("GET", addr, nil) + + serve := New() + serve.ReconfigureHandler(rwMock, req) + + s.Equal(expected, serve.alerts[expected.AlertNameFormatted]) + } +} + +func (s *ServerTestSuite) Test_ReconfigureHandler_DoesNotExpandAnnotations_WhenTheyAreAlreadySet_CompoundOps() { + testData := struct { + expected string + shortcut string + annotations map[string]string + labels map[string]string + }{ + `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.025"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.75 unless sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.99`, + `@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99`, + map[string]string{"summary": "not-again"}, + map[string]string{"receiver": "system", "service": "ugly-service"}, + } + expected := prometheus.Alert{ + AlertAnnotations: testData.annotations, + AlertFor: "my-for", + AlertIf: testData.expected, + AlertLabels: testData.labels, + AlertName: "my-alert", + AlertNameFormatted: "myservice_myalert", + ServiceName: "my-service", + Replicas: 3, + } + rwMock := ResponseWriterMock{} + addr := fmt.Sprintf( + "/v1/docker-flow-monitor?serviceName=%s&alertName=%s&alertIf=%s&alertFor=%s&replicas=3&alertAnnotations=summary=not-again&alertLabels=service=ugly-service,receiver=system", + expected.ServiceName, + expected.AlertName, + testData.shortcut, + expected.AlertFor, + ) + req, _ := http.NewRequest("GET", addr, nil) + + serve := New() + serve.ReconfigureHandler(rwMock, req) + + s.Equal(expected, 
serve.alerts[expected.AlertNameFormatted]) +} + func (s *ServerTestSuite) Test_ReconfigureHandler_DoesNotExpandAnnotationsAndLabels_WhenTheyAreAlreadySet() { testData := struct { expected string @@ -464,6 +579,94 @@ func (s *ServerTestSuite) Test_ReconfigureHandler_AddsScrapeType() { s.Equal(expected, serve.scrapes[expected.ServiceName]) } +func (s *ServerTestSuite) Test_ReconfigureHandler_WithNodeInfo() { + defer func() { + os.Unsetenv("DF_SCRAPE_TARGET_LABELS") + }() + os.Setenv("DF_SCRAPE_TARGET_LABELS", "env,domain") + nodeInfo := prometheus.NodeIPSet{} + nodeInfo.Add("node-1", "1.0.1.1") + nodeInfo.Add("node-2", "1.0.1.2") + expected := prometheus.Scrape{ + ServiceName: "my-service", + ScrapePort: 1234, + ScrapeLabels: &map[string]string{ + "env": "prod", + "domain": "frontend", + }, + NodeInfo: &nodeInfo, + } + + nodeInfoBytes, err := json.Marshal(nodeInfo) + s.Require().NoError(err) + + rwMock := ResponseWriterMock{} + addr, err := url.Parse("/v1/docker-flow-monitor") + s.Require().NoError(err) + + q := addr.Query() + q.Add("serviceName", expected.ServiceName) + q.Add("scrapePort", fmt.Sprintf("%d", expected.ScrapePort)) + q.Add("env", (*expected.ScrapeLabels)["env"]) + q.Add("domain", (*expected.ScrapeLabels)["domain"]) + q.Add("extra", "system") + q.Add("nodeInfo", string(nodeInfoBytes)) + addr.RawQuery = q.Encode() + + req, _ := http.NewRequest("GET", addr.String(), nil) + serve := New() + serve.ReconfigureHandler(rwMock, req) + + targetScrape := serve.scrapes[expected.ServiceName] + s.Equal(expected.ServiceName, targetScrape.ServiceName) + s.Equal(expected.ScrapePort, targetScrape.ScrapePort) + s.Equal(expected.ScrapeLabels, targetScrape.ScrapeLabels) + s.Equal("", (*targetScrape.ScrapeLabels)["extra"]) + + s.Require().NotNil(targetScrape.NodeInfo) + s.True(expected.NodeInfo.Equal(*targetScrape.NodeInfo)) + +} + +func (s *ServerTestSuite) Test_ReconfigureHandler_WithNodeInfo_NoTargetLabelsDefined() { + nodeInfo := prometheus.NodeIPSet{} + 
nodeInfo.Add("node-1", "1.0.1.1") + nodeInfo.Add("node-2", "1.0.1.2") + expected := prometheus.Scrape{ + ServiceName: "my-service", + ScrapePort: 1234, + ScrapeLabels: &map[string]string{}, + NodeInfo: &nodeInfo, + } + + nodeInfoBytes, err := json.Marshal(nodeInfo) + s.Require().NoError(err) + + rwMock := ResponseWriterMock{} + addr, err := url.Parse("/v1/docker-flow-monitor") + s.Require().NoError(err) + + q := addr.Query() + q.Add("serviceName", expected.ServiceName) + q.Add("scrapePort", fmt.Sprintf("%d", expected.ScrapePort)) + q.Add("env", "dev") + q.Add("domain", "frontend") + q.Add("nodeInfo", string(nodeInfoBytes)) + addr.RawQuery = q.Encode() + + req, _ := http.NewRequest("GET", addr.String(), nil) + serve := New() + serve.ReconfigureHandler(rwMock, req) + + targetScrape := serve.scrapes[expected.ServiceName] + s.Equal(expected.ServiceName, targetScrape.ServiceName) + s.Equal(expected.ScrapePort, targetScrape.ScrapePort) + s.Equal(expected.ScrapeLabels, targetScrape.ScrapeLabels) + + s.Require().NotNil(targetScrape.NodeInfo) + s.True(expected.NodeInfo.Equal(*targetScrape.NodeInfo)) +} + func (s *ServerTestSuite) Test_ReconfigureHandler_DoesNotAddAlert_WhenAlertNameIsEmpty() { rwMock := ResponseWriterMock{} req, _ := http.NewRequest("GET", "/v1/docker-flow-monitor", nil) @@ -588,6 +791,107 @@ scrape_configs: s.Equal(expected, string(actual)) } +func (s *ServerTestSuite) Test_ReconfigureHandler_WithNodeInfo_CallsWriteConfig() { + fsOrig := prometheus.FS + defer func() { + os.Unsetenv("DF_SCRAPE_TARGET_LABELS") + prometheus.FS = fsOrig + }() + prometheus.FS = afero.NewMemMapFs() + os.Setenv("DF_SCRAPE_TARGET_LABELS", "env,domain") + expectedConfig := `global: + scrape_interval: 5s +alerting: + alertmanagers: + - static_configs: + - targets: + - alert-manager:9093 + scheme: http +rule_files: +- alert.rules +scrape_configs: +- job_name: my-service + file_sd_configs: + - files: + - /etc/prometheus/file_sd/my-service.json +` + nodeInfo := prometheus.NodeIPSet{} + 
nodeInfo.Add("node-1", "1.0.1.1") + nodeInfo.Add("node-2", "1.0.1.2") + expected := prometheus.Scrape{ + ServiceName: "my-service", + ScrapePort: 1234, + ScrapeLabels: &map[string]string{ + "env": "prod", + "domain": "frontend", + }, + NodeInfo: &nodeInfo, + } + + nodeInfoBytes, err := json.Marshal(nodeInfo) + s.Require().NoError(err) + + rwMock := ResponseWriterMock{} + addr, err := url.Parse("/v1/docker-flow-monitor") + s.Require().NoError(err) + + q := addr.Query() + q.Add("serviceName", expected.ServiceName) + q.Add("scrapePort", fmt.Sprintf("%d", expected.ScrapePort)) + q.Add("env", (*expected.ScrapeLabels)["env"]) + q.Add("domain", (*expected.ScrapeLabels)["domain"]) + q.Add("nodeInfo", string(nodeInfoBytes)) + q.Add("alertName", "my-alert") + q.Add("alertIf", "my-if") + q.Add("alertFor", "my-for") + addr.RawQuery = q.Encode() + + req, _ := http.NewRequest("GET", addr.String(), nil) + + serve := New() + serve.ReconfigureHandler(rwMock, req) + + actualPrometheusConfig, err := afero.ReadFile(prometheus.FS, "/etc/prometheus/prometheus.yml") + s.Require().NoError(err) + s.Equal(expectedConfig, string(actualPrometheusConfig)) + + fileSDConfigByte, err := afero.ReadFile(prometheus.FS, "/etc/prometheus/file_sd/my-service.json") + s.Require().NoError(err) + + fileSDconfig := prometheus.FileStaticConfig{} + err = json.Unmarshal(fileSDConfigByte, &fileSDconfig) + s.Require().NoError(err) + + s.Require().Len(fileSDconfig, 2) + + var targetGroup1 *prometheus.TargetGroup + var targetGroup2 *prometheus.TargetGroup + for _, tg := range fileSDconfig { + if tg == nil { + continue + } + for _, target := range tg.Targets { + if target == "1.0.1.1:1234" { + targetGroup1 = tg + break + } else if target == "1.0.1.2:1234" { + targetGroup2 = tg + break + } + } + } + + s.Require().NotNil(targetGroup1) + s.Require().NotNil(targetGroup2) + + s.Equal((*expected.ScrapeLabels)["env"], targetGroup1.Labels["env"]) + s.Equal((*expected.ScrapeLabels)["domain"], targetGroup1.Labels["domain"]) 
+ s.Equal("node-1", targetGroup1.Labels["node"]) + + s.Equal((*expected.ScrapeLabels)["env"], targetGroup2.Labels["env"]) + s.Equal("node-2", targetGroup2.Labels["node"]) +} + func (s *ServerTestSuite) Test_ReconfigureHandler_SendsReloadRequestToPrometheus() { reloadOrig := prometheus.Reload defer func() { prometheus.Reload = reloadOrig }() @@ -808,6 +1112,157 @@ alerting: s.Equal(expectedAfterDelete, string(actual)) } +func (s *ServerTestSuite) Test_RemoveHandler_WithNodeInfo_CallsWriteConfig() { + fsOrig := prometheus.FS + defer func() { + prometheus.FS = fsOrig + }() + prometheus.FS = afero.NewMemMapFs() + nodeInfo1 := prometheus.NodeIPSet{} + nodeInfo1.Add("node-1", "1.0.1.1") + nodeInfo1.Add("node-2", "1.0.1.2") + expected1 := prometheus.Scrape{ + ServiceName: "my-service1", + ScrapePort: 1234, + ScrapeLabels: &map[string]string{}, + NodeInfo: &nodeInfo1, + } + + nodeInfo2 := prometheus.NodeIPSet{} + nodeInfo2.Add("node-1", "1.0.2.1") + expected2 := prometheus.Scrape{ + ServiceName: "my-service2", + ScrapePort: 2341, + ScrapeLabels: &map[string]string{}, + NodeInfo: &nodeInfo2, + } + + nodeInfoBytes1, err := json.Marshal(nodeInfo1) + s.Require().NoError(err) + nodeInfoBytes2, err := json.Marshal(nodeInfo2) + s.Require().NoError(err) + + rwMock := ResponseWriterMock{} + addr, err := url.Parse("/v1/docker-flow-monitor") + s.Require().NoError(err) + + q1 := addr.Query() + q1.Add("serviceName", expected1.ServiceName) + q1.Add("scrapePort", fmt.Sprintf("%d", expected1.ScrapePort)) + q1.Add("nodeInfo", string(nodeInfoBytes1)) + + q2 := addr.Query() + q2.Add("serviceName", expected2.ServiceName) + q2.Add("scrapePort", fmt.Sprintf("%d", expected2.ScrapePort)) + q2.Add("nodeInfo", string(nodeInfoBytes2)) + + serve := New() + + addr.RawQuery = q1.Encode() + req1, _ := http.NewRequest("GET", addr.String(), nil) + serve.ReconfigureHandler(rwMock, req1) + + addr.RawQuery = q2.Encode() + req2, _ := http.NewRequest("GET", addr.String(), nil) + 
serve.ReconfigureHandler(rwMock, req2) + + actualPrometheusConfigBytes, err := afero.ReadFile(prometheus.FS, "/etc/prometheus/prometheus.yml") + s.Require().NoError(err) + + actualPrometheusConfig := prometheus.Config{} + err = yaml.Unmarshal(actualPrometheusConfigBytes, &actualPrometheusConfig) + s.Require().NoError(err) + s.Len(actualPrometheusConfig.ScrapeConfigs, 2) + + var sdConfig1 *prometheus.SDConfig + var sdConfig2 *prometheus.SDConfig + + for _, sc := range actualPrometheusConfig.ScrapeConfigs { + if sc.JobName == "my-service1" { + s.Require().Len(sc.ServiceDiscoveryConfig.FileSDConfigs, 1) + sdConfig1 = sc.ServiceDiscoveryConfig.FileSDConfigs[0] + } + if sc.JobName == "my-service2" { + s.Require().Len(sc.ServiceDiscoveryConfig.FileSDConfigs, 1) + sdConfig2 = sc.ServiceDiscoveryConfig.FileSDConfigs[0] + } + } + s.Require().NotNil(sdConfig1) + s.Require().NotNil(sdConfig2) + s.Len(sdConfig1.Files, 1) + s.Len(sdConfig2.Files, 1) + s.Equal("/etc/prometheus/file_sd/my-service1.json", sdConfig1.Files[0]) + s.Equal("/etc/prometheus/file_sd/my-service2.json", sdConfig2.Files[0]) + + // my-service1 has two target groups + fileSDConfigService1Byte, err := afero.ReadFile(prometheus.FS, "/etc/prometheus/file_sd/my-service1.json") + s.Require().NoError(err) + fileSDconfig1 := prometheus.FileStaticConfig{} + err = json.Unmarshal(fileSDConfigService1Byte, &fileSDconfig1) + s.Require().NoError(err) + s.Require().Len(fileSDconfig1, 2) + + // my-service2 has one target group + fileSDConfigService2Byte, err := afero.ReadFile(prometheus.FS, "/etc/prometheus/file_sd/my-service2.json") + s.Require().NoError(err) + fileSDconfig2 := prometheus.FileStaticConfig{} + err = json.Unmarshal(fileSDConfigService2Byte, &fileSDconfig2) + s.Require().NoError(err) + s.Require().Len(fileSDconfig2, 1) + + // Delete my-service1 + addrDelete1 := "/v1/docker-flow-monitor?serviceName=my-service1" + reqDelete1, _ := http.NewRequest("DELETE", addrDelete1, nil) + + serve.RemoveHandler(rwMock, reqDelete1) + + 
actualConfigBytes, _ := afero.ReadFile(prometheus.FS, "/etc/prometheus/prometheus.yml") + // Config still contains a scrape config since my-service2 is still being scraped + actualPrometheusConfigAfter := prometheus.Config{} + err = yaml.Unmarshal(actualConfigBytes, &actualPrometheusConfigAfter) + s.Require().NoError(err) + s.Len(actualPrometheusConfigAfter.ScrapeConfigs, 1) + + // my-service1 is gone + myService1Exists, err := afero.Exists(prometheus.FS, "/etc/prometheus/file_sd/my-service1.json") + s.Require().NoError(err) + s.False(myService1Exists) + + fileSDConfigService2Byte, err = afero.ReadFile(prometheus.FS, "/etc/prometheus/file_sd/my-service2.json") + s.Require().NoError(err) + fileSDconfig2After := prometheus.FileStaticConfig{} + err = json.Unmarshal(fileSDConfigService2Byte, &fileSDconfig2After) + s.Require().NoError(err) + + // my-service2 is still running + s.Require().Len(fileSDconfig2After, 1) + + // Delete my-service2 + addrDelete2 := "/v1/docker-flow-monitor?serviceName=my-service2" + reqDelete2, _ := http.NewRequest("DELETE", addrDelete2, nil) + + serve.RemoveHandler(rwMock, reqDelete2) + + expectedConfigDelete := `global: + scrape_interval: 5s +alerting: + alertmanagers: + - static_configs: + - targets: + - alert-manager:9093 + scheme: http +` + + actualConfig, _ := afero.ReadFile(prometheus.FS, "/etc/prometheus/prometheus.yml") + // No scrape configs remain since the last scraped service was removed + s.Equal(expectedConfigDelete, string(actualConfig)) + + // my-service2 is gone + myService2Exists, err := afero.Exists(prometheus.FS, "/etc/prometheus/file_sd/my-service2.json") + s.Require().NoError(err) + s.False(myService2Exists) +} + func (s *ServerTestSuite) Test_RemoveHandler_SendsReloadRequestToPrometheus() { called := false reloadOrig := prometheus.Reload diff --git a/stacks/docker-flow-monitor-flexible-labels.yml b/stacks/docker-flow-monitor-flexible-labels.yml new file mode 100644 index 0000000..fe8bcbd --- /dev/null +++ 
b/stacks/docker-flow-monitor-flexible-labels.yml @@ -0,0 +1,38 @@ +version: "3" + +services: + + monitor: + image: vfarcic/docker-flow-monitor:${TAG:-latest} + environment: + - LISTENER_ADDRESS=swarm-listener + - GLOBAL_SCRAPE_INTERVAL=10s + - ARG_ALERTMANAGER_URL=http://alert-manager:9093 + - DF_SCRAPE_TARGET_LABELS=env,metricType + networks: + - monitor + ports: + - 9090:9090 + + alert-manager: + image: vfarcic/alert-manager:slack + networks: + - monitor + + swarm-listener: + image: vfarcic/docker-flow-swarm-listener + networks: + - monitor + volumes: + - /var/run/docker.sock:/var/run/docker.sock + environment: + - DF_NOTIFY_CREATE_SERVICE_URL=http://monitor:8080/v1/docker-flow-monitor/reconfigure + - DF_NOTIFY_REMOVE_SERVICE_URL=http://monitor:8080/v1/docker-flow-monitor/remove + - DF_INCLUDE_NODE_IP_INFO=true + deploy: + placement: + constraints: [node.role == manager] + +networks: + monitor: + external: true diff --git a/stacks/exporters-tutorial-flexible-labels.yml b/stacks/exporters-tutorial-flexible-labels.yml new file mode 100644 index 0000000..2b91c6b --- /dev/null +++ b/stacks/exporters-tutorial-flexible-labels.yml @@ -0,0 +1,50 @@ +version: "3" + +services: + + cadvisor: + image: google/cadvisor + networks: + - monitor + volumes: + - /:/rootfs + - /var/run:/var/run + - /sys:/sys + - /var/lib/docker:/var/lib/docker + deploy: + mode: global + labels: + - com.df.notify=true + - com.df.scrapePort=8080 + - com.df.scrapeNetwork=monitor + - com.df.env=prod + - com.df.metricType=system + + node-exporter: + image: basi/node-exporter + networks: + - monitor + environment: + - HOST_HOSTNAME=/etc/host_hostname + volumes: + - /proc:/host/proc + - /sys:/host/sys + - /:/rootfs + - /etc/hostname:/etc/host_hostname + deploy: + mode: global + labels: + - com.df.notify=true + - com.df.scrapePort=9100 + - com.df.alertName.1=mem_load + - com.df.alertIf.1=(sum by (instance) (node_memory_MemTotal) - sum by (instance) (node_memory_MemFree + node_memory_Buffers + 
node_memory_Cached)) / sum by (instance) (node_memory_MemTotal) > 0.8 + - com.df.alertName.2=diskload + - com.df.alertIf.2=@node_fs_limit:0.8 + - com.df.scrapeNetwork=monitor + - com.df.env=dev + - com.df.metricType=system + command: '--path.procfs="/host/proc" --path.sysfs="/host/sys" --collector.filesystem.ignored-mount-points="^/(sys|proc|dev|host|etc)($$|/)" --collector.textfile.directory="/etc/node-exporter/" --collector.conntrack --collector.diskstats --collector.entropy --collector.filefd --collector.filesystem --collector.loadavg --collector.mdadm --collector.meminfo --collector.netdev --collector.netstat --collector.stat --collector.textfile --collector.time --collector.vmstat --collector.ipvs' + +networks: + monitor: + external: true