feat: add time slice slo support #112

Open: wants to merge 1 commit into main


@keidarcy keidarcy commented Dec 28, 2024

what

Add support for Time Slice SLOs, including error budget and burn rate alerts.

why

Time Slice SLOs are a convenient alternative to Monitor-based SLOs. You can create an uptime SLO without going through a monitor, so you don’t have to create and maintain both a monitor and an SLO.
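
For readers new to this SLO type, the idea can be sketched in a few lines of Python: the window is cut into fixed-length slices, each slice is marked good or bad by comparing the metric value against a threshold, and uptime is the share of good slices. This is only an illustration of the concept, not Datadog's implementation. The function name and the sample latencies are made up, and missing data is ignored.

```python
import operator

# Comparators a time slice condition might use.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def uptime_percent(slice_values, comparator, threshold):
    """Return the share of slices (in percent) whose value meets the condition."""
    if not slice_values:
        raise ValueError("need at least one slice")
    is_good = COMPARATORS[comparator]
    good = sum(1 for value in slice_values if is_good(value, threshold))
    return 100.0 * good / len(slice_values)


# Hypothetical p95 latencies in seconds, one value per 5-minute slice.
latencies = [0.4, 0.7, 1.0, 1.3, 0.9, 2.1, 0.5, 0.6]

# 6 of the 8 slices satisfy "<= 1", so this prints 75.0.
print(uptime_percent(latencies, "<=", 1))
```

No monitor is involved in this calculation, which is the point of the SLO type: the condition lives in the SLO itself.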

references

https://docs.datadoghq.com/service_management/service_level_objectives/time_slice/
#111

@keidarcy keidarcy requested review from a team as code owners December 28, 2024 13:56
@keidarcy keidarcy requested review from hans-d and gberenice December 28, 2024 13:56
@mergify mergify bot added the triage Needs triage label Dec 28, 2024
@keidarcy
Author

Terraform plan (run from examples/slo on branch feat/support-time-slice-slo)

$ terraform plan -var 'slo_paths=["catalog/time_slice_slo.yaml"]'

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.datadog_slo.datadog_monitor.time_slice_slo_burn_rate_alert["(SLO) Test API p95 latency Checks"] will be created
  + resource "datadog_monitor" "time_slice_slo_burn_rate_alert" {
      + evaluation_delay    = (known after apply)
      + id                  = (known after apply)
      + include_tags        = true
      + message             = "Burn rate is high enough to deplete error budget in one day"
      + name                = "(SLO Burn Rate Alert) (SLO) Test API p95 latency Checks"
      + new_host_delay      = 300
      + notify_no_data      = false
      + priority            = "2"
      + query               = (known after apply)
      + require_full_window = true
      + tags                = [
          + "env:production",
          + "service:my-service",
        ]
      + type                = "slo alert"

      + monitor_thresholds {
          + critical = "3"
        }
    }

  # module.datadog_slo.datadog_monitor.time_slice_slo_error_budget_alert["(SLO) Test API p95 latency Checks"] will be created
  + resource "datadog_monitor" "time_slice_slo_error_budget_alert" {
      + evaluation_delay    = (known after apply)
      + id                  = (known after apply)
      + include_tags        = true
      + message             = "Alert on 80% of error budget consumed"
      + name                = "(SLO Error Budget Alert) (SLO) Test API p95 latency Checks"
      + new_host_delay      = 300
      + notify_no_data      = false
      + priority            = "2"
      + query               = (known after apply)
      + require_full_window = true
      + tags                = [
          + "env:production",
          + "service:my-service",
        ]
      + type                = "slo alert"

      + monitor_thresholds {
          + critical = "80"
        }
    }

  # module.datadog_slo.datadog_service_level_objective.time_slice_slo["(SLO) Test API p95 latency Checks"] will be created
  + resource "datadog_service_level_objective" "time_slice_slo" {
      + description       = "Test API p95 latency should be less than 1 second."
      + force_delete      = true
      + id                = (known after apply)
      + name              = "(SLO) Test API p95 latency Checks"
      + tags              = [
          + "env:production",
          + "service:my-service",
        ]
      + target_threshold  = (known after apply)
      + timeframe         = (known after apply)
      + type              = "time_slice"
      + warning_threshold = (known after apply)

      + sli_specification {
          + time_slice {
              + comparator             = "<="
              + query_interval_seconds = 300
              + threshold              = 1

              + query {
                  + formula {
                      + formula_expression = "query1 + query2"
                    }
                  + query {
                      + metric_query {
                          + data_source = "metrics"
                          + name        = "query1"
                          + query       = "p95:trace.express.request{env:production,resource_name:get_/api/test,service:my-service}"
                        }
                    }
                  + query {
                      + metric_query {
                          + data_source = "metrics"
                          + name        = "query2"
                          + query       = "p95:trace.express.request{env:production,resource_name:get_/api/test,service:my-service}"
                        }
                    }
                }
            }
        }

      + thresholds {
          + target          = 99
          + target_display  = (known after apply)
          + timeframe       = "30d"
          + warning         = 99.5
          + warning_display = (known after apply)
        }
    }

Plan: 3 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + datadog_metric_slo_alerts    = {}
  + datadog_metric_slos          = {}
  + datadog_monitor_slo_monitors = {}
  + datadog_monitor_slos         = {}

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.
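
To make the numbers in the plan easier to sanity-check, here is a small Python sketch of the arithmetic behind them, using the 99% target, the 30d timeframe, the 300-second slice interval, and the alert thresholds of 80 and 3 from the plan. The helper names are hypothetical. It also assumes the usual definition of burn rate, where a rate of 1 consumes the whole budget exactly over the SLO window.

```python
def error_budget_minutes(target_percent, window_days):
    """Minutes of allowed 'bad' time for a target over a rolling window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - target_percent) / 100


def days_to_exhaust(window_days, burn_rate):
    """Days until the budget is gone if the burn rate stays constant.

    Assumes a burn rate of 1 uses the full budget over exactly one window.
    """
    if burn_rate <= 0:
        raise ValueError("burn rate must be positive")
    return window_days / burn_rate


budget = error_budget_minutes(99, 30)

print(budget)                  # total budget in minutes: 432.0
print(budget * 80 / 100)       # minutes consumed when the 80% alert fires
print(budget * 60 / 300)       # the same budget counted in 300-second slices
print(days_to_exhaust(30, 3))  # days to exhaust the budget at burn rate 3
```

Under that definition, a constant burn rate of 3 over a 30-day window exhausts the budget in about 10 days, while depleting it in one day would correspond to a burn rate of 30. It may be worth checking the burn rate alert's message against its threshold of 3.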
