All the supported validations are listed here. The validations are grouped by the scope they can be used with.
If you want some sane default validations, you can look at the default_validation.yaml. Those should be a good starting point for your own configuration and applicable to most of the use cases.
- Supported validations by scopes
- Groups
- Universal rule validators
- Labels
- PromQL expression validators
  - expressionIsValidPromQL
  - expressionDoesNotUseExperimentalFunctions
  - expressionDoesNotUseMetrics
  - expressionDoesNotUseLabels
  - expressionUsesOnlyAllowedLabelsForMetricRegexp
  - expressionDoesNotUseOlderDataThan
  - expressionDoesNotUseRangeShorterThan
  - expressionDoesNotUseIrate
  - validFunctionsOnCounters
  - rateBeforeAggregation
  - expressionUsesUnderscoresInLargeNumbers
  - expressionWithNoMetricName
  - expressionIsWellFormatted
  - expressionDoesNotUseClassicHistogramBucketOperations
- PromQL expression validators (using live Prometheus instance)
- LogQL expression validators
- Other
- Alert validators
- Recording rules validators
Usage of the following validations is limited to the `Group` scope.
Fails if the rule group has source tenants other than the configured ones.
If using Mimir, you may want to check that only known tenants are used, for example to avoid typos.
params:
allowedSourceTenants: [ "foo", "bar" ]
Fails if the rule group has the `interval` out of the configured range.
By default, groups that do not have the interval configured are ignored. You can require it to be set by setting `mustBeSet` to true.
Useful to avoid evaluation intervals that are too short or too long, such as `1s`, which would most certainly lead to missed evaluations. Enforcing the interval to be configured per group can force the user to think about how often they really need the rules to be evaluated.
params:
minimum: "0s"
maximum: <duration> # Optional, default is infinity
mustBeSet: false
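To illustrate the range check described above, here is a minimal Python sketch. It is not Promruval's implementation: the helper names `parse_duration` and `interval_allowed` are hypothetical, the duration parser covers only the common Prometheus units, and treating the bounds as inclusive is an assumption.

```python
import re
from typing import Optional

# Seconds per unit for Prometheus-style durations such as "30s" or "1h30m".
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text: str) -> float:
    """Convert a duration string to seconds; raise ValueError if it is malformed."""
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # unexpected characters between the parts
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def interval_allowed(
    interval: Optional[str],
    minimum: str = "0s",
    maximum: Optional[str] = None,  # None stands for "infinity"
    must_be_set: bool = False,
) -> bool:
    """Return True if the group interval passes the (assumed inclusive) range check."""
    if interval is None:
        # A missing interval is ignored unless mustBeSet is enabled.
        return not must_be_set
    value = parse_duration(interval)
    if value < parse_duration(minimum):
        return False
    if maximum is not None and value > parse_duration(maximum):
        return False
    return True
```

For example, `interval_allowed("1s", minimum="30s")` is rejected, while a group without an interval passes unless `must_be_set=True`.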
Fails if the rule group has an invalid value of the `partial_response_strategy` option, if set.
To enforce `partial_response_strategy` to be set, set `mustBeSet` to true.
params:
mustBeSet: false
Fails if the rule group has more rules than the specified limit.
Since the rules in one rule group are evaluated sequentially, it's good practice to split the rules into smaller groups. This way the evaluation will be parallelized and the evaluation time will be shorter.
params:
limit: 10
Fails if the rule group has the `limit` option set higher than the specified limit.
It also fails if the option is not set at all, since the default limit is 0, meaning unlimited.
It's good practice to limit the number of alerts in the group to avoid overloading the Alertmanager or event receivers, which can rate-limit. In the case of recording rules, it can help to avoid generating a huge number of time series.
params:
limit: 10
Fails if the group name does not match the specified regular expression.
params:
regexp: '[A-Z]\s+'
Fails if the rule group has the `query_offset` out of the configured range.
params:
minimum: <duration>
maximum: <duration> # Optional, default is infinity
Validators that can be used on `All rules`, `Recording rule` and `Alert` scopes.
Fails if the rule does not have all the specified labels. If `searchInExpr` is set, the labels are also looked for in the rule's `expr`.
Make sure every alert has all the labels required for it to be correctly routed by the Alertmanager.
params:
labels: [ "foo", "bar" ]
searchInExpr: true
Fails if the rule has any of the specified labels. If `searchInExpr` is set, the labels are also looked for in the rule's `expr`.
When deprecating old well-known labels, for example ones formerly used for routing, you can make sure no one uses them again by mistake.
params:
labels: [ "foo", "bar" ]
searchInExpr: true
Fails if rule does not have any of specified labels.
params:
labels: [ "foo", "bar" ]
Fails if the rule label does not match the specified regular expression.
If, for example, you use a `team` label containing the email of the specific team, you can use a regular expression to verify its form.
params:
label: "foo"
regexp: ".*"
Fails if the rule label value is not one of the allowed values. If `commaSeparatedValue` is set to true, the label value is split by a comma, and each distinct value is checked for validity.
Since labels can be templated and Promruval cannot tell whether the resulting value will be valid, the `ignoreTemplatedValues` option allows you to ignore templated values.
It's quite common to have well-known severities for alerts, which can be important even in the Alertmanager routing tree. This is how you can make sure only the well-known severities are used.
params:
label: "foo"
allowedValues: [ "foo", "bar" ]
commaSeparatedValue: true
ignoreTemplatedValues: false
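The interplay of `commaSeparatedValue` and `ignoreTemplatedValues` can be sketched in Python as follows. This is an illustration under stated assumptions, not Promruval's code: the function name `label_value_allowed` is hypothetical, a value is treated as templated when it contains `{{`, and the split parts are compared without trimming whitespace.

```python
from typing import Iterable


def label_value_allowed(
    value: str,
    allowed_values: Iterable[str],
    comma_separated: bool = False,
    ignore_templated: bool = False,
) -> bool:
    """Return True if the label value passes the allowed-values check."""
    # Assumption: a Go template is recognized by its opening delimiter.
    if ignore_templated and "{{" in value:
        return True
    # With comma_separated, every distinct part must be allowed on its own.
    parts = set(value.split(",")) if comma_separated else {value}
    return parts <= set(allowed_values)
```

With `allowed_values=["foo", "bar"]`, the value `"foo,bar"` passes only when `comma_separated=True`, and `"foo,baz"` fails either way.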
Fails if any label has empty value. It has no effect and is dropped by Prometheus.
Fails if the rule has the first label and also the second one. You can optionally also specify the values of those labels.
Example: an alert that has the label `severity` with the value `critical` cannot have the label `page` with the value `true`.
params:
firstLabel: "severity"
firstLabelValue: "critical" # Optional, if not set, only presence of the label excludes the second label
secondLabel: "page"
secondLabelValue: "true" # Optional, if set, fails only if also the second label value matches
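The way the optional values narrow the check can be sketched in Python. This is a hypothetical helper (`violates_exclusive_labels` is not a Promruval function) that follows the comments on the params above.

```python
from typing import Mapping, Optional


def violates_exclusive_labels(
    labels: Mapping[str, str],
    first_label: str,
    second_label: str,
    first_value: Optional[str] = None,
    second_value: Optional[str] = None,
) -> bool:
    """Return True if the rule labels break the exclusivity constraint."""
    if first_label not in labels:
        return False
    # If a first value is given, only that value triggers the exclusion.
    if first_value is not None and labels[first_label] != first_value:
        return False
    if second_label not in labels:
        return False
    # If a second value is given, only that value counts as a violation.
    if second_value is not None and labels[second_label] != second_value:
        return False
    return True
```

With the params shown above, `{"severity": "critical", "page": "true"}` is a violation, while `{"severity": "critical", "page": "false"}` is not, unless `secondLabelValue` is left unset.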
Fails if the expression is not a valid PromQL query.
Fails if the rule expression uses any of the experimental PromQL functions.
Fails if the rule expression uses metrics matching any of the metric name regexps. The regexps are fully anchored (they will be surrounded by `^...$`).
If you want to avoid using some metrics in the rules, you can use this validation to make sure it won't happen.
params:
metricNameRegexps: [ "foo_bar.*", "foo_baz" ]
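The effect of full anchoring is easy to miss, so here is a small Python sketch. It assumes the metric names have already been extracted from the expression, and `forbidden_metrics` is a hypothetical helper, not part of Promruval. Python's `re.fullmatch` is used to emulate the `^...$` wrapping.

```python
import re
from typing import Iterable, List


def forbidden_metrics(metric_names: Iterable[str], metric_name_regexps: Iterable[str]) -> List[str]:
    """Return the metric names that fully match at least one of the regexps."""
    patterns = [re.compile(regexp) for regexp in metric_name_regexps]
    return [
        name
        for name in metric_names
        # fullmatch requires the whole name to match, like ^...$ anchoring.
        if any(pattern.fullmatch(name) for pattern in patterns)
    ]
```

With the params above, `foo_bar_total` and `foo_baz` are reported, but `foo_baz_total` is not, because `foo_baz` has no trailing wildcard.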
Fails if the rule uses any of the specified labels in its `expr` label matchers, aggregations or joins.
If using Thanos, users have to know whether the rule is evaluated by Prometheus or Thanos, because Prometheus cannot use the external labels. This validation makes sure such labels are not used.
params:
labels: [ "foo", "bar" ]
Fails if the rule uses any labels besides those listed in `allowedLabels`, in combination with the given metric regexp, in its `expr` label matchers, aggregations or joins. If the metric name is omitted in the query, or matched using a regexp or any negative matcher on the `__name__` label, the rule will be skipped.
The check deliberately skips labels when it cannot be sure they target only the metric in question, such as aggregations by labels on top of a vector matching expression, where the labels might come from the other part of the expr.
If using kube-state-metrics to expose label information about K8S objects (kube_*_labels), only the labels whitelisted by the kube-state-metrics admin will be available. It might be useful to check that users do not use any others in their expressions.
params:
metricNameRegexp: "kube_pod_labels" # The regexp will be fully anchored (surrounded by ^...$)
allowedLabels: [ "pod", "cluster", "app", "team" ]
Fails if the rule `expr` uses older data than the specified limit in Prometheus duration syntax. Sub-queries and offsets are checked as well.
Useful to avoid writing queries which expect longer data retention than Prometheus actually has.
params:
limit: "12h"
Fails if the rule `expr` uses a shorter range than the specified limit in the Prometheus duration format.
Useful to avoid using a range shorter than twice the scrape interval.
params:
limit: "1m"
Fails if the rule `expr` uses the `irate` function, as discouraged in https://prometheus.io/docs/prometheus/latest/querying/functions/#irate.
It's not recommended to use the `irate` function in rules.
Fails if the expression uses a `rate` or `increase` function on a metric that does not end with the `_total` suffix.
It's a common mistake to use the `rate` or `increase` function on a metric that is not a counter. This validation can help to avoid it.
Fails if an aggregation function is used before the `rate` or `increase` functions.
Avoids the common mistake of using an aggregation function before the `rate` or `increase` function.
Fails if the query contains numbers higher than 1000 without using underscores as separators for better readability.
Ignores numbers in the `10e2` and duration formats.
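As a rough illustration of the rule above, the following Python sketch scans a query with regular expressions. It is only an approximation under stated assumptions: the real validator presumably works on the parsed PromQL, whereas this hypothetical `numbers_missing_underscores` helper strips quoted strings naively and does not handle escapes, hex literals or `@` timestamps.

```python
import re
from typing import List

# Quoted strings are blanked out so digits inside label values are ignored.
_STRING = re.compile(r'"[^"]*"|\'[^\']*\'')
# A standalone number: not glued to a word character or dot on either side,
# which skips metric names (foo1234), durations (5000s) and exponents (10e2).
_NUMBER = re.compile(r"(?<![\w.])(\d[\d_]*(?:\.\d+)?)(?![\w.])")


def numbers_missing_underscores(query: str) -> List[str]:
    """Return number literals above 1000 that are written without underscores."""
    stripped = _STRING.sub('""', query)
    found = []
    for match in _NUMBER.finditer(stripped):
        literal = match.group(1)
        if "_" in literal:
            continue  # already uses separators
        if float(literal) > 1000:
            found.append(literal)
    return found
```

For instance, `foo > 100000` is reported, while `foo > 100_000`, `foo > 10e5` and `increase(foo_total[5000s])` are not.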
Fails if an expression doesn't use an explicit metric name (also if used as the `__name__` label) in all its selectors (e.g. `up{foo="bar"}`).
Such queries may be very expensive and can lead to performance issues.
Fails if the expression is not well-formatted PromQL, as `promtool promql format` would produce.
Comments are removed from the expression before the validation, since the PromQL prettifier drops them, so this should avoid false positive diffs.
If you want to skip expressions with comments entirely, you can set `skipExpressionsWithComments` to true.
Useful to make sure the expressions are formatted in a consistent way.
params:
showExpectedForm: true # Optional, will show how the query should be formatted
skipExpressionsWithComments: true # Optional, will skip the expressions with comments
Fails if the expression does any binary operation between bucket metrics of a classic histogram.
There are situations when the classic histogram is not atomic (for example with remote write), so such operations may produce unexpected results. This calculation is often used to calculate SLOs as a difference between the `+Inf` bucket and one of the buckets which is the SLO threshold. To avoid this issue, it's recommended to calculate such differences before sending the data over remote write, for example.
All these validations require the `prometheus` section in the config to be set.
Queries a live Prometheus instance; requires the `prometheus` config to be set.
This validation runs the expression against the actual Prometheus instance and checks whether it ends with an error. Optionally, you can set the maximum allowed query execution time and the maximum number of resulting time series.
params:
timeSeriesLimit: 100 # Optional, maximum series returned by the query
evaluationDurationLimit: 1m # Optional, maximum duration of the query evaluation
Queries a live Prometheus instance; requires the `prometheus` config to be set.
Fails if any used label is not present in the configured Prometheus instance.
Queries a live Prometheus instance; requires the `prometheus` config to be set.
Verifies whether each of the selectors in the expression (e.g. `up{foo="bar"}`) matches actual data in the configured Prometheus instance.
params:
maximumMatchingSeries: 1000 # Optional, maximum number of matching series for single selector used in expression
Fails if the expression is not a valid LogQL query.
Fails if the LogQL expression does not use any range aggregation function, which is required if used in rules.
Fails if the rule uses a metric that matches the specified regular expression for any tenant, but does not have that tenant configured in the `source_tenants` option of the rule group the rule belongs to.
If you use Mimir and know that the metrics come from specific tenants, you can make sure the tenants are configured in the rule group `source_tenants` option.
params:
  defaultTenant: <tenant_name> # Optional, if set, the tenant that will be assumed if the group does not have the `source_tenants` option set
  sourceTenants:
    <tenant_name>:
      - regexp: <metric_name_regexp> # The regexp will be fully anchored (surrounded by ^...$)
        negativeRegexp: <metric_name_regexp> # Optional, metrics matching the regexp will be excluded from the check, will be fully anchored (surrounded by ^...$)
        description: <description> # Optional, human-readable description that will be shown in the validator output
    # Example:
    # k8s:
    #   - regexp: "kube_.*|container_.*"
    #     description: "Metrics from KSM"
    #   - regexp: "container_.*"
    #     description: "Metrics from cAdvisor"
    #   - regexp: "kafka_.*"
    #   - regexp: "node_.*"
    #     description: "Node exporter metrics provided by the k8s infrastructure team"
    # kafka:
    #   - regexp: "kafka_.*"
    #     negativeRegexp: "kafka_(consumer|producer)_.*"
    #     description: "Metrics from Kafka"
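The matching logic described above can be sketched in Python. This is an illustration only: the dictionary layout mirrors the `sourceTenants` param, the helper names `tenants_for_metric` and `missing_tenants` are hypothetical, and the metric names are assumed to be already extracted from the rule expression.

```python
import re
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical in-memory form of the `sourceTenants` param (based on the example above).
SOURCE_TENANTS: Dict[str, List[Dict[str, str]]] = {
    "k8s": [{"regexp": "kube_.*|container_.*"}, {"regexp": "node_.*"}],
    "kafka": [{"regexp": "kafka_.*", "negativeRegexp": "kafka_(consumer|producer)_.*"}],
}


def tenants_for_metric(metric: str, source_tenants: Dict[str, List[Dict[str, str]]]) -> Set[str]:
    """Return the tenants whose (fully anchored) regexps claim the metric."""
    tenants = set()
    for tenant, rules in source_tenants.items():
        for rule in rules:
            if not re.fullmatch(rule["regexp"], metric):
                continue
            negative = rule.get("negativeRegexp")
            if negative and re.fullmatch(negative, metric):
                continue  # explicitly excluded from the check
            tenants.add(tenant)
    return tenants


def missing_tenants(
    metric: str,
    group_source_tenants: Iterable[str],
    source_tenants: Dict[str, List[Dict[str, str]]],
    default_tenant: Optional[str] = None,
) -> Set[str]:
    """Return tenants the metric needs but the rule group does not list."""
    configured = set(group_source_tenants)
    if not configured and default_tenant is not None:
        # A group without source_tenants is assumed to use the default tenant.
        configured = {default_tenant}
    return tenants_for_metric(metric, source_tenants) - configured
```

A non-empty result from `missing_tenants` corresponds to a failed validation. Metrics that match no regexp, such as `up` here, are never reported.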
Validators that can be used on `Alert` scope.
Fails if the label contains invalid Go template.
Fails if rule does not have all the specified annotations.
Alertmanager templates often expect some specific annotations so that they can be rendered correctly. Make sure all alerts have those!
params:
annotations: [ "foo", "bar" ]
Fails if rule has any of specified annotations.
params:
annotations: [ "foo", "bar" ]
Fails if rule does not have any of specified annotations.
params:
annotations: [ "foo", "bar" ]
Fails if rule annotation value does not match the specified regular expression.
params:
annotation: "foo"
regexp: ".*"
Fails if rule annotation value is not one of the allowed values.
params:
annotation: "foo"
allowedValues: [ "foo", "bar" ]
commaSeparatedValue: true
Fails if the annotation value is not a valid URL. If `resolveUrl` is enabled, tries to make an HTTP request to the specified URL and fails if the request does not succeed or returns a 404 HTTP status code.
It's common practice to link a playbook with a guide on how to resolve the alert in the alert itself. This way you can verify that it's a working URL and, optionally, that it really exists.
params:
annotation: "playbook"
resolveUrl: true
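The two stages of this check (syntactic validity, then an optional lookup) can be sketched in Python. Several details here are assumptions rather than documented behavior: the helper name `annotation_url_ok` is hypothetical, a URL counts as valid only if it has an `http` or `https` scheme and a host, and the HTTP call is passed in as a `fetch_status` callable so the sketch stays free of network access.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def annotation_url_ok(value: str, fetch_status: Optional[Callable[[str], int]] = None) -> bool:
    """Return True if the annotation value passes the URL check.

    fetch_status, if given, plays the role of resolveUrl: it should perform
    the HTTP request and return the status code, raising OSError on failure.
    """
    parsed = urlparse(value)
    # Assumption: only absolute http(s) URLs are considered valid.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    if fetch_status is None:
        return True
    try:
        status = fetch_status(value)
    except OSError:
        return False  # the request did not succeed
    return status != 404
```

In real use, `fetch_status` could wrap an HTTP client of your choice. Injecting it keeps the syntactic part testable offline.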
Fails if the specified annotation, if present, does not contain valid PromQL.
params:
annotation: "foo"
Fails if the annotation contains invalid Go template.
Fails if the alert uses a longer `for` than the specified limit.
A `for` that is too long makes the alerts more fragile.
params:
limit: "1h"
Fails if the alert uses a longer `keep_firing_for` than the specified limit.
params:
limit: "1h"
Fails if the alert name does not match the specified regular expression.
params:
regexp: '[A-Z]\s+'
Validators that can be used on `Recording rule` scope.
Fails if the name of the recorded metric does not match the specified regular expression.
params:
regexp: "[^:]+:[^:]+:[^:]+"
Fails if the name of the recorded metric matches the specified regular expression.
params:
regexp: "^foo_bar$"