fix(metricprovider): support Datadog v2 API Fixes #2813 (#2997)

* Add note in CONTRIBUTING.md that I would have found useful. Signed-off-by: mitchell amihod <[email protected]> * Fix gen-openapi to be more portable - make sure it includes the GOPATH in the call. Signed-off-by: mitchell amihod <[email protected]> * Update docs. * Expand working with v2 information * Contacted Datadog to get latest info re: v1 deprecation, api limits. * Add tips about rate limits, using helm for templates * Add more example templates Signed-off-by: mitchell amihod <[email protected]> * Update Datadog Analysis Type * Make Query omitempty since now possible it won't exist * Add some descriptions * Add new properties we need for v2 Queries: We can pass in key:query for queries Formula: Makes formulas using the keys from queries * Defaults! Use annotations to declare defaults for some fields. This lets us remove some guard rails from the code itself Interval: 5m - Move this from code to here ApiVersion: v1 - Move this from code to here * Enums! Much like defaults, having enums lets us make assumptions about the incoming metric so we dont need as many guardrails. ApiVersion: Enum to restrict to v1 or v2 Signed-off-by: mitchell amihod <[email protected]> * Output of make codegen Everything looks ok. Signed-off-by: mitchell amihod <[email protected]> * Pass in metric to provider factory. Validate metric. Validating the metric on initialization, rather than spread out throughout. You get earlier feedback if you have a bad metric defined. (Not perfect, but there's limitations with our annotation generator for the rules in the crd. eg: If we could use oneOf, we wouldn't need a lot of this validation) We check all the mutually exclusive props. The props where one requires another. We don't have to check for defaults and set them anymore, since they are guaranteed by the crd. rules: - ensure we have only query OR queries - restrict v1 to query only - make sure you only provide a formula with queries - make sure multiple queries are accompanied by a formula Signed-off-by: mitchell amihod <[email protected]> * Remove DefaultApiVersion, remove impossible AnalysisPhaseError ApiVersion is guaranteed to have value, and the enum ensures its v1/v2 when user provided. Updated v1 tests to reflect some of these new realities Signed-off-by: mitchell amihod <[email protected]> * extract urlBuilding from run run was getting a bit long according to the checks Signed-off-by: mitchell amihod <[email protected]> * Remove some unnecessary stuff for interval. It is a straightline to initialize since default is set to 5m for incoming metrics where it is not set. Signed-off-by: mitchell amihod <[email protected]> * Update createRequest Split into createRequest v1/v2 v1 : pretty much unchanged. just extracted v2: support for v2/query/scalar We don't need all the timeseries. I did some testing fetching both scalar and timeseries, and they pretty much lined up. Also confirmed with DD: From support ticket with DD: "...I have also tested the scalar api endpoint with the last aggregator as well as the timeseries api endpoint. They do indeed return the same values when retrieving the values via the api endpoints. Observing the time it takes to retrieve the values, they remain relatively the same..." re: query + v2: Keep backwards compat. if we get in a query, we turn it into a queries object to pass on to requestv2 queries into the QueriesPayload. Signed-off-by: mitchell amihod <[email protected]> * Handle v2 scalar responses * update the datadogResponseV2 for scalar values * handle no results so it has parity with v1 - empty will now usually result in `[]` unless something goes very wrong on dd side Signed-off-by: mitchell amihod <[email protected]> * update v1 no data tests to better reflect reality Signed-off-by: mitchell amihod <[email protected]> * update v2 tests * add some new test cases * update mock server to handle queries / formulas validation * update no data tests to reflect reality * stop all values being the same. it makes it difficult to know find which test case failed. move meaning from comments into the metric name. * stop using deprecated ioutil Signed-off-by: mitchell amihod <[email protected]> * re-codegen Signed-off-by: zachaller <[email protected]> * fix lint Signed-off-by: zachaller <[email protected]> --------- Signed-off-by: mitchell amihod <[email protected]> Signed-off-by: zachaller <[email protected]> Co-authored-by: zachaller <[email protected]>
argoproj · Oct 14, 2023 · ebf1a3e · ebf1a3e
1 parent 1d89c33
commit ebf1a3e
Show file tree

Hide file tree

Showing 20 changed files with 1,520 additions and 736 deletions.
diff --git a/Makefile b/Makefile
@@ -20,7 +20,7 @@ DEV_IMAGE ?= false
 
 # E2E variables
 E2E_INSTANCE_ID ?= argo-rollouts-e2e
-E2E_TEST_OPTIONS ?= 
+E2E_TEST_OPTIONS ?=
 E2E_PARALLEL ?= 1
 E2E_WAIT_TIMEOUT ?= 120
 GOPATH ?= $(shell go env GOPATH)
@@ -149,7 +149,7 @@ gen-mocks: install-go-tools-local ## generate mock files
 # generates openapi_generated.go
 .PHONY: gen-openapi
 gen-openapi: $(DIST_DIR)/openapi-gen ## generate openapi files
-	PATH=${DIST_DIR}:$$PATH openapi-gen \
+	PATH=${DIST_DIR}:$$PATH GOPATH=${GOPATH} openapi-gen \
 		--go-header-file ${CURRENT_DIR}/hack/custom-boilerplate.go.txt \
 		--input-dirs github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1 \
 		--output-package github.com/argoproj/argo-rollouts/pkg/apis/rollouts/v1alpha1 \

diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
@@ -178,6 +178,12 @@ make start-e2e E2E_INSTANCE_ID=''
 ```
 
 
+6. Working on CRDs? While editing them directly works when you are finding the shape of things you want, the final CRDs are autogenerated. Make sure to regenerate them before submitting PRs. They are controlled by the relevant annotations in the types file:
+
+eg: Analysis Templates are controlled by annotations in `pkg/apis/rollouts/v1alpha1/analysis_types.go`.
+
+Refer to https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html and https://book.kubebuilder.io/reference/markers/crd-validation.html for more info on annotations you can use.
+
 ## Running Local Containers
 
 You may need to run containers locally, so here's how:

diff --git a/docs/analysis/datadog.md b/docs/analysis/datadog.md
@@ -26,10 +26,7 @@ spec:
           sum:requests.error.rate{service:{{args.service-name}}}
 ```
 
-The field `apiVersion` refers to the API version of Datadog (v1 or v2). Default value is `v1` if this is omitted.
-
-!!! note
-    Datadog is moving away from the legacy v1 API. Rate limits imposed by Datadog are therefore stricter when using v1. It is recommended to switch to v2 soon. If you switch to v2, you will not be able to use formulas (operations between individual queries).
+The field `apiVersion` refers to the API version of Datadog (v1 or v2). Default value is `v1` if this is omitted. See "Working with Datadog API v2" below for more information.
 
 Datadog api and app tokens can be configured in a kubernetes secret in argo-rollouts namespace.
 
@@ -46,3 +43,135 @@ stringData:
 ```
 
 `apiVersion` here is different from the `apiVersion` from the Datadog configuration above.
+
+### Working with Datadog API v2
+
+#### Moving to v2
+
+If your old v1 was just a simple metric query - no formula as part of the query - then you can just move to v2 by updating the `apiVersion` in your existing Analysis Template, and everything should work.
+
+If you have a formula, you will need to update how you configure your metric. Here is a before/after example of what your Analysis Template should look like:
+
+Before:
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: AnalysisTemplate
+metadata:
+  name: log-error-rate
+spec:
+  args:
+  - name: service-name
+  metrics:
+  - name: error-rate
+    interval: 30s
+    successCondition: default(result, 0) < 10
+    failureLimit: 3
+    provider:
+      datadog:
+        apiVersion: v1
+        interval: 5m
+        query: "moving_rollup(sum:requests.errors{service:{{args.service-name}}}.as_count(), 60, 'sum') / sum:requests{service:{{args.service-name}}}.as_count()"
+```
+
+After:
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: AnalysisTemplate
+metadata:
+  name: loq-error-rate
+spec:
+  args:
+  - name: service-name
+  metrics:
+  - name: error-rate
+    # Polling rate against the Datadog API
+    interval: 30s
+    successCondition: default(result, 0) < 10
+    failureLimit: 3
+    provider:
+      datadog:
+        apiVersion: v2
+        # The window of time we are looking at in DD. Basically we will fetch data from (now-5m) to now.
+        interval: 5m
+        queries:
+          a: sum:requests.errors{service:{{args.service-name}}}.as_count()
+          b: sum:requests{service:{{args.service-name}}}.as_count()
+        formula: "moving_rollup(a, 60, 'sum') / b"
+```
+
+#### Examples
+
+Simple v2 query with no formula
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: AnalysisTemplate
+metadata:
+  name: canary-container-restarts
+spec:
+  args:
+    # This is set in rollout using the valueFrom: podTemplateHashValue functionality
+    - name: canary-hash
+    - name: service-name
+    - name: restarts.initial-delay
+      value: "60s"
+    - name: restarts.max-restarts
+      value: "4"
+  metrics:
+    - name: kubernetes.containers.restarts
+      initialDelay: "{{ args.restarts.initial-delay }}"
+      interval: 15s
+      failureCondition: default(result, 0) > {{ args.restarts.max-restarts }}
+      failureLimit: 0
+      provider:
+        datadog:
+          apiVersion: v2
+          interval: 5m
+          queries:
+            # The key is arbitrary - you will use this key to refer to the query if you use a formula.
+            q: "max:kubernetes.containers.restarts{service-name:{{args.service-name}},rollouts_pod_template_hash:{{args.canary-hash}}}"
+```
+
+### Tips
+
+#### Datadog Results
+
+Datadog queries can return empty results if the query takes place during a time interval with no metrics. The Datadog provider will return a `nil` value yielding an error during the evaluation phase like:
+
+```
+invalid operation: < (mismatched types <nil> and float64)
+```
+
+However, empty query results yielding a `nil` value can be handled using the `default()` function. Here is a succeeding example using the `default()` function:
+
+```yaml
+successCondition: default(result, 0) < 0.05
+```
+
+#### Templates and Helm
+
+Helm and Argo Rollouts both try to parse things between `{{ ... }}` when rendering templates. If you use Helm to deliver your manifests, you will need to escape `{{ args.whatever }}`. Using the example above, here it is set up for Helm:
+
+```yaml
+...<snip>
+metrics:
+  - name: kubernetes.containers.restarts
+      initialDelay: "{{ `{{ args.restarts.initial-delay }}` }}"
+    interval: 15s
+      failureCondition: default(result, 0) > {{ `{{ args.restarts.max-restarts }}` }}
+    failureLimit: 0
+    provider:
+      datadog:
+        apiVersion: v2
+        interval: 5m
+        queries:
+          q: "{{ `max:kubernetes.containers.restarts{kube_app_name:{{args.kube_app_name}},rollouts_pod_template_hash:{{args.canary-hash}}}` }}"
+```
+
+#### Rate Limits
+
+For the `v1` API, you ask for an increase on the `api/v1/query` route.
+
+For the `v2` API, the Ratelimit-Name you ask for an increase in is the `query_scalar_public`.
diff --git a/docs/features/kustomize/rollout_cr_schema.json b/docs/features/kustomize/rollout_cr_schema.json
@@ -244,18 +244,30 @@
                                             "datadog": {
                                                 "properties": {
                                                     "apiVersion": {
+                                                        "default": "v1",
+                                                        "enum": [
+                                                            "v1",
+                                                            "v2"
+                                                        ],
+                                                        "type": "string"
+                                                    },
+                                                    "formula": {
                                                         "type": "string"
                                                     },
                                                     "interval": {
+                                                        "default": "5m",
                                                         "type": "string"
                                                     },
+                                                    "queries": {
+                                                        "additionalProperties": {
+                                                            "type": "string"
+                                                        },
+                                                        "type": "object"
+                                                    },
                                                     "query": {
                                                         "type": "string"
                                                     }
                                                 },
-                                                "required": [
-                                                    "query"
-                                                ],
                                                 "type": "object"
                                             },
                                             "graphite": {
@@ -4750,18 +4762,30 @@
                                             "datadog": {
                                                 "properties": {
                                                     "apiVersion": {
+                                                        "default": "v1",
+                                                        "enum": [
+                                                            "v1",
+                                                            "v2"
+                                                        ],
+                                                        "type": "string"
+                                                    },
+                                                    "formula": {
                                                         "type": "string"
                                                     },
                                                     "interval": {
+                                                        "default": "5m",
                                                         "type": "string"
                                                     },
+                                                    "queries": {
+                                                        "additionalProperties": {
+                                                            "type": "string"
+                                                        },
+                                                        "type": "object"
+                                                    },
                                                     "query": {
                                                         "type": "string"
                                                     }
                                                 },
-                                                "required": [
-                                                    "query"
-                                                ],
                                                 "type": "object"
                                             },
                                             "graphite": {
@@ -9256,18 +9280,30 @@
                                             "datadog": {
                                                 "properties": {
                                                     "apiVersion": {
+                                                        "default": "v1",
+                                                        "enum": [
+                                                            "v1",
+                                                            "v2"
+                                                        ],
+                                                        "type": "string"
+                                                    },
+                                                    "formula": {
                                                         "type": "string"
                                                     },
                                                     "interval": {
+                                                        "default": "5m",
                                                         "type": "string"
                                                     },
+                                                    "queries": {
+                                                        "additionalProperties": {
+                                                            "type": "string"
+                                                        },
+                                                        "type": "object"
+                                                    },
                                                     "query": {
                                                         "type": "string"
                                                     }
                                                 },
-                                                "required": [
-                                                    "query"
-                                                ],
                                                 "type": "object"
                                             },
                                             "graphite": {

diff --git a/manifests/crds/analysis-run-crd.yaml b/manifests/crds/analysis-run-crd.yaml
@@ -179,13 +179,22 @@ spec:
                         datadog:
                           properties:
                             apiVersion:
+                              default: v1
+                              enum:
+                              - v1
+                              - v2
+                              type: string
+                            formula:
                               type: string
                             interval:
+                              default: 5m
                               type: string
+                            queries:
+                              additionalProperties:
+                                type: string
+                              type: object
                             query:
                               type: string
-                          required:
-                          - query
                           type: object
                         graphite:
                           properties:

diff --git a/manifests/crds/analysis-template-crd.yaml b/manifests/crds/analysis-template-crd.yaml
@@ -175,13 +175,22 @@ spec:
                         datadog:
                           properties:
                             apiVersion:
+                              default: v1
+                              enum:
+                              - v1
+                              - v2
+                              type: string
+                            formula:
                               type: string
                             interval:
+                              default: 5m
                               type: string
+                            queries:
+                              additionalProperties:
+                                type: string
+                              type: object
                             query:
                               type: string
-                          required:
-                          - query
                           type: object
                         graphite:
                           properties:

diff --git a/manifests/crds/cluster-analysis-template-crd.yaml b/manifests/crds/cluster-analysis-template-crd.yaml
@@ -175,13 +175,22 @@ spec:
                         datadog:
                           properties:
                             apiVersion:
+                              default: v1
+                              enum:
+                              - v1
+                              - v2
+                              type: string
+                            formula:
                               type: string
                             interval:
+                              default: 5m
                               type: string
+                            queries:
+                              additionalProperties:
+                                type: string
+                              type: object
                             query:
                               type: string
-                          required:
-                          - query
                           type: object
                         graphite:
                           properties: