Commit
Merge branch 'current' into feat/dbt-databend-cloud
mirnawong1 authored Jan 16, 2023
2 parents d193c58 + 68a42e9 commit 98eb3af
Showing 34 changed files with 1,577 additions and 305 deletions.
10 changes: 10 additions & 0 deletions .github/config.yml
@@ -0,0 +1,10 @@
# Comment to be posted on PRs from first-time contributors in your repository

newPRWelcomeComment: >
  Hello!👋 Thanks for contributing to the dbt product documentation and opening this pull request! ✨

  We use Markdown and some HTML to write the dbt product documentation. When writing content, you can use our [style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [content types](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-types.md) to understand our writing standards and how we organize information in the dbt product docs.

  We'll review your contribution and respond as soon as we can. 😄


38 changes: 38 additions & 0 deletions .github/workflows/label.yml
@@ -0,0 +1,38 @@
name: Add/Remove Labels

on:
  pull_request_target:
    types: [ opened, closed ]

jobs:
  add_new_contributor_label:
    if: github.event.action == 'opened'
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            const creator = context.payload.sender.login
            // list every issue and PR this user has ever opened in the repo
            const opts = github.rest.issues.listForRepo.endpoint.merge({
              ...context.issue,
              creator,
              state: 'all'
            })
            const issues = await github.paginate(opts)
            for (const issue of issues) {
              // skip the PR that triggered this workflow
              if (issue.number === context.issue.number) {
                continue
              }
              if (issue.pull_request) {
                return // creator is already a contributor
              }
            }
            // no earlier PRs found, so mark this as a first contribution
            await github.rest.issues.addLabels({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              labels: ['new contributor']
            })
1 change: 1 addition & 0 deletions _redirects
@@ -1,3 +1,4 @@
/dbt-cloud/api-v4 /docs/dbt-cloud-apis/admin-cloud-api 301
/docs/building-a-dbt-project/building-models/python-models /docs/build/python-models 301
/docs/deploy/regions /docs/deploy/regions-ip-addresses 301

59 changes: 59 additions & 0 deletions website/docs/docs/build/incremental-models.md
Expand Up @@ -303,5 +303,64 @@ select ...

</File>

<VersionBlock firstVersion="1.4">

### About `incremental_predicates`

`incremental_predicates` is an advanced use of incremental models, where data volume is large enough to justify additional investments in performance. This config accepts a list of any valid SQL expressions; however, dbt does not check the syntax.

For example, this is a pattern we might expect to see on Snowflake:

```yml
models:
  - name: my_incremental_model
    config:
      materialized: incremental
      unique_key: id
      # this will affect how the data is stored on disk, and indexed to limit scans
      cluster_by: ['session_start']
      incremental_strategy: merge
      # this limits the scan of the existing table to the last 7 days of data
      incremental_predicates: ["DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"]
      # `DBT_INTERNAL_DEST` and `DBT_INTERNAL_SOURCE` are the standard aliases for the target table and temporary table, respectively, during an incremental run using the merge strategy.
```

This will template (in the `dbt.log` file) a `merge` statement like:
```sql
merge into <existing_table> DBT_INTERNAL_DEST
    using <temp_table_with_new_records> DBT_INTERNAL_SOURCE
    on
        -- unique key
        DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
        and
        -- custom predicate: limits data scan in the "old" data / existing table
        DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)
when matched then update ...
when not matched then insert ...
```

You can also limit the data scan of _upstream_ tables within the body of the incremental model SQL, which will limit the amount of "new" data processed/transformed.

```sql
with large_source_table as (

    select * from {{ ref('large_source_table') }}
    {% if is_incremental() %}
    where session_start > dateadd(day, -3, current_date)
    {% endif %}

),

...
```

:::info
The syntax depends on how you configure your `incremental_strategy`:
- If using the `merge` strategy, you may need to explicitly alias any columns with either `DBT_INTERNAL_DEST` ("old" data) or `DBT_INTERNAL_SOURCE` ("new" data).
- There's a decent amount of conceptual overlap with the `insert_overwrite` incremental strategy.
:::
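
As a minimal sketch, the same configuration can also be set inline in the model file itself (assuming the `merge` strategy on Snowflake; `events` is a hypothetical upstream model):

```sql
{{
    config(
        materialized = 'incremental',
        unique_key = 'id',
        incremental_strategy = 'merge',
        incremental_predicates = ["DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"]
    )
}}

-- the is_incremental() block limits the upstream scan on incremental runs
select * from {{ ref('events') }}
{% if is_incremental() %}
where session_start > dateadd(day, -3, current_date)
{% endif %}
```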

</VersionBlock>

<Snippet src="discourse-help-feed-header" />
<DiscourseHelpFeed tags="incremental"/>
212 changes: 207 additions & 5 deletions website/docs/docs/build/metrics.md
Expand Up @@ -15,7 +15,7 @@ keywords:

## About Metrics

A metric is a timeseries aggregation over a <Term id="table" /> that supports zero or more dimensions. Some examples of metrics include:
A metric is an aggregation over a <Term id="table" /> that supports zero or more dimensions. Some examples of metrics include:
- active users
- monthly recurring revenue (mrr)

@@ -69,7 +69,7 @@ metrics:
expression: user_id

timestamp: signup_date
time_grains: [day, week, month, quarter, year, all_time]
time_grains: [day, week, month, quarter, year]

dimensions:
- plan
@@ -168,8 +168,8 @@ Metrics can have many declared **properties**, which define aspects of your metr
| label | A short name / label for the metric | New Customers | yes |
| description | Long form, human-readable description for the metric | The number of customers who.... | no |
| calculation_method | The method of calculation (aggregation or derived) that is applied to the expression | count_distinct | yes |
| expression | The expression to aggregate/calculate over | user_id, cast(user_id as int) | yes |
| timestamp | The time-based component of the metric | signup_date | yes |
| expression | The expression to aggregate/calculate over | user_id, cast(user_id as int) | <VersionBlock firstVersion="1.4"> no </VersionBlock> <VersionBlock lastVersion="1.3"> yes </VersionBlock> |
| timestamp | The time-based component of the metric | signup_date | <VersionBlock firstVersion="1.4"> no </VersionBlock> <VersionBlock lastVersion="1.3"> yes </VersionBlock> |
| time_grains | One or more "grains" at which the metric can be evaluated. For more information, see the "Calendar" section. | [day, week, month, quarter, year] | yes |
| dimensions | A list of dimensions to group or filter the metric by | [plan, country] | no |
| window | A dictionary for aggregating over a window of time. Used for rolling metrics such as 14 day rolling average. Acceptable periods are: [`day`,`week`,`month`, `year`, `all_time`] | {count: 14, period: day} | no |
@@ -216,6 +216,7 @@ The type of calculation (aggregation or expression) that is applied to the sql p
| average | This metric type will apply the `average` aggregation to the specified field |
| min | This metric type will apply the `min` aggregation to the specified field |
| max | This metric type will apply the `max` aggregation to the specified field |
| median | This metric type will apply the `median` aggregation to the specified field, or an alternative `percentile_cont` aggregation if `median` is not available |
|<VersionBlock firstVersion="1.3">derived </VersionBlock> <VersionBlock lastVersion="1.2">expression </VersionBlock> | <VersionBlock firstVersion="1.2"> This metric type is defined as any _non-aggregating_ calculation of 1 or more metrics </VersionBlock> |

<VersionBlock firstVersion="1.3">
@@ -324,6 +325,91 @@ Note that `value` must be defined as a string in YAML, because it will be compiled
value: "'2020-01-01'"
```

### Calendar
The dbt_metrics package contains a [basic calendar table](https://github.com/dbt-labs/dbt_metrics/blob/main/models/dbt_metrics_default_calendar.sql) that is created as part of your `dbt run`. It contains dates between 2010-01-01 and 2029-12-31.

If you want to use a custom calendar, you can replace the default with any table which meets the following requirements:
- Contains a `date_day` column.
- Contains the following columns: `date_week`, `date_month`, `date_quarter`, `date_year`, or equivalents.
- Additional date columns need to be prefixed with `date_`, e.g. `date_4_5_4_month` for a 4-5-4 retail calendar date set. Dimensions can have any name (see following section).

To do this, set the value of the `dbt_metrics_calendar_model` variable in your `dbt_project.yml` file:
```yaml
#dbt_project.yml
config-version: 2
[...]
vars:
  dbt_metrics_calendar_model: my_custom_calendar
```
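
As an illustrative sketch, a custom calendar model meeting the requirements above could be built with a date spine. The model name `my_custom_calendar` and the use of the `dbt_utils` package are assumptions here; the syntax shown is Snowflake's:

```sql
-- models/my_custom_calendar.sql (hypothetical)
with days as (

    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="cast('2010-01-01' as date)",
        end_date="cast('2030-01-01' as date)"
    ) }}

)

select
    cast(date_day as date)            as date_day,
    date_trunc('week', date_day)      as date_week,
    date_trunc('month', date_day)     as date_month,
    date_trunc('quarter', date_day)   as date_quarter,
    date_trunc('year', date_day)      as date_year,
    -- dimension columns can have any name (see the next section)
    dayname(date_day) in ('Sat', 'Sun') as is_weekend
from days
```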

#### Dimensions from calendar tables
You may want to aggregate metrics by a dimension in your custom calendar table, for example, `is_weekend`. You can include this within the list of dimensions in the macro call without it needing to be defined in the metric definition.

To do so, set a list variable at the project level called `custom_calendar_dimension_list`, as shown in the example below.

```yaml
#dbt_project.yml
vars:
  custom_calendar_dimension_list: ["is_weekend"]
```
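
With that variable set, the calendar dimension can be passed straight into the macro call at query time. A sketch, assuming the dbt_metrics `calculate` macro (covered under "Querying Your Metric" below) and a hypothetical `new_customers` metric:

```sql
select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='week',
    dimensions=['plan', 'is_weekend']
) }}
```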

<VersionBlock firstVersion="1.3">

### Configuration

Metric nodes now accept `config` dictionaries like other dbt resources. Specify metric configs in the metric yml itself, or for groups of metrics in the `dbt_project.yml` file.

<!--tabs for config and project.yml -->

<Tabs>
<TabItem value="config" label="Metric yml">

<File name="models/metrics.yml">

```yml
version: 2
metrics:
  - name: config_metric
    label: Example Metric with Config
    model: ref('my_model')
    calculation_method: count
    timestamp: date_field
    time_grains: [day, week, month]
    config:
      enabled: true
```

</File>
</TabItem>

<TabItem value="project" label="dbt_project.yml">

<File name="dbt_project.yml">

```yml
metrics:
  your_project_name:
    +enabled: true
```

</File>
</TabItem>
</Tabs>

<!--End of tabs for config and project.yml -->


#### Accepted Metric Configurations

The following is the list of currently accepted metric configs:

| Config | Type | Accepted Values | Default Value | Description |
|--------|------|-----------------|---------------|-------------|
| `enabled` | boolean | True/False | True | Enables or disables a metric node. When disabled, dbt will not consider it as part of your project. |
| `treat_null_values_as_zero` | boolean | True/False | True | Controls the `coalesce` behavior for metrics. By default, when there are no observations for a metric, the output of the metric as well as [Period over Period](#secondary-calculations) secondary calculations will include a `coalesce({{ field }}, 0)` to return 0's rather than nulls. Setting this config to False instead returns `NULL` values. |

</VersionBlock>

## Querying Your Metric
You can dynamically query metrics directly in dbt and verify them before running a job in the deployment environment. To query your defined metric, you must have the [dbt_metrics package](https://github.com/dbt-labs/dbt_metrics) installed. Information on how to [install packages can be found here](https://docs.getdbt.com/docs/build/packages#how-do-i-add-a-package-to-my-project).

@@ -406,11 +492,72 @@ You may find some pieces of functionality, like secondary calculations, complica
| end_date | `'2022-12-31'` | Limits the date range of data used in the metric calculation by not querying data after this date | Optional |
| where | `plan='paying_customer'` | A SQL statement, or series of SQL statements, that alter the **final** CTE in the generated SQL. Most often used to limit the data to specific values of dimensions provided | Optional |
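
A sketch combining these inputs (the `new_customers` metric name is an assumption; syntax assumes the dbt v1.3+ `calculate` macro):

```sql
select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='month',
    dimensions=['plan'],
    start_date='2022-01-01',
    end_date='2022-12-31',
    where="plan='paying_customer'"
) }}
```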

#### Secondary Calculations
### Secondary Calculations
Secondary calculations are window functions you can add to the metric calculation and perform on the primary metric or metrics.

You can use them to compare values to an earlier period, calculate year-to-date sums, and return rolling averages. You can add custom secondary calculations into dbt projects - for more information on this, reference the [package README](https://github.com/dbt-labs/dbt_metrics#secondary-calculations).

The supported Secondary Calculations are:

#### Period over Period:

The period over period secondary calculation compares the metric(s) in question across two points in time, returning either the difference or the ratio between them. The second point in time is determined by the `interval` input, measured in the grain selected in the macro.

| Input | Example | Description | Required |
| -------------------------- | ----------- | ----------- | -----------|
| `comparison_strategy` | `ratio` or `difference` | How to calculate the delta between the two periods | Yes |
| `interval` | 1 | Integer - the number of time grains to look back | Yes |
| `alias` | `week_over_week` | The column alias for the resulting calculation | No |
| `metric_list` | `base_sum_metric` | List of metrics that the secondary calculation should be applied to. Default is all metrics selected | No |
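
For example, a week-over-week difference might be requested like this (a sketch, assuming a hypothetical `base_sum_metric` and dbt v1.3+ syntax):

```sql
select *
from {{ metrics.calculate(
    metric('base_sum_metric'),
    grain='week',
    secondary_calculations=[
        metrics.period_over_period(comparison_strategy="difference", interval=1, alias="week_over_week")
    ]
) }}
```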

#### Period to Date:

The period to date secondary calculation performs an aggregation on a defined period of time that is equal to or coarser than the grain selected. For example, you can display a `month_to_date` value alongside your weekly-grained metric.

| Input | Example | Description | Required |
| -------------------------- | ----------- | ----------- | -----------|
| `aggregate` | `max`, `average` | The aggregation to use in the window function. Options vary based on the primary aggregation and are enforced in [validate_aggregate_coherence()](https://github.com/dbt-labs/dbt_metrics/blob/main/macros/validation/validate_aggregate_coherence.sql). | Yes |
| `period` | `"day"`, `"week"` | The time grain to aggregate to. One of [`"day"`, `"week"`, `"month"`, `"quarter"`, `"year"`]. Must be at equal or coarser (higher, more aggregated) granularity than the metric's grain (see [Time Grains](#time-grains) below). With a grain of `month`, for example, the acceptable periods would be `month`, `quarter`, or `year`. | Yes |
| `alias` | `month_to_date` | The column alias for the resulting calculation | No |
| `metric_list` | `base_sum_metric` | List of metrics that the secondary calculation should be applied to. Default is all metrics selected | No |
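
A sketch showing a month-to-date maximum alongside a weekly-grained metric (again assuming the hypothetical `base_sum_metric`):

```sql
select *
from {{ metrics.calculate(
    metric('base_sum_metric'),
    grain='week',
    secondary_calculations=[
        metrics.period_to_date(aggregate="max", period="month", alias="month_to_date")
    ]
) }}
```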

#### Rolling:

<VersionBlock firstVersion="1.3" >
The rolling secondary calculation performs an aggregation on a number of rows in the metric dataset. For example, if the user selects the `week` grain and sets a rolling secondary calculation to `4`, then the value returned will be a rolling 4-week calculation of whatever aggregation type was selected. If the `interval` input is not provided, then the rolling calculation will be unbounded, running over all preceding rows.

| Input | Example | Description | Required |
| -------------------------- | ----------- | ----------- | -----------|
| `aggregate` | `max`, `average` | The aggregation to use in the window function. Options vary based on the primary aggregation and are enforced in [validate_aggregate_coherence()](https://github.com/dbt-labs/dbt_metrics/blob/main/macros/validation/validate_aggregate_coherence.sql). | Yes |
| `interval` | 1 | Integer - the number of time grains to look back | No |
| `alias` | `month_to_date` | The column alias for the resulting calculation | No |
| `metric_list` | `base_sum_metric` | List of metrics that the secondary calculation should be applied to. Default is all metrics selected | No |
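
A sketch of a rolling 4-week average, following the inputs above (the `base_sum_metric` name is an assumption):

```sql
select *
from {{ metrics.calculate(
    metric('base_sum_metric'),
    grain='week',
    secondary_calculations=[
        metrics.rolling(aggregate="average", interval=4, alias="rolling_avg_4wk")
    ]
) }}
```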
</VersionBlock>

<VersionBlock lastVersion="1.2" >
The rolling secondary calculation performs an aggregation on a number of rows in the metric dataset. For example, if the user selects the `week` grain and sets a rolling secondary calculation to `4`, then the value returned will be a rolling 4-week calculation of whatever aggregation type was selected.

| Input | Example | Description | Required |
| -------------------------- | ----------- | ----------- | -----------|
| `aggregate` | `max`, `average` | The aggregation to use in the window function. Options vary based on the primary aggregation and are enforced in [validate_aggregate_coherence()](https://github.com/dbt-labs/dbt_metrics/blob/main/macros/validation/validate_aggregate_coherence.sql). | Yes |
| `interval` | 1 | Integer - the number of time grains to look back | Yes |
| `alias` | `month_to_date` | The column alias for the resulting calculation | No |
| `metric_list` | `base_sum_metric` | List of metrics that the secondary calculation should be applied to. Default is all metrics selected | No |
</VersionBlock>

<VersionBlock firstVersion="1.3" >

#### Prior:
The prior secondary calculation returns the value from a specified number of intervals before the row.

| Input | Example | Description | Required |
| -------------------------- | ----------- | ----------- | -----------|
| `interval` | 1 | Integer - the number of time grains to look back | Yes |
| `alias` | `2_weeks_prior` | The column alias for the resulting calculation | No |
| `metric_list` | `base_sum_metric` | List of metrics that the secondary calculation should be applied to. Default is all metrics selected | No |
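
A sketch requesting the value from two weeks prior alongside each row, following the inputs above (metric name is an assumption):

```sql
select *
from {{ metrics.calculate(
    metric('base_sum_metric'),
    grain='week',
    secondary_calculations=[
        metrics.prior(interval=2, alias="2_weeks_prior")
    ]
) }}
```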

</VersionBlock>

### Developing metrics with `metrics.develop`

<VersionBlock firstVersion="1.3" >
@@ -493,6 +640,61 @@ Functionality for `develop` is only supported in v1.2 and higher. Please navigat

</VersionBlock>

#### Multiple/Derived Metrics with `metrics.develop`
If you have a more complicated use case that you are interested in testing, the `develop` macro also supports this behavior. The only caveat is that you must wrap any provided metric yml containing a derived metric in `{% raw %}` tags. Example below:

```
{% set my_metric_yml -%}
{% raw %}

metrics:
  - name: develop_metric
    model: ref('fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: average
    expression: discount_total
    dimensions:
      - had_discount
      - order_country

  - name: derived_metric
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: derived
    expression: "{{ metric('develop_metric') }} - 1 "
    dimensions:
      - had_discount
      - order_country

  - name: some_other_metric_not_using
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: derived
    expression: "{{ metric('derived_metric') }} - 1 "
    dimensions:
      - had_discount
      - order_country

{% endraw %}
{%- endset %}

select *
from {{ metrics.develop(
        develop_yml=my_metric_yml,
        metric_list=['derived_metric'],
        grain='month'
        )
    }}
```

The above example will return a dataset that contains the metric provided in the metric list (`derived_metric`) and the parent metric (`develop_metric`). It will not contain `some_other_metric_not_using` as it is not designated in the metric list or a parent of the metrics included.

**Important caveat** - You _must_ wrap the `expression` property for `derived` metrics in double quotes to render it. For example, `expression: "{{ metric('develop_metric') }} - 1 "`.


<Snippet src="discourse-help-feed-header" />
<DiscourseHelpFeed tags="metrics"/>
