Skip to content

kgmcquate/dbt-testgen

Repository files navigation

dbt-testgen

About

dbt-testgen is a dbt package that autogenerates dbt test YAML based on real data.

Code documentation available at here

Inspired by dbt-codegen and deequ Constraint Suggestion.

Install

dbt-testgen currently supports dbt 1.2.x or higher.

Include in packages.yml:

packages:
  - git: https://github.com/kgmcquate/dbt-testgen

Supported Databases

The following databases are supported:

  • Snowflake
  • Databricks
  • RedShift
  • BigQuery
  • Postgres
  • DuckDB

Integration tests are run for each of these databases in Actions.

Test types

dbt-testgen can generate these types of tests, using built-in tests, dbt_utils, and dbt-expectations:

Usage

The DBT config YAML is generated by a Jinja macro, get_test_suggestions, which you can run like this:

dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('mymodel')) }}"

Output:

models:
- name: mymodel
  tests:
  - dbt_utils.recency:
      field: day
      datepart: day
      interval: 2
  columns:
  - name: user_id
    description: Numeric range test generated by dbt-testgen
    tests:
    - unique
    - not_null
    - dbt_utils.accepted_range:
        min_value: 1
        max_value: 30
  - name: username
    tests:
    - unique
    - not_null
    - dbt_expectations.expect_column_value_lengths_to_be_between:
        min_value: 8
        max_value: 15
        row_condition: '"username" is not null'
  - name: email
    tests:
    - unique
    - not_null
    - dbt_expectations.expect_column_value_lengths_to_be_between:
        min_value: 18
        max_value: 25
        row_condition: '"email" is not null'
  - name: user_status
    tests:
    - accepted_values:
        values:
        - active
        - inactive
    - dbt_expectations.expect_column_value_lengths_to_be_between:
        min_value: 6
        max_value: 8
        row_condition: '"user_status" is not null'
  - name: age
    tests:
    - dbt_utils.accepted_range:
        min_value: 22
        max_value: 35


You can output to a file like this:

dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('mymodel')) }}" >> models/schema.yml


You can also merge with an existing properties YAML file:

EXISTING_YAML_BODY=`cat models/schema.yml`
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('users'), dbt_config=fromyaml(\"${EXISTING_YAML_BODY}\")) }}"


Here's an example of more advanced usage:

EXISTING_YAML_BODY=$(cat <<EOF
models:
- name: stg_jaffle_shop__customers
  config:
    tags:
    - pii
  columns:
  - name: customer_id
    tests:
    - unique
    - not_null

seeds:
- name: stg_jaffle_shop__orders
  config:
    materialized: view
  columns:
  - name: order_id
    tests:
    - unique
    - not_null
EOF
)

JINJA_TEMPLATE=$(cat <<EOF
{{ testgen.get_test_suggestions(
    ref('users'),
    sample = true,
    limit = 1000,
    resource_type = 'seeds',
    column_config = {'quote': true, 'tags': ['generated']},
    exclude_types = ['float'],
    tests = ['uniqueness', 'accepted_values', 'range'],
    uniqueness_composite_key_length = 2,
    accepted_values_max_cardinality = 10,
    range_stddevs = 1,
    dbt_config=fromyaml("${EXISTING_YAML_BODY}")
) }}
EOF
)

dbt compile -q --inline "${JINJA_TEMPLATE}"

Output:

models:
- name: stg_jaffle_shop__customers
  config:
    tags:
    - pii
  columns:
  - name: customer_id
    tests:
    - unique
    - not_null
seeds:
- name: stg_jaffle_shop__orders
  columns:
  - name: order_id
    tests:
    - unique
    - not_null
- name: users
  columns:
  - name: user_id
    tests:
    - unique
    - not_null
    - dbt_utils.accepted_range:
        min_value: -3.4017042154147523
        max_value: 34.40170421541475
    quote: true
    tags:
    - generated
  - name: username
    tests:
    - unique
    - not_null
    quote: true
    tags:
    - generated
  - name: email
    tests:
    - unique
    - not_null
    quote: true
    tags:
    - generated
  - name: user_status
    tests:
    - accepted_values:
        values:
        - active
        - inactive
    quote: true
    tags:
    - generated
  - name: age
    tests:
    - dbt_utils.accepted_range:
        min_value: 20.462630250354053
        max_value: 36.53736974964595
    quote: true
    tags:
    - generated

Macros

get_test_suggestions