This repo shows how to set up unit tests that run as part of checks verifying that the pipeline queries meet business rules.
The silver and gold layers are run as part of the integration tests. In the integration tests, the bronze layer is fake data that exercises the edge cases. After the silver and gold layer queries run, a set of expectations is run to verify that the business rules are met.
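As a hedged sketch of what such an expectation can look like in DLT (the table, column, and rule names here are illustrative assumptions, not the repo's actual definitions):

```python
import dlt

# Illustrative silver-layer TEST query: read the fake bronze data and
# enforce a business rule; expect_or_fail aborts the update on violation.
@dlt.table(name="test_silver_orders")
@dlt.expect_or_fail("positive_order_amount", "order_amount > 0")
def test_silver_orders():
    return dlt.read("silver_orders")
```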
Based on this Databricks example.
*Diagram of notebooks*
The pipelines are set up to run manually or as a workflow in GitHub Actions.
In this repo, the GitHub Action triggers when a merge into master happens. This seems the most practical choice because the Databricks job takes time and requires spinning up Databricks compute instances; triggering on merge ensures the unit tests run on what goes into master while keeping costs low.
The Databricks bundle is designed to run a job with two tasks. The first task runs the DLT pipeline; on success, the second task cleans up the tables and schema created by the DLT pipeline (a sketch of the bundle config follows the command list below). An individual can run the bundle, check the DLT pipeline run for the success of the unit tests, and finally destroy the jobs and pipeline:
- run `databricks bundle deploy`: just command `just bundle-deploy`
- run `databricks bundle run`: just command `just bundle-run`
- run `databricks bundle destroy`: just command `just bundle-destroy`
Two notes about the pipeline:
- one of the TEST queries intentionally raises a failing exception to show what a failure looks like
- you will need to add env vars for the Databricks authentication token and the workspace URL (see the example below)
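Assuming the standard Databricks auth variable names (check the repo's workflow file for the exact names it expects), that could look like:

```shell
# Standard Databricks auth env vars; the names the repo reads may differ.
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"
```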
In `src/utils.py`, there is a function and a few unit tests on that function. The example code shows how to run tests that need a Spark session. In `conftest.py`, the Spark session is defined in a fixture. The fixture is scoped as `class` so that the Spark session object is created before, and torn down after, the tests in the class run.
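A minimal sketch of what such a class-scoped fixture looks like (the fixture name is an assumption; see the repo's `conftest.py` for the real one):

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="class")
def spark():
    # Created once per test class, torn down after the class finishes.
    session = (
        SparkSession.builder
        .master("local[1]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()
```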
For GitHub Actions, there is a job to run all tests in the `src` directory.
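A hedged sketch of such a workflow (the trigger, runner, Python version, and install step are assumptions; the repo's actual workflow may differ):

```yaml
# Hypothetical workflow: run the pytest suite on pushes to master.
name: unit-tests
on:
  push:
    branches: [master]
jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest src
```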
If the tests are run locally, you will need a Java JDK. I installed Temurin, which handled the JDK installation on an Apple M1.
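For example, with Homebrew (Temurin is distributed as a cask):

```shell
# Installs the Temurin JDK; you may also need to point JAVA_HOME at it.
brew install --cask temurin
```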
Included in the code base is a Python script that connects to the Databricks instance and parses the event logs to report on failed expectations. It can be run with `just sql-validations`. Additionally, there are some unit tests on that code, which can be run with `just pytest-unittest`.
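As a rough sketch of the idea (the connection parameters and the event-log table name are assumptions; the repo's script is the source of truth), failed expectations can be pulled out of the DLT event log roughly like this:

```python
import json
import os

from databricks import sql  # databricks-sql-connector

# Hypothetical sketch: query flow_progress events from the DLT event log
# and report any expectation that had failing records.
with sql.connect(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
) as conn:
    with conn.cursor() as cursor:
        cursor.execute(
            "SELECT details FROM my_catalog.my_schema.event_log_table "
            "WHERE event_type = 'flow_progress'"
        )
        for row in cursor.fetchall():
            details = json.loads(row[0])
            quality = details.get("flow_progress", {}).get("data_quality", {})
            for exp in quality.get("expectations", []):
                if exp.get("failed_records", 0) > 0:
                    print(f"FAILED: {exp['name']} ({exp['failed_records']} records)")
```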
The commands to set up the pipelines use the `databricks-cli`. You can install it with brew: link to install instructions. You will need to follow the instructions to set up an access token, which is added to your local setup through the `databricks-cli`.
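For reference, the documented Homebrew install and token setup flow looks roughly like this (Databricks' official tap):

```shell
# Install the Databricks CLI via Homebrew
brew tap databricks/tap
brew install databricks
# Prompts for the workspace URL and a personal access token,
# and writes them to ~/.databrickscfg
databricks configure
```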
There is a `.justfile` for common commands, e.g. `just github`.
`pre-commit` is set up to run isort and black formatting when creating a commit.
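A typical `.pre-commit-config.yaml` for those two hooks (the pinned revisions here are placeholders; the repo's versions may differ):

```yaml
repos:
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2   # placeholder pin
    hooks:
      - id: isort
  - repo: https://github.com/psf/black
    rev: 24.4.2   # placeholder pin
    hooks:
      - id: black
```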