-
Notifications
You must be signed in to change notification settings - Fork 0
How to use dbt with Databend Cloud
This is a primer tutorial about how to use dbt-databend-cloud. Here we use the official dbt tutorial as an example to introduce the use of Databend in dbt together. The software used in this example and its version requirements:
- dbt 1.01 or upper
- dbt-databend-cloud 0.0.1
Installing dbt and dbt-databend-cloud requires only one command because dbt is installed as a dependency when we install dbt-databend-cloud.
pip3 install dbt-databend-cloud
dbt can also be installed by yourself. Please refer to the official installation guide.
The jaffle_shop is a project provided by dbt-lab to demonstrate dbt functionality. You can get it directly from GitHub.
$ git clone https://github.com/dbt-labs/jaffle_shop
$ cd jaffle_shop
All files in the jaffle_shop project directory are as follows.
- dbt_project.yml is the dbt project configuration file, which holds the project name, database configuration file path information, etc.
- models directory contains the SQL model and table schema for the project. Note that this section was written by the data analyst himself.
- seed directory stores CSV files. Such CSV files can be dumped from database export tools. In the jaffle shop project, these CSV files are used as raw data to be processed. I'll go into more detail about them later when I use one of the files or directories above.
- Global Configuration
dbt has a default global profile: ~/.dbt/profiles.yml
, which we first set up in the user directory and configured the connection information for the Databend database. If you do not have ~/.dbt/profiles.yml
please create it first.
jaffle_shop_databend:
target: dev
outputs:
dev:
type: databend
host: tnc7yee14--query-perf.ch.datafusecloud.com
port: 443
schema: sjh_dbt
user: cloudapp
pass: ********
You can find your connect host, user, pass information by reading Connecting to a Warehouse | Databend Cloud .
- Project configuration
In the jaffle_shop project directory, the project configuration file dbt_project.yml
is available. Change the profile configuration item to jaffle_shop_databend
, which is the project name in profiles.yml
. Then, the project will query the database connection configuration in the ~/.dbt/profiles.yml
file.
cat dbt_project.yml
name: 'jaffle_shop_databend'
config-version: 2
version: '0.1'
profile: 'jaffle_shop_databend'
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]
target-path: "target"
clean-targets:
- "target"
- "dbt_modules"
- "logs"
require-dbt-version: [">=1.0.0", "<2.0.0"]
models:
jaffle_shop:
materialized: table
staging:
materialized: view
- Verify the configuration
You can run dbt debug
in the project directory to check whether the database and project configuration is correct.
Load the CSV data and materialize the CSV as a table in the target database. Note: In general, dbt projects do not need this step because the data for your pending projects is in the database.
dbt seed
03:04:20 Running with dbt=1.3.0
03:04:20 Unable to do partial parsing because profile has changed
03:04:20 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.jaffle_shop
- models.jaffle_shop.staging
03:04:20 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
03:04:20
03:04:29 Concurrency: 1 threads (target='dev')
03:04:29
03:04:29 1 of 3 START seed file sjh_dbt.raw_customers ................................... [RUN]
sync ec1e4a32-9eb6-4f8c-88d7-35c40cfa89d1.csv duration:9s
03:04:42 1 of 3 OK loaded seed file sjh_dbt.raw_customers ............................... [INSERT 100 in 12.85s]
03:04:42 2 of 3 START seed file sjh_dbt.raw_orders ...................................... [RUN]
sync 3fd97eae-d20b-4d92-9dd7-9926e3ae4c86.csv duration:9s
03:04:55 2 of 3 OK loaded seed file sjh_dbt.raw_orders .................................. [INSERT 99 in 12.54s]
03:04:55 3 of 3 START seed file sjh_dbt.raw_payments .................................... [RUN]
sync 8da2b4b5-78c7-4230-b8c6-c1f030bdcb8f.csv duration:9s
03:05:07 3 of 3 OK loaded seed file sjh_dbt.raw_payments ................................ [INSERT 113 in 12.77s]
03:05:07
03:05:07 Finished running 3 seeds in 0 hours 0 minutes and 47.17 seconds (47.17s).
03:05:07
03:05:07 Completed successfully
03:05:07
03:05:07 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
In the above result, it is clear that three tasks have been performed and three tables ( sjh_dbt.raw_customers, sjh_dbt.raw_orders, sjh_dbt.raw_payments) were loaded.
Next, go to the Databend Cloud database to see what happens.
We found an extra sjh_dbt
database, which is the engineering database that dbt created for us.
There are three tables in sjh_dbt database, corresponding to the above three task results respectively.
Before skipping to the next step, it's important to understand what role the model plays in dbt. In dbt, models are used to describe the structure of a set of tables or views, and there are two main types of files: SQL and YML. Also note that in the jaffle_shop project, table structures are held under the models/ directory and view structures are held under the models/staging/ directory according to the materialization configuration. For example, models/orders.sql is an SQL query statement with jinja syntax. We will create a table by this query statement.
$ cat models/orders.sql
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
with orders as (
select * from {{ ref('stg_orders') }}
),
payments as (
select * from {{ ref('stg_payments') }}
),
order_payments as (
select
order_id,
{% for payment_method in payment_methods -%}
sum(case when payment_method = '{{ payment_method }}' then amount else 0 end) as {{ payment_method }}_amount,
{% endfor -%}
sum(amount) as total_amount
from payments
group by order_id
),
final as (
select
orders.order_id,
orders.customer_id,
orders.order_date,
orders.status,
{% for payment_method in payment_methods -%}
order_payments.{{ payment_method }}_amount,
{% endfor -%}
order_payments.total_amount as amount
from orders
left join order_payments
on orders.order_id = order_payments.order_id
)
select * from final
Also, the constraint information that goes with this SQL is in the models/schema.yml file. The schema.yml is a registry of all models in the current directory. All models are organized into a tree structure that describes the description and attributes of each field. The tests items represent constraints for this field, which can be tested using the dbt test command. See the official documentation for more information.
The result shows three views (sjh_dbt.stg_customers, sjh_dbt.stg_orders, sjh_dbt.stg_payments) and two tables (sjh_dbt.customers, sjh_dbt.orders).
dbt seed
03:04:20 Running with dbt=1.3.0
03:04:20 Unable to do partial parsing because profile has changed
03:04:20 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.jaffle_shop
- models.jaffle_shop.staging
03:04:20 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
03:04:20
03:04:29 Concurrency: 1 threads (target='dev')
03:04:29
03:04:29 1 of 3 START seed file sjh_dbt.raw_customers ................................... [RUN]
sync ec1e4a32-9eb6-4f8c-88d7-35c40cfa89d1.csv duration:9s
03:04:42 1 of 3 OK loaded seed file sjh_dbt.raw_customers ............................... [INSERT 100 in 12.85s]
03:04:42 2 of 3 START seed file sjh_dbt.raw_orders ...................................... [RUN]
sync 3fd97eae-d20b-4d92-9dd7-9926e3ae4c86.csv duration:9s
03:04:55 2 of 3 OK loaded seed file sjh_dbt.raw_orders .................................. [INSERT 99 in 12.54s]
03:04:55 3 of 3 START seed file sjh_dbt.raw_payments .................................... [RUN]
sync 8da2b4b5-78c7-4230-b8c6-c1f030bdcb8f.csv duration:9s
03:05:07 3 of 3 OK loaded seed file sjh_dbt.raw_payments ................................ [INSERT 113 in 12.77s]
03:05:07
03:05:07 Finished running 3 seeds in 0 hours 0 minutes and 47.17 seconds (47.17s).
03:05:07
03:05:07 Completed successfully
03:05:07
03:05:07 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
❯ dbt run
03:29:28 Running with dbt=1.3.0
03:29:28 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.jaffle_shop.staging
- models.jaffle_shop
03:29:28 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
03:29:28
03:29:38 Concurrency: 1 threads (target='dev')
03:29:38
03:29:38 1 of 5 START sql view model sjh_dbt.stg_customers .............................. [RUN]
03:29:52 1 of 5 OK created sql view model sjh_dbt.stg_customers ......................... [OK in 13.76s]
03:29:52 2 of 5 START sql view model sjh_dbt.stg_orders ................................. [RUN]
03:30:08 2 of 5 OK created sql view model sjh_dbt.stg_orders ............................ [OK in 16.28s]
03:30:08 3 of 5 START sql view model sjh_dbt.stg_payments ............................... [RUN]
03:30:22 3 of 5 OK created sql view model sjh_dbt.stg_payments .......................... [OK in 13.62s]
03:30:22 4 of 5 START sql view model sjh_dbt.customers .................................. [RUN]
03:30:35 4 of 5 OK created sql view model sjh_dbt.customers ............................. [OK in 13.29s]
03:30:35 5 of 5 START sql view model sjh_dbt.orders ..................................... [RUN]
03:30:48 5 of 5 OK created sql view model sjh_dbt.orders ................................ [OK in 13.32s]
03:30:48
03:30:48 Finished running 5 view models in 0 hours 1 minutes and 19.96 seconds (79.96s).
03:30:48
03:30:48 Completed successfully
03:30:48
03:30:48 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
Go to the Databend database to verify that the creation is successful. The result illustrates that five more tables or views, such as customers, have been added, and the data in the tables or views have been transformed.
dbt also supports the generation of visual documents, using the following command.
- Generate
dbt docs generate
04:01:20 Running with dbt=1.3.0
04:01:20 Unable to do partial parsing because profile has changed
04:01:20 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.jaffle_shop
- models.jaffle_shop.staging
04:01:20 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
04:01:20
04:01:26 Concurrency: 1 threads (target='dev')
04:01:26
04:01:26 Done.
04:01:26 Building catalog
04:01:33 Catalog written to /Users/hanshanjie/git-works/jaffle_shop/target/catalog.json
- Start server
Run dbt docs serve
.
The document, which contains the overall structure of the jaffle_shop project and a description of all the tables and views, can be viewed in a browser.
The use of Databend Cloud in dbt includes the following steps:
- Install dbt and dbt-databend-cloud
- Configuration
- Write SQL and YML files
- Running