Commit 6092423 (parent 9210572)

feat(ingest/dbt): handle complex dbt sql + improve docs (datahub-proj…

Showing 6 changed files with 120 additions and 32 deletions.
### Setup

This source pulls dbt metadata directly from the dbt Cloud APIs.

You'll need a dbt Cloud job set up to run your dbt project, with "Generate docs on run" enabled.

The dbt Cloud API token should have the "read metadata" permission.

To get the required IDs, go to the job details page (the one with the "Run History" table) and look at the URL.
It should look something like this: https://cloud.getdbt.com/next/deploy/107298/projects/175705/jobs/148094.
In this example, the account ID is 107298, the project ID is 175705, and the job ID is 148094.
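If you want to script this step, a minimal POSIX shell sketch can pull the three IDs out of the job URL. The helper name `parse_dbt_cloud_job_url` is hypothetical, and it assumes the URL follows the `/deploy/<account>/projects/<project>/jobs/<job>` shape shown above.

```sh
#!/bin/sh
# Hypothetical helper: extract the dbt Cloud IDs from a job details URL.
# Assumes the /deploy/<account>/projects/<project>/jobs/<job> URL shape.
# Prints "ACCOUNT_ID PROJECT_ID JOB_ID", or nothing if the URL does not match.
parse_dbt_cloud_job_url() {
  printf '%s\n' "$1" | sed -n \
    's#.*/deploy/\([0-9][0-9]*\)/projects/\([0-9][0-9]*\)/jobs/\([0-9][0-9]*\).*#\1 \2 \3#p'
}

url="https://cloud.getdbt.com/next/deploy/107298/projects/175705/jobs/148094"

# Split the three space-separated IDs into separate variables.
read -r account_id project_id job_id <<EOF
$(parse_dbt_cloud_job_url "$url")
EOF

echo "account_id=$account_id project_id=$project_id job_id=$job_id"
```

With the example URL, this prints `account_id=107298 project_id=175705 job_id=148094`.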
### Setup

The artifacts used by this source are:

- [dbt manifest file](https://docs.getdbt.com/reference/artifacts/manifest-json)
  - This file contains model, source, test, and lineage data.
- [dbt catalog file](https://docs.getdbt.com/reference/artifacts/catalog-json)
  - This file contains schema data.
  - dbt does not record schema data for ephemeral models. As a result, DataHub will show ephemeral models in the lineage, but they will have no associated schema.
- [dbt sources file](https://docs.getdbt.com/reference/artifacts/sources-json)
  - This file contains metadata for sources with freshness checks.
  - We transfer dbt's freshness checks to DataHub's last-modified fields.
  - Note that this file is optional. If it is not specified, we'll use the time of ingestion as a proxy for the last-modified time.
- [dbt run_results file](https://docs.getdbt.com/reference/artifacts/run-results-json)
  - This file contains metadata from the result of a dbt run, e.g. `dbt test`.
  - When provided, we transfer dbt test run results into assertion run events, so you can see a timeline of test runs on the dataset.
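A minimal sketch of a pre-ingestion check, assuming the artifacts sit in dbt's default `target/` directory. The function name `check_dbt_artifacts` is hypothetical. It treats the manifest and catalog as required and only warns about the sources and run_results files, since the list above describes those two as optional.

```sh
#!/bin/sh
# Hypothetical pre-ingestion check for the dbt artifacts listed above.
# Usage: check_dbt_artifacts [TARGET_DIR]   (defaults to "target")
# Returns non-zero if a required artifact is missing.
check_dbt_artifacts() {
  target_dir="${1:-target}"
  status=0

  # Required (assumption): model/lineage data and schema data.
  for f in manifest.json catalog.json; do
    if [ ! -f "$target_dir/$f" ]; then
      echo "missing required artifact: $target_dir/$f" >&2
      status=1
    fi
  done

  # Optional: freshness metadata and test run results.
  for f in sources.json run_results.json; do
    if [ ! -f "$target_dir/$f" ]; then
      echo "optional artifact not found: $target_dir/$f" >&2
    fi
  done

  return "$status"
}
```

For example, `check_dbt_artifacts target && echo "ready to ingest"` prints a warning for each missing file and only reports success when both the manifest and catalog are present.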
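To generate these files, we recommend this workflow for dbt build and DataHub ingestion.

```sh
# Check source freshness (writes target/sources.json)
dbt source snapshot-freshness
# Run models and tests (writes target/manifest.json and target/run_results.json)
dbt build
# `dbt docs generate` overwrites run_results.json, so back up the build results first
cp target/run_results.json target/run_results_backup.json
# Generate the full catalog (writes target/catalog.json)
dbt docs generate
# Restore the run results from `dbt build`
cp target/run_results_backup.json target/run_results.json

# Run DataHub ingestion, pointing at the files in the target/ directory
```

The necessary artifact files will then appear in the `target/` directory of your dbt project.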
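One way to handle the final ingestion step, sketched under assumptions: the snippet writes a minimal DataHub recipe that points the dbt source at the files in `target/`. The recipe filename `dbt_recipe.yml`, the `postgres` target platform, and the `http://localhost:8080` server are placeholders; check the config keys against the dbt source reference for your DataHub version before relying on them.

```sh
#!/bin/sh
# Sketch: write a minimal DataHub recipe pointing at the dbt artifacts.
# Placeholders (assumptions): recipe filename, target_platform, and server URL.
TARGET_DIR="${TARGET_DIR:-target}"

cat > dbt_recipe.yml <<EOF
source:
  type: dbt
  config:
    manifest_path: ${TARGET_DIR}/manifest.json
    catalog_path: ${TARGET_DIR}/catalog.json
    sources_path: ${TARGET_DIR}/sources.json  # optional
    target_platform: postgres  # placeholder: the warehouse dbt runs against
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080  # placeholder: your DataHub GMS endpoint
EOF

# Then run the ingestion with the DataHub CLI:
#   datahub ingest -c dbt_recipe.yml
```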
We also have guides on handling more complex dbt orchestration techniques and multi-project setups below.

:::note Entity is in manifest but missing from catalog

This warning usually appears when the `catalog.json` file was not generated by a `dbt docs generate` command.
Most other dbt commands generate a partial catalog file, which may impact the completeness of the metadata ingested into DataHub.

Following the above workflow should ensure that the catalog file is generated correctly.

:::