[Bug] DBT unit tests don't work properly when source table name matches another source or model #10433
Thanks for reporting this @yengibar-manasyan-sp!
Root cause
It appears that the root cause is that during unit testing,
Example
See below for a reproducible example ("reprex") of the issue you reported when a model and a source have the same name (
Acceptance criteria
The key outcome we'd need is:
A possible building block
One thing that is already unique across the DAG is
Related code
Some related code is described in #5273 (comment). dbt-labs/dbt-adapters#236 and #10290 have proposed updates to the code that generates the CTE name, but I don't think either would fix this particular issue with unit tests.
Previews:
- [ref](https://docs-getdbt-com-git-dbeatty10-patch-1-dbt-labs.vercel.app/reference/dbt-jinja-functions/ref)
- [source](https://docs-getdbt-com-git-dbeatty10-patch-1-dbt-labs.vercel.app/reference/dbt-jinja-functions/source)

## What are you changing in this pull request and why?

This update is useful for understanding some of the differences between [`ref()`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref) and [`source()`](https://docs.getdbt.com/reference/dbt-jinja-functions/source). In turn, that is useful context for issues like dbt-labs/dbt-core#10433.

The [`source()`](https://docs.getdbt.com/reference/dbt-jinja-functions/source) and [`ref()`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref) functions are complementary:

`source()` applies to:
- [sources](https://docs.getdbt.com/docs/build/sources)

Whereas `ref()` applies to:
- [models](https://docs.getdbt.com/docs/build/models) (both [SQL models](https://docs.getdbt.com/docs/build/sql-models) and [Python models](https://docs.getdbt.com/docs/build/python-models))
- [seeds](https://docs.getdbt.com/docs/build/seeds)
- [snapshots](https://docs.getdbt.com/docs/build/snapshots)

`ref()` includes:
- [versioned models](https://docs.getdbt.com/reference/dbt-jinja-functions/ref#versioned-ref)
- [package-specific nodes](https://docs.getdbt.com/reference/dbt-jinja-functions/ref#ref-project-specific-models)
- [cross-project nodes](https://docs.getdbt.com/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref)

## Checklist
- [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.

---------

Co-authored-by: Mirna Wong
Relevant code
dbt-core/core/dbt/context/providers.py Lines 1595 to 1600 in c668846
Specifically, we could generate some kind of globally unique identifier and pass it here:
return self.adapter.Relation.add_ephemeral_prefix(globally_unique_identifier)
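To make the suggestion concrete, here is a minimal Python sketch that derives a CTE name from a node's unique ID instead of its table name, then double-quotes the result so the dots in the ID are legal inside an identifier. The `__dbt__cte__` prefix is the one visible in the generated SQL reported in this thread; the helper names and the ID strings (written in the style of dbt's `model.<package>.<name>` and `source.<package>.<source_name>.<table_name>` IDs, with `my_project` as a made-up package) are hypothetical, and this is not dbt-core's actual implementation.

```python
CTE_PREFIX = "__dbt__cte__"  # prefix seen in the generated SQL in this thread


def quote_identifier(name: str) -> str:
    """Double-quote an identifier, escaping embedded quotes by doubling them (SQL standard)."""
    return '"' + name.replace('"', '""') + '"'


def unique_cte_name(unique_id: str) -> str:
    """Hypothetical helper: prefix the node's unique ID and quote the result."""
    return quote_identifier(CTE_PREFIX + unique_id)


# Hypothetical IDs for a model and a source that share the table name MODEL_1.
model_id = "model.my_project.MODEL_1"
source_id = "source.my_project.src.MODEL_1"

print(unique_cte_name(model_id))   # "__dbt__cte__model.my_project.MODEL_1"
print(unique_cte_name(source_id))  # "__dbt__cte__source.my_project.src.MODEL_1"
```

Quoted identifiers are case-sensitive on most warehouses, and some databases cap identifier length (PostgreSQL truncates names longer than 63 bytes), so long IDs could still cause trouble; that is one argument for hashing instead.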
Implementation ideas
Here are a couple of ideas for how to create the CTE names in unit tests so that they are unique:
1. Use a quoted version of
return sha1("\n".join(package_strs).encode("utf-8")).hexdigest()
If we don't care about the readability of the CTE name, then we could just use the hexdigest as-is, which for SHA-1 would be a 40-character value like this:
e83c5163316f89bfbde7d9ab23ca2e25604af290
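A minimal sketch of the hash-only option, applying the same `sha1(...).hexdigest()` call shown above to a node's unique ID. Hashing the unique ID (rather than some other string) is an assumption, and the helper name is hypothetical. The sketch keeps the `__dbt__cte__` prefix because a bare digest can start with a digit.

```python
import hashlib

CTE_PREFIX = "__dbt__cte__"


def hashed_cte_name(unique_id: str) -> str:
    """Hypothetical helper: prefix + full SHA-1 hex digest of the node's unique ID."""
    digest = hashlib.sha1(unique_id.encode("utf-8")).hexdigest()  # 40 hex characters
    # Keep a prefix: a bare digest can start with a digit, which is not a valid
    # unquoted identifier on most databases.
    return CTE_PREFIX + digest


model_cte = hashed_cte_name("model.my_project.MODEL_1")
source_cte = hashed_cte_name("source.my_project.src.MODEL_1")
print(len(model_cte))           # 52 (12-character prefix + 40-character digest)
print(model_cte != source_cte)  # True
```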
If there are readability concerns, then we could combine an abbreviated version of the hash with the node name (or identifier), which might look like this:
dim_customers_e83c5163316f
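A sketch of the readable variant, assuming the abbreviated hash is the first 12 hex characters of SHA-1 over the node's unique ID (12 matches the length of the example suffix above). The helper name and IDs are hypothetical, and the digests it prints will not match the illustrative value above.

```python
import hashlib


def readable_cte_name(name: str, unique_id: str, hash_chars: int = 12) -> str:
    """Hypothetical helper: '<name>_<first hash_chars hex chars of sha1(unique_id)>'."""
    digest = hashlib.sha1(unique_id.encode("utf-8")).hexdigest()
    return f"{name}_{digest[:hash_chars]}"


# A model and a source with the same name still get different CTE names,
# because their unique IDs (and therefore their hashes) differ.
model_cte = readable_cte_name("dim_customers", "model.my_project.dim_customers")
source_cte = readable_cte_name("dim_customers", "source.my_project.crm.dim_customers")
print(model_cte)               # "dim_customers_" followed by 12 hex characters
print(model_cte != source_cte)
```

Truncating to 12 hex characters (48 bits) trades a little collision resistance for readability; long node names plus the suffix may also run into identifier length limits on some databases.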
Implementation ideas (continued)
Here are some other ideas:
3. Allow configuration of a source
We've just come across this issue ourselves. Are you planning to contribute to this in the near future, @dbeatty10?
Did anyone find a workaround?
There’s no workaround, @jhoffland. However, if you’re using dbt-core, you could modify the code to use the full path (source name + model) or a hash, as previously suggested.
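One possible reading of the "full path" workaround, as a hypothetical sketch: qualify a source's CTE name with its source name so it no longer matches a model that happens to share the table name. Both helper names and the `__` separator are assumptions, not dbt-core code.

```python
CTE_PREFIX = "__dbt__cte__"


def model_cte_name(model_name: str) -> str:
    """For comparison: prefix + bare name, as in the generated SQL reported here."""
    return f"{CTE_PREFIX}{model_name}"


def source_cte_name(source_name: str, table_name: str) -> str:
    """Hypothetical workaround: qualify a source's CTE with its source name."""
    return f"{CTE_PREFIX}{source_name}__{table_name}"


print(model_cte_name("MODEL_1"))          # __dbt__cte__MODEL_1
print(source_cte_name("src", "MODEL_1"))  # __dbt__cte__src__MODEL_1
```

This is a patch rather than a guarantee: a model literally named `src__MODEL_1` would still collide, which is why a unique ID or hash is the more robust key.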
Different bug, but that workaround should work for this issue too:
Hey all, I'd like to point out that this issue doesn't only affect sources. The current implementation can produce duplicate CTE names when a unit test's inputs include models with the same alias, which causes a syntax error.
Reproducible Example
model_a:
{{ config(alias="test_model", schema="schema_a") }} select 1 as example_value
model_b:
{{ config(alias="test_model", schema="schema_b") }} select 2 as example_value
model_c:
{{ config(schema="schema_c", materialized="table") }}
select *
from {{ ref("model_a") }}
union all
select *
from {{ ref("model_b") }}
unit_tests:
  - name: unit_test_with_error
    description: Checks whether model_c is the union of model_a and model_b
    model: model_c
    given:
      - input: ref('model_a')
        rows:
          - {example_value: 1}
      - input: ref('model_b')
        rows:
          - {example_value: 2}
    expect:
      rows:
        - {example_value: 1}
        - {example_value: 2}
Running the test yields:
Runtime Error in unit_test unit_test_with_error (models/_unit_tests.yaml)
An error occurred during execution of unit test 'unit_test_with_error'. There may be an error in the unit test definition: check the data types.
Database Error
WITH query name "__dbt__cte__test_model" specified more than once
LINE 22: ), __dbt__cte__test_model as (
Environment
Which database adapter are you using with dbt?
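The collision above can be reproduced in plain Python. This sketch assumes, as the error message suggests, that the CTE name was built from the alias (`test_model`); naming by a per-node unique ID instead leaves no duplicates. The ID strings and the `my_project` package name are hypothetical.

```python
from collections import Counter

CTE_PREFIX = "__dbt__cte__"

# The two unit-test inputs from the reprex; "my_project" is a hypothetical package name.
inputs = [
    {"unique_id": "model.my_project.model_a", "alias": "test_model"},
    {"unique_id": "model.my_project.model_b", "alias": "test_model"},
]


def duplicate_cte_names(nodes: list[dict], key: str) -> list[str]:
    """Return every CTE name that would appear more than once if named by `key`."""
    counts = Counter(CTE_PREFIX + node[key] for node in nodes)
    return sorted(name for name, count in counts.items() if count > 1)


print(duplicate_cte_names(inputs, "alias"))      # ['__dbt__cte__test_model']
print(duplicate_cte_names(inputs, "unique_id"))  # []
```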
Is this a new bug in dbt-core?
Current Behavior
One of my models had the same name as the source Snowflake table. They live in different schemas.
dbt run works fine and generates all the data as expected. However, dbt unit tests fail because they cannot differentiate the mocked source tables from these tables. Here is an example:
When I checked the generated SQL query, I saw that it adds __dbt__cte__ prefixes to the CTE names and replaced the rows for source('src', 'MODEL_1') with values from this.
Expected Behavior
Fix unit tests to support identical model and source table names in different schemas.
Steps To Reproduce
this (the model itself)
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
No response