-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Unit tests not working for incremental models #10686
Comments
Thanks for reporting this @CraigWilson-ZOE ! If you comment out the unit test completely, does it still give you that error? The SQL that is creating that syntax error looks like it could be coming from See below for a simplified version of the situation you described. It worked without error for me, so I'm curious if it works for you or not?
|
can I pick it ? |
ok thanks. |
Sorry for the late reply, I have been on vacation.
I have commented out the unit test and I do not get the error anymore.
No we do not override any of those macros.
I have tried with the super simple version and it passed without error. {{
config(
materialized='incremental',
incremental_strategy='insert_overwrite',
unique_key='event_id',
partition_by={
"field": "date_day",
"data_type": "date"
}
)
}}
select 1 as event_id, cast('2002-02-02' as date) as date_day - name: unit_test__int_barcode_test_increment
description: Unit test for int_barcode_test
model: int_barcode_test
overrides:
macros:
is_incremental: true
given:
- input: this
rows:
- {event_id: 1, date_day: 2001-01-01}
expect:
rows:
- {event_id: 1, date_day: 2002-02-02} NOTE: I created a new model so I could run both at the same time. My original model still fails with the same error. |
Also worth posting here is my packages:
- package: dbt-labs/dbt_utils
version: 1.3.0
- package: fivetran/mailchimp_source
version: [">=0.4.0", "<0.6.0"]
- package: fivetran/mailchimp
version: [">=0.7.0", "<0.10.0"]
- package: fivetran/zendesk_source
version: [">=0.10.0", "<0.14.0"]
- package: fivetran/zendesk
version: [">=0.10.0", "<0.14.0"]
- package: fivetran/google_ads_source
version: [">=0.9.1", "<0.11.0"]
- package: fivetran/google_ads
version: [">=0.9.1", "<0.11.0"]
- package: fivetran/facebook_ads
version: [">=0.6.0", "<0.8.0"]
- package: fivetran/facebook_ads_source
version: [">=0.6.0", "<0.8.0"] |
Okay! The next step is to try to isolate whichever piece(s) of your model that is triggering the syntax error when it is executed via dbt unit tests. Are you able to tweak that super simple version so that it is closer and closer to your actual model until you find something that breaks the simplified version? |
Hi @CraigWilson-ZOE, I understand that you are sharing the compiled SQL rather than the model, correct? Could you please share it with me? I’d like to see if it’s similar to the other issues I was looking into in this PR: #10849. If the model has any kind of var, env_var, or other thins that cames from context this can cause this kind of error during the unit test phase |
Why I'm saying that @CraigWilson-ZOE . Your compiled sql is wrong due the double "..", which will be the result of unit test if you have a model like this
result
|
Are there any updates on this? I have the same problem @dbeatty10 |
I think it's a good idea if you post the model.sql file you're having issues , @kangshung |
Oops, I closed it too hastily! @kangshung as @devmessias said, what you reported in #10686 (comment) is a duplicate of #10433 But we still don't have steps to reproduce for what @CraigWilson-ZOE originally reported in this issue. @CraigWilson-ZOE does #10686 (comment) look similar to your model that is having the issues with unit testing? |
Hi All, Sorry I have been away from my computer for a while. Now that I am back and had some time to dive into this, it seems like a call to an internal dbt-core macro is failing. The SQL that is being executed is: select distinct
table_schema,
table_name,
case table_type
when 'BASE TABLE' then 'table'
when 'EXTERNAL TABLE' then 'external'
when 'MATERIALIZED VIEW' then 'materializedview'
else lower(table_type)
end as `table_type`
from `development`..INFORMATION_SCHEMA.TABLES
where lower(table_name) like lower ('')
and lower(table_name) not like lower ('') I believe this is coming from Is there a reason that the schema wouldn't be set in this instance? I have checked via logging output of the I do think this is a bug and needs to be fixed otherwise we are saying that some macros cannot be used in a incremental model. |
Hi @CraigWilson-ZOE, this seems similar to #10891 and #10139. This is not the raw code for the model, right? Can you share the raw code? As I discuss in #10908 maybe we should allow overriding dynamic variables |
In the meantime, testing a simpler model that uses the same macro in a non-incremental model might help. I tend to believe that your issue isn't related to the model being incremental or not |
Yes it does look the same as #10891 , thanks! I am not sure about overriding dynamic variables vs just having them work as expected by many people or provide an alternative.
I agree that it isn't related to it being incremental or not, it is due to the above issue. |
I don't think there is an easy way to do that, as pointed in #8499. The CTE approach will solve some limitations but will impact the performance , and will create somo additional problems according to #8499 (comment) But in your case, if you have just one macro call could you try to override the macro? |
@CraigWilson-ZOE could you share your line of code that you are using for I'm trying to reproduce the same error you are getting, but I'm not able when I use |
There are so many issues whose root cause is this limitation, and users are incorrectly associating it with other things. Wouldn't it be helpful to improve the unit test doc to emphasize this limitation, @dbeatty10? |
Here is a breakdown of the SQL that I have and what is causing the issue. This is a change I made to ensure that getting the latest partition data in BigQuery is much more efficient than doing a query on the target table. This has saved a lot of processing on our large tables. Macro to get the latest partition info: {% macro get_latest_partition_value(table, partition_data_type=none) %}
{% if not execute %}
{{ return(modules.datetime.date.fromisoformat('2023-01-01')) }}
{% endif %}
{% if not dbt_utils.get_relations_by_pattern(table.schema, table.name) %}
{{ return(modules.datetime.date.fromisoformat('1900-01-01')) }}
{% endif %}
{% set node = graph.nodes.get("model." + project_name + "." + table.name) %}
{% if node is none %}
{{ return(none) }}
{% endif %}
{% if not partition_data_type %}
{% set raw_partition_by = config.get('partition_by', none) %}
{% if not raw_partition_by %}
{% do exceptions.raise_compiler_error("partition_by must be set in the model config to get latest partition value") %}
{% endif %}
{% set partition_by = adapter.parse_partition_by(raw_partition_by) %}
{% set partition_data_type = partition_by.data_type %}
{% endif %}
{%- set query -%}
SELECT MAX(
{% if partition_data_type == 'date' %}
PARSE_DATE("%Y%m%d",partition_id)
{% elif partition_data_type == 'timestamp'%}
PARSE_TIMESTAMP("%Y%m%d",partition_id)
{% elif partition_data_type == 'datetime' %}
PARSE_DATETIME("%Y%m%d",partition_id)
{% elif partition_data_type == 'int64' %}
CAST(partition_id AS INT64)
{% else %}
{% do exceptions.raise_compiler_error("partition_by data type must be date, timestamp, datetime or int64") %}
{% endif %}
)
FROM {{table.database}}.{{table.schema}}.INFORMATION_SCHEMA.PARTITIONS
WHERE
table_name = '{{table.name}}'
AND partition_id IS NOT NULL
AND partition_id != '__NULL__'
AND partition_id != '__UNPARTITIONED__'
AND total_rows > 0
{% endset %}
{% set results = run_query(query) %}
{% set partition_max_value = results.columns[0].values()[0] %}
{{ return(partition_max_value) }}
{% endmacro %} And it is used like this (example): {% set partition_date = get_latest_partition_value(this) %}
{{
config(
materialized='incremental',
unique_key='correlation_id',
partition_by={
"field": "search_date",
"data_type": "date"
},
cluster_by=["message_type"],
)
}}
WITH
all_search_logs AS
(
SELECT DISTINCT
correlation_id,
timestamp AS search_time,
username,
payload,
origin_service,
message_type,
DATE(timestamp) AS search_date,
COALESCE(CAST(JSON_EXTRACT_SCALAR(metadata, '$.with_scores') AS BOOL), TRUE) AS with_scores
FROM
{{ source('pubsub_dataflow', 'pubsub_to_bigquery_partitioned') }}
{% if target.name != 'production' %}
WHERE
timestamp >= (CAST(DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AS TIMESTAMP))
{% else %}
{% if is_incremental() and partition_date is not none %}
WHERE
timestamp >= TIMESTAMP('{{ partition_date }}')
{% endif %}
{% endif %}
)
SELECT * FROM all_search_logs The |
This will not work with the current limitations of unit tests. However, you can override the macro since it is being called only once. In that case, having a feature that allows to override a dynamic variable is not necessary. |
Thanks for providing those details @CraigWilson-ZOE ! I was able to get the same error as you with a simplified version of your scenario. And I was also able to resolve it by providing an override for the ExplanationIt looks like these lines are the ones that won't work the way you want when you execute them in a unit test: FROM {{table.database}}.{{table.schema}}.INFORMATION_SCHEMA.PARTITIONS
WHERE
table_name = '{{table.name}}' There's an explanation in #10891 (comment). The TLDR is that within the context of unit tests, WorkaroundCould you try providing an override for your It would look something like this: - name: unit_test__int_barcode_scanned_events_incremental_mode
description: Unit test for int_barcode_scanned_events in incremental mode
model: int_barcode_scanned_events
overrides:
macros:
is_incremental: true
get_latest_partition_value: SOMETHING_HERE |
Ah that makes a lot of sense, and thanks for the example and explanation. Happy for this to be closed as I think a lot of it is covered in other issues as linked above. |
Thanks again for raising this and helping home in on the root causes @CraigWilson-ZOE. Going to close this in favor of #10891 and other relevant issues linked above. |
Is this a new bug in dbt-core?
Current Behavior
When running a unit tests for an incremental model I am getting the following error:
Expected Behavior
The unit tests to pass
Steps To Reproduce
Relevant log output
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
The SQL that seems to be produced and is causing the error is the following:
The text was updated successfully, but these errors were encountered: