-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Model Reference and depends_on hint #2716
Comments
Thanks for the walkthrough @brittianwarner. I don't think this is going to be possible today. dbt needs to be able to construct the dependency graph without stepping through Jinja conditional statements, hence the In the meantime, here are the options I see:
I know I recommended something similar in #2673. Does either of those make sense in your case? |
Thanks, @jtcohen6! Unfortunately, it would be a huge headache to build different versions of the model for each Customer as we could in theory have 20 models and we would have to support any combination of those depending on the customer. Let me think a bit more about your second recommendation and I will get back to you. In general, I feel it would be extremely powerful to be able to set dependencies based on a set of variables at runtime. Our use case involves having a dbt job for 100+ customers that all have their own "gotchas" so having this flexibility would be awesome. Another idea... Is there a way to manually add a dependency via a macro or something? |
What have you got in mind? |
I haven't fully thought through it, but it would be nice to insert/update/delete dependencies before any models execute. So I am picturing some sort of pre-run operation where I can change a dependency based on a variable or lookup table. So for my example, I would pass a customer_id to 'dbt run' and then I would have a macro that would lookup configuration data and set dependencies on the fly. Thoughts? |
I think I may have figured it out... I went ahead and added all of the depends_on hints at the top of the model and built a macro to build the sql needed... Here is an example:
|
@jtcohen6 Making sure you saw this. |
That's a neat approach! I think the critical piece is that you're including the following, outside of any conditional logic: -- depends_on: {{ ref('stg_model1_performance') }}
-- depends_on: {{ ref('stg_model2_performance') }}
-- depends_on: {{ ref('stg_model3_performance') }}
-- depends_on: {{ ref('stg_model4_performance') }}
-- depends_on: {{ ref('stg_model5_performance') }} Even if you don't end up selecting from all these models each time, dbt is able to build the dependency graph and establish them as parents (well, potential parents). Then, your macro can tell you which ones you actually need for a given customer. The database lookup feels like an unnecessary complexity, though that depends on how many potential customer-models variations you have and how often those mappings change. If the number of mappings is relatively stable and limited in number, I'd think about an approach like: {% macro customer_model_mapping() %}
{% set mappings = [
{"customer": "A", "model_types": ('model1', 'model2', 'model3')},
{"customer": "B", "model_types": ('model1', 'model2')},
{"customer": "C", "model_types": ('model2', 'model3')}
] %}
{% do return(mappings) %}
{% endmacro %}
{% macro get_models_for_customer(customer_id) %}
{% set customer_dict = customer_model_mapping()|selectattr("customer", customer_id) %}
{% set models_for_customer = customer_dict['model_types'] %}
{% do return(models_for_customer) %}
{% endmacro %} Whether you encode the variable into your macros, or pass it as an argument from your model, is totally up to you: {{
config(materialized='table')
}}
-- depends_on: {{ ref('stg_model1_performance') }}
-- depends_on: {{ ref('stg_model2_performance') }}
-- depends_on: {{ ref('stg_model3_performance') }}
-- depends_on: {{ ref('stg_model4_performance') }}
-- depends_on: {{ ref('stg_model5_performance') }}
{% set model_list = get_models_for_customer(var('customer_id')) %}
{% for model_name in model_list %}
{% set relation = `stg_` ~ model_name ~ 'performance' %}
select
*
from {{ ref(relation) }}
{% if not loop.last -%} union all {%- endif %}
{% endfor %}
{% endif %} I'm going to close this issue. It's a complex problem, and there are (a number of) complex approaches already possible today :) |
Hi folk. I'm still facing this challenge. I need to be able to dynamically generate a list of depends_on models as my use-case scales rather quickly. Does anyone have any updates on better ways to do this? |
Did you find a solution? I have a model where I union all models that start with
You can force dependencies, but I haven't found a way to do that dynamically as well @jtcohen6 Perhaps you know how to approach that? |
Unfortunately, it seems like it's not possible. I searched through the dbt Slack channels and every arrives on the same conclusion, that it's not yet possible. |
Thanks I did not solve it either, I used a workaround where I do not use the This makes sure that the model runs after the unioned models (at least for now and likely as well if new models are added) |
I found a way around this by adding this to the top of my file to dynamically generate the
compiled as:
It's not the prettiest but it works |
I have a similar issue with a dynamic dependency, but in a generic test that takes as a parameter a macro referring to other tables. This adds the complexity that {% test is_count_inferior_to_all_N_M_versions(model, all_versions_macro_name) %}
{% if execute %}
{% for model in graph.nodes.values() if '_all_versions_v' in model.name %}
{{ log("-- depends_on: {{ ref('" + model.name + "') }}") }}
{% set depends_on = "-- depends_on: {{ ref( '" + model.name + "' ) }}" %}
{{ depends_on }}
{% endfor %}
{% endif %}
with all_versions as (
select
1 as pk,
count(*) as all_versions_count
from (
-- We can't pass a macro directly in the YML so we get it from the context, which only exists during execution
{% if execute %}
{% set all_versions_macro = context.get(all_versions_macro_name) %}
{{ all_versions_macro() }}
{% else %}
select 1 -- default value for compile to succeed outside of execution
{% endif %}
) as a
where a.action_type in ('N', 'M')
),
last_version as (
select
1 as pk,
count(*) as last_version_count
from {{ model }}
)
select
all_versions_count,
last_version_count,
all_versions_count - last_version_count as diff
from last_version
inner join all_versions
on last_version.pk = all_versions.pk
where last_version_count > all_versions_count
{% endtest %} Sadly, it doesn't work. I still get the compilation error. I know the string generation is correct because if I simply copy-past what the lines printed by the log function and add them to my test definition, it works. But I don't want to do that because the concerned models will evolve in the future, and I don't want to have to change those comments manually every time (and probably forget about it at some point). The macro call also uses a context trick, but it works fine. |
We've run into this issue again and again over the last few years - dynamic dependency seems like such a useful feature, can we reopen this please? |
Describe the feature
I want to be able to dynamically union models depending on runtime variables. For example, we have Customers A, B, C. Customer A has models 1,2,3. Customer B has models 1,2. Customer C has models 2,3. I need to be able to dynamically union these models together, so when I call dbt run for Customer A I want to union models 1,2,3; Customer B union models 1,2; and Customer C union models 2,3. I am currently running into an issue where DBT is asking for a depends_on hint. It is important that I can dynamically change the dependencies on a model given variables. Open to any suggestions/alternatives!
dbt was unable to infer all dependencies for the model "combined_performance".
This typically happens when ref() is placed within a conditional block.
To fix this, add the following hint to the top of the model "combined_performance":
-- depends_on: {{ ref('stg_google_performance') }}
> in macro get_ad_relation (macros/get_customer_attributes.sql)
> called by model combined_performance (models/ads/combined_performance.sql)
> called by model combined_performance (models/ads/combined_performance.sql)
Example of dbt run:
dbt run --vars '{"Customer":"A"}'
dbt run --vars '{"Customer":"B"}'
dbt run --vars '{"Customer":"C"}'
Describe alternatives you've considered
None
Additional context
Using Snowflake.
Who will this benefit?
Anyone who is building a dbt model for many customers/variations.
Are you interested in contributing this feature?
Yes
The text was updated successfully, but these errors were encountered: