Feature: Dynamic Schemas / Columns #1374
Comments
Hey @jordanarseno - cool ideas! I'm picturing something that looks a lot like these suggestions, but may be more tractable given how dbt parses schema.yml files. Some options:
We can provide a method in the manifest that accepts a schema spec and incorporates it into the manifest. You could call this method from model code (or via a macro):
coupled with:
We could also provide a higher-level method that configures many columns at once, eg:
This is a really good idea, and something that I think we're going to tackle implicitly via #1334. While you'll still need to enumerate all of the columns, you'll be able to pull in descriptions from templated docs blocks for sure.
So, this is a solvable problem, and the question becomes: do we want to do this? I think a feature like this would be valuable, but it will also add a bunch of complexity to the parsing code in dbt. Further, it will become more difficult to understand where certain model attributes are being configured from. This would be problematic if we, e.g., wanted to link to a schema.yml spec from dbt docs. While some descriptions (or tests) may be set in schema.yml files, others might be contributed from macros! I think the lack of provenance for these specifications is something that we should think really hard about when considering whether to prioritize a change like this.
Let me know what you think!
I agree that schema.yml files can be verbose to create and information is often repeated across schema specs. I don't think we'll want to leverage jinja to solve this problem, but it is very much an issue worth solving. Closing this in favor of a more actionable issue, but I'm certainly very happy to re-open it for discussion if anyone feels strongly about this :)
Feature: Dynamic Schemas / Columns
Problem:
schema.yml's by abstracting their contents into a callable file/macro.
To illustrate, here's an artificial example. Assume I already have models called dog and cat. Also assume that the SQL required to query any animal type is very complex and warrants a macro for DRY reasons. Then, we would have a macro like:
macros/pet.sql:
And we would express our models like:
models/dog.sql: {{ pet("dog") }}
models/cat.sql: {{ pet("cat") }}
And finally, our schemas like:
models/dog.yml:
models/cat.yml:
Again, this is a contrived example, but the problem is magnified in larger projects: dog.yml and cat.yml are nearly identical except for the animal type, leading to WET code.
Some Possible Solutions:
schema.yml files to be macroable:
models/dog.yml: {{ schema_pet("dog") }}
models/cat.yml: {{ schema_pet("cat") }}
macros/schema_pet.yml
In this situation, the model may not know which schema belongs to it (barring file name parsing), so I believe the models would also need to specify it:
models/dog.sql:
models/cat.sql:
models.name parameter inside the schema.yml will cause a lot of complications, so even if this were static but the rest of the file could be dynamic, this would be great.
macros/pet_columns.yml
models/dog.yml
models/cat.yml
The doc macro would be great for this too; however, it's not possible to provide args to the docs/doc macro.
Forgive me if some of my suggestions are incompatible with the current dbt structure. I am new, and not familiar with how the internals work. These are only suggestions, and I welcome the feedback from the dbt masters!
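To make the duplication concrete, here is a minimal Python sketch of the idea behind a schema_pet-style template: one definition rendered once per animal. It is not dbt functionality and not the macro from this issue. The function name schema_pet is reused only for illustration, and the column names and descriptions are invented assumptions. The only thing taken from dbt is the general version 2 schema layout (models, name, columns, description).

```python
# Hypothetical sketch: render near-identical schema specs from one template.
# The columns below are invented for illustration; they are not from the issue.

PET_COLUMNS = [
    ("id", "Primary key for this {animal}"),
    ("name", "The {animal}'s name"),
]


def schema_pet(animal: str) -> str:
    """Return schema YAML text for one animal model (version 2 layout)."""
    lines = [
        "version: 2",
        "models:",
        f"  - name: {animal}",
        "    columns:",
    ]
    for column, description in PET_COLUMNS:
        # Only the animal name varies between the rendered files.
        rendered = description.format(animal=animal)
        lines.append(f"      - name: {column}")
        lines.append(f'        description: "{rendered}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for animal in ("dog", "cat"):
        print(schema_pet(animal))
```

A script like this would have to run outside dbt, before parsing, and write its output to files such as models/dog.yml. That keeps the static models.name value in a plain YAML file, but it also means the generated specs have the same provenance question raised in the comments above.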