Feature: Dynamic Schemas / Columns #1374

jars · 2019-03-25T20:58:54Z

Feature: Dynamic Schemas / Columns

Problem:

unable to eliminate similar schema.yml's by abstracting their contents into a callable file/macro.
this issue originally raised in dbt Slack

To illustrate, here's a artificial example. Assume I have a models already called dog and cat.

Also assume that the SQL required to query any animal type is very complex and is worthy of needing a macro for DRY reasons. Then, we would have a macro like:

`macros/pet.sql`:

{% macro pet(type) %}
  SELECT * FROM {{ref(type)}} 
  WHERE im_actually_a_super_complex_macro_needing_sql_statement IS TRUE
{% endmacro %}

And we would express our models like:

`models/dog.sql`:

{{ pet("dog") }}

`models/cat.sql`:

{{ pet("cat")

And finally, our schemas like:

`models/dog.yml`:

version: 2
models:
  - name: dog
    columns:
        - name: name
          description: 'the name of the dog'
        - name: age
          description: 'the age of the dog'
        - name: breed
          description: 'the breed of the dog'

`models/cat.yml`:

version: 2
models:
  - name: cat
    columns:
        - name: name
          description: 'the name of the cat'
        - name: age
          description: 'the age of the cat'
        - name: breed
          description: 'the breed of the cat'

Again, this is a contrived example, but the problem is magnified in larger projects. i.e. Both dog.yml and cat.yml files are nearly identical except the animal type. Leading to WET code.

Some Possible Solutions:

Allow schema.yml files to be macroable:

`models/dog.yml`:

{{ schema_pet("dog") }}

`models/cat.yml`:

{{ schema_pet("cat") }}

`macros/schema_pet.yml`

{% macro schema_pet(type) %}
version: 2
models:
  - name: {{ type }}
    columns:
        - name: name
          description: 'the name of the {{ type }}'
        - name: age
          description: 'the age of the {{ type }}'
        - name: breed
          description: 'the breed of the {{ type }}'
{% endmacro %}

In this situation, the model may not know which schema belonged to it (barring file name parsing), so the models would also need to specify, I believe:

`models/dog.sql`:

{{ config(schema = 'dog') }}
{{ pet("dog") }}

`models/cat.sql`:

{{ config(schema = 'cat') }}
{{ pet("cat") }}

Actually, the majority of the duplication occurs in the columns section of the yml file. I believe introducing a dynamic models.name parameter inside the schema.yml will cause a lot of complications, so even if this was static, but the rest of the file could be dynamic, this would be great.

`macros/pet_columns.yml`

{% macro pet_columns(type) %}
- name: name
  description: 'the name of the {{ type }}'
- name: age
  description: 'the age of the {{ type }}'
- name: breed
  description: 'the breed of the {{ type }}'
{% endmacro %}

`models/dog.yml`

version: 2
models:
  - name: dog
    columns: {{ pet_columns('dog') }}

`models/cat.yml`

version: 2
models:
  - name: cat
    columns: {{ pet_columns('cat') }}

The doc macro would be great for this too, however it's not possible to provide args to the docs/doc macro.

Forgive me if some of my suggestions are incompatible with the current dbt structure. I am new, and not familiar with how the internals work. These are only suggestions, and I welcome the feedback from the dbt masters!

The text was updated successfully, but these errors were encountered:

drewbanin · 2019-03-26T14:40:41Z

Hey @jordanarseno - cool ideas!

I'm picturing something that looks a lot like these suggestions, but may be a more tractable given how dbt parses schema.yml files. Some options:

Provide a method to register model/column definitions

We can provide a method in the manifest that accepts a schema spec and incorporates it into the manifest. You could call this method from model code (or via a macro):

-- models/base/cat.sql

{% do add_pet_schema_columns('cat') %}

select ...

coupled with:

{% macro add_pet_schema_columns(type) %}

{% do manifest.set_column_schema(this, column='name', description='the name of the ' ~ type) %}
{% do manifest.set_column_schema(this, column='age', description='the age of the ' ~ type) %}
{% do manifest.set_column_schema(this, column='breed', description='the breed of the ' ~ type) %}

{% endmacro %}

We could also provide a higher-level method that configures many columns at once, eg:

{% macro add_pet_schema_columns(type) %}

{% do manifest.update_column_schemas(this,
  {
    "name": {"description": "...."},
    "age": {"description": "...."},
    "breed": {"description": "...."}
  }) %}

{% endmacro %}

Provide arguments to docs blocks

This is a really good idea, and something that I think we're going to tackle implicitly via #1334. While you'll still need to enumerate all of the columns, you'll be able to pull in descriptions from templates docs blocks for sure.

So, this is a solvable problem, and the question becomes: do we want to do this? I think a feature like this would be valuable, but it will also add a bunch of complexity to the parsing code in dbt. Further, it will become more difficult to understand where certain model attributes are being configured from. This would be problematic if we eg. wanted to link to a schema.yml spec from dbt docs. While some descriptions (or tests) may be set in schema.yml files, others might be contributed from macros! I think the lack of provenance for these specifications is something that we should think really hard about when considering to prioritize a change like this.

Let me know what you think!

drewbanin · 2019-11-26T01:59:33Z

I agree that schema.yml files can be verbose to create and information is often repeated across schema specs. I don't think we'll want to leverage jinja to solve this problem, but it is very much an issue worth solving.

Closing this in favor of a more actionable issue, but I'm certainly very happy to re-open it for discussion if anyone feels strongly about this :)

jars changed the title ~~Feature Request: Dynamic Schemas / Columns~~ Feature: Dynamic Schemas / Columns Mar 26, 2019

drewbanin added more_information_needed stale Issues that have gone stale labels Nov 26, 2019

drewbanin closed this as completed Nov 26, 2019

jtcohen6 mentioned this issue Jul 3, 2021

Make it possible for DBT to generate models from a piece of JINJA code #3469

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Dynamic Schemas / Columns #1374

Feature: Dynamic Schemas / Columns #1374

jars commented Mar 25, 2019 •

edited

Loading

drewbanin commented Mar 26, 2019 •

edited

Loading

drewbanin commented Nov 26, 2019

Feature: Dynamic Schemas / Columns #1374

Feature: Dynamic Schemas / Columns #1374

Comments

jars commented Mar 25, 2019 • edited Loading

Feature: Dynamic Schemas / Columns

macros/pet.sql:

models/dog.sql:

models/cat.sql:

models/dog.yml:

models/cat.yml:

Some Possible Solutions:

models/dog.yml:

models/cat.yml:

macros/schema_pet.yml

models/dog.sql:

models/cat.sql:

macros/pet_columns.yml

models/dog.yml

models/cat.yml

drewbanin commented Mar 26, 2019 • edited Loading

drewbanin commented Nov 26, 2019

jars commented Mar 25, 2019 •

edited

Loading

`macros/pet.sql`:

`models/dog.sql`:

`models/cat.sql`:

`models/dog.yml`:

`models/cat.yml`:

`models/dog.yml`:

`models/cat.yml`:

`macros/schema_pet.yml`

`models/dog.sql`:

`models/cat.sql`:

`macros/pet_columns.yml`

`models/dog.yml`

`models/cat.yml`

drewbanin commented Mar 26, 2019 •

edited

Loading