dbt doc blocks #1158

Closed
emilieschario opened this issue Nov 28, 2018 · 2 comments
Labels
stale Issues that have gone stale

Comments

@emilieschario
Contributor

Feature

Description

Many descriptions and/or tests could be inherited from one model to another. I think doc blocks could help solve this problem.

Who will this benefit?

This would be useful for anyone trying to write documentation.

Discussion

From slack here: https://getdbt.slack.com/archives/C2JRTGLLS/p1540411689000100
Me: Let’s talk about docs.
I have a model, for example accounts_xf that is based off accounts and the first row of the SELECT statement is accounts.*, so every column that I document in accounts, I’d ideally like to see documented in accounts_xf.
Right now, my approach is to have the column description value be a doc string that I can reuse (and I’ve been following the naming convention of account_col_account_id to indicate the original model and the column), but if I have 10-12 columns that I’m documenting that’s a lot of copy and pasting to maintain.
Looker (kinda) solves this in a different place with the set value. Similarly, I think what I would like to do is create a column block in which I can say that these five (or however many) name/description combos belong anywhere I want them.
Curious to hear about anyone else’s thoughts/solutions to this so far.

@drewbanin: good points @emilie! When we’ve thought about this in the past, we came up with the idea of extends, described here: #790
a schema spec for a model could extend other models, and the column descriptions / tests for the models it extends would propagate through to the model itself
this is useful when a chain of models all build on top of their predecessors, but it doesn’t really solve for the case where a column from a predecessor has been renamed, for instance.
....

@drewbanin: i like your idea of a column block. Do you think that’s something that should be specified inside of a markdown file? Or, do you think you would be able to make a sort of schema spec that’s “abstract” (it doesn’t apply to any particular model), then extend it in specific models?

Me: The reason I don’t love the idea of extends is that I think it has a very limited use case, while column blocks could be much more flexible. For example, if I want just my salesforce account id and name, that could be a block that I want in lots of other places (email, product, etc), but I might not care about any of the other columns in accounts.
I’d like to indent to indicate the block. So:
in a salesforce.md file…

{% doc account_columns block %}
  {% doc account_col_account_id %}
    words
  {% enddocs %}
  {% doc account_col_account_id %}
    words
  {% enddocs %}
{% enddocsblock %}

and then under my schema.yml file:

version: 2

models:
  - name: events
    description: A table containing clickstream events from the marketing website

    columns:
      {% doc account_columns block %}
      - name: other column
        description: This is a unique identifier for the other column
        tests:
          - unique
          - not_null
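
To make the column block idea above concrete, here is a minimal Python sketch of how such a block might be expanded into a model's column list. Everything in it is an assumption for illustration: the `COLUMN_BLOCKS` registry, the `{"block": ...}` marker, and the `account_name` column are hypothetical and are not dbt features.

```python
# Hypothetical sketch: expand a reusable "column block" into a model's columns.
# COLUMN_BLOCKS and the {"block": ...} marker are illustrative, not dbt features.
COLUMN_BLOCKS = {
    "account_columns": [
        {"name": "account_id", "description": "words"},
        {"name": "account_name", "description": "words"},  # hypothetical column
    ],
}


def expand_column_blocks(columns, blocks):
    """Return a new column list in which every {"block": name} entry is
    replaced by copies of the columns defined in that block."""
    expanded = []
    for column in columns:
        if "block" in column:
            # Copy each entry so edits to one model never mutate the shared block.
            expanded.extend(dict(entry) for entry in blocks[column["block"]])
        else:
            expanded.append(column)
    return expanded


events_columns = [
    {"block": "account_columns"},
    {
        "name": "other column",
        "description": "This is a unique identifier for the other column",
        "tests": ["unique", "not_null"],
    },
]
expanded_events_columns = expand_column_blocks(events_columns, COLUMN_BLOCKS)
```

In this sketch the block contributes only names and descriptions, so tests stay attached to the columns each model declares itself.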

@drewbanin: oh, sure! What if you could do this:

# models/salesforce/standard_columns.yml
version: 2
models:
  - name: standard_salesforce_column_spec
    columns:
      - name: account_id
        description: "this is the id for the account"
      - name: name
        description: "{{ doc('some_docs_block_here') }}"

# models/salesforce/account.sql
version: 2
models:
  - name: account
    extends: [standard_salesforce_column_spec]
    columns:
      - name: other_column
        description: "more docs"
      ...
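
One way to read the model-level `extends` proposal above is as a merge of column specs, parents first. The Python sketch below is only an illustration of that reading, not how dbt resolves anything; `resolve_model_extends` is a hypothetical helper, and it assumes the YAML has already been loaded into plain dicts.

```python
import copy


def resolve_model_extends(specs, model_name):
    """Return the column list for model_name after applying `extends`.

    Columns from extended specs come first; a column declared on the model
    itself replaces an inherited column with the same name.
    Note: this sketch has no cycle detection.
    """
    spec = specs[model_name]
    merged = {}
    for parent in spec.get("extends", []):
        for column in resolve_model_extends(specs, parent):
            merged[column["name"]] = column
    for column in spec.get("columns", []):
        merged[column["name"]] = copy.deepcopy(column)
    return list(merged.values())


specs = {
    "standard_salesforce_column_spec": {
        "columns": [
            {"name": "account_id", "description": "this is the id for the account"},
            {"name": "name", "description": "{{ doc('some_docs_block_here') }}"},
        ],
    },
    "account": {
        "extends": ["standard_salesforce_column_spec"],
        "columns": [{"name": "other_column", "description": "more docs"}],
    },
}
account_columns = resolve_model_extends(specs, "account")
```

Because whole column entries are merged, any `tests` on an inherited column would be carried over as well, which is exactly the behavior the thread goes on to question.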

@drewbanin: ^ i think we’re getting at the same idea. What i do like about the extends approach is that tests can be carried over too. Maybe that’s undesirable though, and really the only thing worth repeating across models is the documentation itself?
Me: I see where you’re going and I like it. I don’t love the idea of it having to be in a separate file.
....
Me: I can think of many places where tests being extended would not be desirable.

@drewbanin: it sort of gets to the question “what is a column” haha
in that an account_id may be unique in the accounts table, but probably won’t be in the contacts table

Drew Pierce: I like the idea of extending in the yml file but at the column level. We carry columns over to other models but not every column. We also rename them sometimes. For example:

# models/base/salesforce_opportunities.yml
version: 2
models:
  - name: salesforce_opportunities
    columns:
      - name: id
        description: "this is the Salesforce Id for the opportunity"
        tests:
          - unique

# models/analytics/contracts.yml
version: 2
models:
  - name: contracts
    columns:
      - name: opportunity_id
        extends: salesforce_opportunities.id

Could maybe also support overrides. If I didn't want to carry the tests over, I could do:

# models/analytics/contracts.yml
version: 2
models:
  - name: contracts
    columns:
      - name: opportunity_id
        extends: salesforce_opportunities.id
        tests:
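
The column-level `extends` with overrides described above could be resolved roughly as in the following Python sketch. It is a guess at the intended semantics, not an existing dbt feature: `resolve_column_extends` is a hypothetical helper, the specs are assumed to be loaded into plain dicts, and an empty `tests:` key is assumed to load as `None` and to mean "no tests".

```python
import copy


def resolve_column_extends(specs, column):
    """Resolve a column whose `extends` value is 'model_name.column_name'.

    The parent column is copied, then every key set on the child (including
    its own name) overrides the inherited value. An explicit empty `tests`
    key clears the inherited tests.
    """
    if "extends" not in column:
        return copy.deepcopy(column)
    parent_model, parent_column = column["extends"].split(".", 1)
    # Raises StopIteration if the referenced column does not exist.
    parent = next(
        c for c in specs[parent_model]["columns"] if c["name"] == parent_column
    )
    resolved = copy.deepcopy(parent)
    for key, value in column.items():
        if key != "extends":
            resolved[key] = copy.deepcopy(value)
    # Normalize a missing or empty `tests` value to an empty list.
    resolved["tests"] = resolved.get("tests") or []
    return resolved


specs = {
    "salesforce_opportunities": {
        "columns": [
            {
                "name": "id",
                "description": "this is the Salesforce Id for the opportunity",
                "tests": ["unique"],
            },
        ],
    },
}
inherited = resolve_column_extends(
    specs, {"name": "opportunity_id", "extends": "salesforce_opportunities.id"}
)
overridden = resolve_column_extends(
    specs,
    {"name": "opportunity_id", "extends": "salesforce_opportunities.id", "tests": None},
)
```

Here `inherited` keeps the parent's description and `unique` test under the new name, while `overridden` keeps the description but drops the tests.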

From this thread: https://getdbt.slack.com/archives/C0VLZPLAE/p1543423359259300
@mikekaminsky: is there a way to do test / yml inheritance? I have a super-duper-customers model that has all of the same fields as the customers model plus some extra super-duper ones. I’d like to be able to say that all of the docs / tests from customers should also be included in super-duper-customers. Is there a way to do this? (Not sure of the right term to use to search the docs for this, so apologies if I’m just missing it…)

@cchanyi

cchanyi commented Jun 17, 2019

I like this idea as well. We just started using dbt and I have found that documenting involves a lot of copying and pasting and is not very DRY. We have a series of models that share the same macro, which represents many metrics (aggregations) computed for a given set of dimensions. Many tables share this macro, and they aren't really inherited from one to the next, which is why I think extends may not work in this case. The reason we don't inherit is that each model has a distinct count, and you cannot re-count a distinct count. Thus they all build off a base table with all dimensions.

Thus, we have a macro that reuses a set of metrics, and documenting it requires copying and pasting. Adding a metric to the macro is one change, but it then requires adding the description to each and every model. Even if we used a doc block, we'd still be adding the line for the doc block to each model. The above paradigm follows the macro idea better. If we can macro some SQL, we should be able to macro that same bit of documentation as well, so it's only one change in one place.
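
As a rough illustration of "macro the documentation", the Python sketch below keeps the shared metric descriptions in one place and stamps them onto several sibling models. It is an assumption-laden sketch, not a dbt feature: the model names, metric names, and `build_model_specs` helper are all hypothetical.

```python
import copy

# Hypothetical shared metric docs; names and descriptions are placeholders.
SHARED_METRIC_DOCS = [
    {"name": "distinct_users", "description": "Distinct count of users."},
    {"name": "total_events", "description": "Total number of events."},
]


def build_model_specs(model_names, shared_columns):
    """Build one spec per model, each with its own copy of the shared columns,
    so adding a metric to shared_columns documents it for every model."""
    return [
        {"name": name, "columns": copy.deepcopy(shared_columns)}
        for name in model_names
    ]


model_specs = build_model_specs(
    ["metrics_by_day", "metrics_by_region"], SHARED_METRIC_DOCS
)
```

Adding a metric then means editing `SHARED_METRIC_DOCS` once, which mirrors the single change made to the SQL macro.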

@github-actions
Contributor

github-actions bot commented Jan 2, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 2, 2022
@github-actions github-actions bot closed this as completed Jan 9, 2022

2 participants