dbt doc blocks #1158
Comments
I like this idea as well. We just started using dbt, and I have found that documenting involves a lot of copying and pasting and is not very DRY. We have a series of models that use the same macro, which represents many metrics (aggregations) computed for a given set of dimensions. Many tables share this macro, and they aren't really inherited from one to the next, which is why I think `extends` may not work in this case. We don't inherit because each model has a distinct count, and you cannot re-count a distinct count, so they all build off a base table with all dimensions. Since the macro reuses a set of metrics, documenting it requires copying and pasting. If we add a metric to the macro, that is one change, but the description then has to be added to each and every model. Even if we used a doc block, we'd still be adding the line for the doc block to each model. The above paradigm follows the macro idea better: if we can macro some SQL, we should be able to macro that same bit of documentation as well, so it's only one change in one place.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Feature
Description
Many of the descriptions and/or tests could be inherited from one model to another. I think doc blocks could solve this problem.
Who will this benefit?
This would be useful for anyone trying to write documentation.
Discussion
From slack here: https://getdbt.slack.com/archives/C2JRTGLLS/p1540411689000100
Me: Let’s talk about docs.
I have a model, for example `accounts_xf`, that is based off `accounts`, and the first row of the `SELECT` statement is `accounts.*`, so every column that I document in `accounts`, I’d ideally like to see documented in `accounts_xf`.
Right now, my approach is to have the column description value be a doc string that I can reuse (and I’ve been following the naming convention of `account_col_account_id` to indicate the original model and the column), but if I have 10-12 columns that I’m documenting that’s a lot of copy and pasting to maintain.
Looker (kinda) solves this in a different place with the `set` value. Similarly, I think what I would like to do is create a column block in which I can say that these five (or however many) name/description combos belong anywhere I want them.
Curious to hear about anyone else’s thoughts/solutions to this so far.
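For reference, the reuse pattern described in this message (one doc string per column, referenced from each model) can be written with dbt's existing docs blocks and the `doc()` function. This is only a minimal sketch: the `accounts.md` file name, the `account_id` column name, and the description text are assumptions for illustration; only the `account_col_account_id` block name and the `accounts` / `accounts_xf` models come from the thread.

```yaml
# schema.yml (sketch). The docs block itself would live in a markdown file,
# for example a hypothetical accounts.md:
#
#   {% docs account_col_account_id %}
#   Placeholder description of the account id column.
#   {% enddocs %}
version: 2

models:
  - name: accounts
    columns:
      - name: account_id   # column name assumed from the naming convention
        description: '{{ doc("account_col_account_id") }}'

  - name: accounts_xf
    columns:
      # The same reference has to be repeated for every inherited column,
      # which is the copy-and-paste burden described in the message.
      - name: account_id
        description: '{{ doc("account_col_account_id") }}'
```

With 10-12 documented columns, each downstream model repeats 10-12 of these entries, which is what a reusable column block would collapse into a single reference.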
@drewbanin: good points @emilie! When we’ve thought about this in the past, we came up with the idea of `extends`, described here: #790
a schema spec for a model could extend other models, and the column descriptions / tests for the models it extends would propagate through to the model itself
this is useful when a chain of models all build on top of their successors, but it doesn’t really solve for the case where a column from a predecessor has been renamed, for instance.
....
@drewbanin: i like your idea of a column block. Do you think that’s something that should be specified inside of a markdown file? Or, do you think you would be able to make a sort of schema spec that’s “abstract” (it doesn’t apply to any particular model), then extend it in specific models?
Me: The reason I don’t love the idea of `extends` is because I think it’s got a very limited use-case, while columns blocks could be much more flexible. For example, if I want just my salesforce account id and name, that could be a block that I want in lots of other data places (email, product, etc), but I might not care about any of the columns in accounts. I’d like to indent to indicate the block. So:
in a `salesforce.md` file… and then under my `schema.yml` file:
@drewbanin: oh, sure! What if you could do this:
@drewbanin: ^ i think we’re getting at the same idea. What i do like about the `extends` approach is that tests can be carried over too. Maybe that’s undesirable though, and really the only thing worth repeating across models is the documentation itself?
Me: I see where you’re going and I like it. I don’t love the idea of it having to be in a separate file.
....
Me: I can think of many places where tests being extended would not be desirable.
@drewbanin: it sort of gets to the question “what is a column” haha
in that an account_id may be unique in the accounts table, but probably won’t be in the contacts table
Drew Pierce: I like the idea of extending in the yml file but at the column level. We carry columns over to other models but not every column. We also rename them sometimes. For example:
Could maybe also support overrides, if I didn't want to carry the tests over I could do:
From this thread: https://getdbt.slack.com/archives/C0VLZPLAE/p1543423359259300
@mikekaminsky: is there a way to do test / yml inheritance. I have a `super-duper-customers` model that has all of the fields as the `customers` model plus some extra super-duper ones. I’d like to be able to say that all of the docs / tests from `customers` should also be included in `super-duper-customers`. Is there a way to do this? (Not sure the right term to use to search the docs for this so apologies if I’m just missing it…)