new schema.yml syntax #790
the mechanism you're proposing to test (count of columns) doesn't feel right--seems like each individual column should be validated for existence if we're going to have this option at all. i also don't feel like this is something that must be prioritized for the initial release.
are we planning on implementing this in the near-term? i love the idea but am just worried that it adds near-term complexity.
i would propose not calling this |
@jthandy sure, for I spoke with @cmcarthur about the implementation for I feel the same way about |
👍 for description @drewbanin when we discussed this yesterday we came up with a very different looking schema.yml format, can you post the updated structure here (with |
👍 for description |
After speaking with @cmcarthur, we're going to scope these schema definitions under a version header:

    version: 2

    models:
      - name: events
        description: "a description..."
        columns:
          - name: event_time
            description: "def"
            tests:
              - primary_key
              - unique

    sources:
      - name: snowplow
        description: "Snowplow dataset"
        tables:
          - name: snowplow_event_2
            description: An immutable log of events collected by Snowplow
            sql_table_name: snowplow.web_page
            columns:
              - name: collector_tstamp
                description: Timestamp for the event recorded by the collector

See #814 for more information on the |
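A minimal sketch of how the nested version: 2 layout could be walked to find column-level schema tests, for both models and source tables. It assumes the file has already been loaded into plain Python dicts and lists (for example by a YAML library, not shown). collect_column_tests is a hypothetical helper, and naming source tables "source.table" is an illustrative choice, not something the proposal specifies.

```python
from typing import Any, Dict, List, Tuple

# The version-2 example, written as the structure a YAML loader would
# produce (assumption: loading itself is done elsewhere).
SCHEMA_V2: Dict[str, Any] = {
    "version": 2,
    "models": [
        {
            "name": "events",
            "description": "a description...",
            "columns": [
                {"name": "event_time", "description": "def",
                 "tests": ["primary_key", "unique"]},
            ],
        },
    ],
    "sources": [
        {
            "name": "snowplow",
            "description": "Snowplow dataset",
            "tables": [
                {
                    "name": "snowplow_event_2",
                    "description": "An immutable log of events collected by Snowplow",
                    "sql_table_name": "snowplow.web_page",
                    "columns": [
                        {"name": "collector_tstamp",
                         "description": "Timestamp for the event recorded by the collector"},
                    ],
                },
            ],
        },
    ],
}


def collect_column_tests(schema: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (node, column, test) triples for every column-level test.

    Models are keyed by name; source tables by the hypothetical
    "source.table" convention used only in this sketch.
    """
    if schema.get("version") != 2:
        raise ValueError("expected a schema file with version: 2")

    triples: List[Tuple[str, str, str]] = []

    def walk(node_name: str, columns: List[Dict[str, Any]]) -> None:
        # Each column may list zero or more tests by name.
        for column in columns:
            for test in column.get("tests", []):
                triples.append((node_name, column["name"], test))

    for model in schema.get("models", []):
        walk(model["name"], model.get("columns", []))
    for source in schema.get("sources", []):
        for table in source.get("tables", []):
            walk(f"{source['name']}.{table['name']}", table.get("columns", []))
    return triples


print(collect_column_tests(SCHEMA_V2))
# [('events', 'event_time', 'primary_key'), ('events', 'event_time', 'unique')]
```

Because tests hang off individual columns, the walker never has to match test arguments back to column names, which is one practical benefit of nesting them this way.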
Support new schema.yml syntax (#790)
Related: #375
schema.yml files currently exist solely to specify schema tests for models. The schema.yml syntax should be extended to account for:
Bonus:
snowplow_sessions <- snowplow_sessions_tmp
Proposed syntax:
options

strict: If true, the columns specified in the columns section must match the actual columns in the model in the database. If there is a mismatch (either too many or too few columns), an error will be raised. If false, the check will not occur.

extends: If a model name is provided, then this model will "inherit" the schema from the parent model. This will entail copying over descriptions, column definitions, strictness, etc. This will be exceedingly useful for "chains" of models that share a similar schema, since duplicating the documentation would be both time-consuming and error-prone.
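A rough sketch of the two options, assuming each model spec is a dict keyed like the YAML (name, description, strict, extends, columns). resolve_extends and check_strict are hypothetical helpers, and the column names and descriptions in the example are made up; the snowplow_sessions / snowplow_sessions_tmp pair is borrowed from the chain mentioned earlier. The strict check compares column names individually rather than only counting them, so it can report exactly which columns are missing or undeclared.

```python
from typing import Any, Dict, List, Optional, Set


def resolve_extends(name: str, models: Dict[str, Dict[str, Any]],
                    _seen: Optional[Set[str]] = None) -> Dict[str, Any]:
    """Return a model spec with its parent's schema merged underneath it.

    Child keys (description, strict, ...) win over the parent's; columns
    are merged by name so a child can override or add single columns.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        raise ValueError(f"circular extends chain at {name!r}")
    seen.add(name)

    spec = models[name]
    parent_name = spec.get("extends")
    if parent_name is None:
        return dict(spec)

    parent = resolve_extends(parent_name, models, seen)
    merged = {**parent, **spec}
    merged.pop("extends", None)

    # Parent columns first, then child overrides/additions by column name.
    columns = {c["name"]: c for c in parent.get("columns", [])}
    for column in spec.get("columns", []):
        columns[column["name"]] = {**columns.get(column["name"], {}), **column}
    merged["columns"] = list(columns.values())
    return merged


def check_strict(spec: Dict[str, Any], actual_columns: List[str]) -> None:
    """Raise if a strict model's declared columns differ from the database."""
    if not spec.get("strict", False):
        return
    declared = {c["name"] for c in spec.get("columns", [])}
    actual = set(actual_columns)
    missing = sorted(declared - actual)  # declared but not in the database
    extra = sorted(actual - declared)    # in the database but undeclared
    if missing or extra:
        raise ValueError(f"{spec['name']}: missing={missing} extra={extra}")


# Hypothetical specs for illustration only.
MODELS = {
    "snowplow_sessions_tmp": {
        "name": "snowplow_sessions_tmp",
        "description": "Sessions before final aggregation",
        "strict": True,
        "columns": [{"name": "session_id", "description": "Session id"},
                    {"name": "user_id", "description": "User id"}],
    },
    "snowplow_sessions": {
        "name": "snowplow_sessions",
        "extends": "snowplow_sessions_tmp",
        "columns": [{"name": "session_index", "description": "Nth session"}],
    },
}

resolved = resolve_extends("snowplow_sessions", MODELS)
print([c["name"] for c in resolved["columns"]])
# ['session_id', 'user_id', 'session_index']
check_strict(resolved, ["session_id", "user_id", "session_index"])  # no error
```

Strictness is inherited like any other key here, so the child model is checked strictly even though it never sets strict itself.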
comments

Comments can either be long-form, unstructured Markdown, or they can contain a ref to a documentation node. These documentation nodes will live in Markdown files inside of markdown blocks, e.g.:

This block will serve a few purposes:
This is a super natural use case for jinja. I can totally imagine writing macros to render tables, enforce docs guidelines, render links, etc etc etc.
Implementation
Each entry in the schema.yml files should be munged into the same JSON schema used for catalog entries. The two are very similar: they have comments, a list of columns, and those columns have names / types / etc. If we keep the data structures similar, then it should be easy to overlay the schema and catalog data on top of the manifest data for dbt docs purposes.
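A minimal sketch of the munging and overlay step, assuming a catalog entry is a dict with a comment and a columns mapping of column name to {name, comment, type}. These field names and the helpers schema_to_catalog_shape and overlay are illustrative assumptions, not an actual dbt data structure. In this sketch the catalog decides which columns exist and what their types are, while schema.yml descriptions fill in the comments.

```python
from typing import Any, Dict


def schema_to_catalog_shape(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Munge one schema.yml entry into the (assumed) catalog-like shape."""
    return {
        "name": spec["name"],
        "comment": spec.get("description"),
        "columns": {
            col["name"]: {"name": col["name"], "comment": col.get("description")}
            for col in spec.get("columns", [])
        },
    }


def overlay(catalog: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Lay schema.yml documentation over a database catalog entry.

    Types come from the catalog; a schema.yml comment replaces the
    catalog comment whenever one is provided.
    """
    documented_cols = schema.get("columns", {})
    columns = {}
    for name, col in catalog.get("columns", {}).items():
        doc = documented_cols.get(name, {})
        columns[name] = {**col, "comment": doc.get("comment") or col.get("comment")}
    return {
        **catalog,
        "comment": schema.get("comment") or catalog.get("comment"),
        "columns": columns,
    }


# Hypothetical catalog data; the type value is made up for illustration.
catalog_entry = {
    "name": "events",
    "comment": None,
    "columns": {"event_time": {"name": "event_time", "comment": None, "type": "timestamp"}},
}
schema_entry = schema_to_catalog_shape({
    "name": "events",
    "description": "a description...",
    "columns": [{"name": "event_time", "description": "def"}],
})
print(overlay(catalog_entry, schema_entry)["columns"]["event_time"])
# {'name': 'event_time', 'comment': 'def', 'type': 'timestamp'}
```

Columns documented in schema.yml but absent from the catalog are silently dropped here; a strict check could flag them instead.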
We should preserve backwards compatibility for schema tests either by 1) adding a version number header or 2) just continuing to parse the constraints section of the old schema.yml files.
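One way to combine both options is to treat a file with a version: 2 header as the new format and upgrade anything without a header from the old constraints layout. The sketch assumes the legacy layout maps each model name to a constraints block of test name to a list of column names. upgrade_schema is a hypothetical helper, and keeping entries that are not plain column names (for example tests with arguments) under a model-level tests key is a choice made only for this sketch.

```python
from typing import Any, Dict, List


def upgrade_schema(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a version-2 structure for either a v2 or a legacy document."""
    if doc.get("version") == 2:
        return doc
    if "version" in doc:
        raise ValueError(f"unsupported schema.yml version: {doc['version']!r}")

    models: List[Dict[str, Any]] = []
    for model_name, body in doc.items():
        columns: Dict[str, Dict[str, Any]] = {}
        model_tests: List[Dict[str, Any]] = []
        constraints = (body or {}).get("constraints") or {}
        for test_name, entries in constraints.items():
            for entry in entries or []:
                if isinstance(entry, str):
                    # Plain column name: attach the test to that column.
                    col = columns.setdefault(entry, {"name": entry, "tests": []})
                    col["tests"].append(test_name)
                else:
                    # Test with arguments: kept at model level (assumption).
                    model_tests.append({test_name: entry})
        model: Dict[str, Any] = {"name": model_name, "columns": list(columns.values())}
        if model_tests:
            model["tests"] = model_tests
        models.append(model)
    return {"version": 2, "models": models}


legacy = {"events": {"constraints": {"not_null": ["event_time"], "unique": ["event_time"]}}}
print(upgrade_schema(legacy))
# {'version': 2, 'models': [{'name': 'events', 'columns': [{'name': 'event_time', 'tests': ['not_null', 'unique']}]}]}
```

Upgrading old files into the new structure, rather than maintaining two parsers downstream, keeps the rest of the pipeline working against a single shape.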