(docs) add RFC file to introduce Notebook entity data model #4237
Rendered: .md
DataDoc could be viewed as a subset of Notebook. We could model Notebook instead and make DataDoc a subtype of Notebook
Notebook. DataDoc would be viewed as a subset of Notebook. Therefore we are going to model Notebook rather than DataDoc.
We will include "subTypes" aspect to differentiate Notebook and DataDoc

### Notebook Data Model
One thing that seems to be missing is lineage information. Storing information about notebook content may help with discovery among DataDocs, but I think we also need some level of lineage between docs and datasets. Two questions on this: 1) Is this something we can fetch from Querybook or Jupyter notebooks? 2) How would we store the information? Ideally, I think it should be per cell (specifically query cells and chart cells), but that is up for debate.
I think it makes sense to include lineage information, such as linking the notebook to the datasets it uses.
For Querybook and Jupyter notebooks, I feel we could only extract lineage information from the queries themselves.
As for whether lineage should be per cell or per notebook: per cell fits reality better, but we model cells as one aspect of the notebook entity. So I think it is better to record lineage per data entity, which means per notebook entity.
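To make the per-notebook option concrete, here is a minimal sketch of how per-cell query inputs could be rolled up into one lineage list on the notebook entity. Everything in it is an assumption for illustration: the class names `QueryCell` and `Notebook`, the fields `input_datasets` and `sub_types`, and the placeholder identifiers are hypothetical and are not taken from the RFC or from any existing schema. It also assumes the datasets referenced by each query have already been extracted somehow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryCell:
    """Hypothetical query cell; input_datasets holds names parsed from its query."""
    cell_id: str
    query: str
    input_datasets: List[str] = field(default_factory=list)


@dataclass
class Notebook:
    """Hypothetical notebook entity; cells are modeled as one aspect of it."""
    urn: str
    sub_types: List[str]  # e.g. ["DataDoc"] to mark the subtype (assumed shape)
    cells: List[QueryCell] = field(default_factory=list)

    def upstream_datasets(self) -> List[str]:
        """Roll per-cell inputs up into one de-duplicated list, keeping first-seen order."""
        seen = {}
        for cell in self.cells:
            for dataset in cell.input_datasets:
                seen.setdefault(dataset, None)
        return list(seen)


# Usage: two query cells that share one input dataset (placeholder names).
doc = Notebook(
    urn="example-notebook-1",
    sub_types=["DataDoc"],
    cells=[
        QueryCell("cell-1", "SELECT * FROM orders", ["orders"]),
        QueryCell("cell-2", "SELECT * FROM orders JOIN users", ["orders", "users"]),
    ],
)
notebook_level_lineage = doc.upstream_datasets()
```

With this shape the per-cell detail is still kept inside the cell aspect, so a later iteration could expose cell-level lineage without changing what is stored at the notebook level.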
This looks great as the first step. Let's iterate on it as we build more support
…project#4237)
* add RFC file to introduce DataDoc entity
* add PR link
* Model Notebook instead of DataDoc
  DataDoc could be viewed as a subset of Notebook. We could model Notebook instead and make DataDoc a subtype of Notebook
* update picture file name
* Put rfc number and resolve pr comments
Co-authored-by: Xu Wang <[email protected]>