docs(ingest): add a guide for writing sources #2575

Merged: 4 commits, May 24, 2021
1 change: 0 additions & 1 deletion datahub-kubernetes/README.md
@@ -1,6 +1,5 @@
---
title: "Deploying with Kubernetes"
-hide_title: true
---

# Deploying Datahub with Kubernetes
11 changes: 7 additions & 4 deletions docs-website/generateDocsDir.ts
@@ -122,6 +122,7 @@ function markdown_guess_title(
   filepath: string
 ): void {
   if (contents.data.title) {
+    contents.data.sidebar_label = contents.data.title;
     return;
   }

@@ -138,13 +139,15 @@ function markdown_guess_title(
       throw new Error(`too many h1 headers in ${filepath}`);
     }
     title = headers[0].slice(2).trim();
-    if (title.startsWith("DataHub ")) {
-      title = title.slice(8).trim();
-    }
   }

   contents.data.title = title;
-  contents.data.hide_title = true;
+
+  let sidebar_label = title;
+  if (sidebar_label.startsWith("DataHub ")) {
+    sidebar_label = sidebar_label.slice(8).trim();
+  }
+  contents.data.sidebar_label = sidebar_label;
 }

 function markdown_add_edit_url(
6 changes: 3 additions & 3 deletions docs-website/package.json
@@ -15,9 +15,9 @@
     "lint-check": "prettier -l generateDocsDir.ts sidebars.js src/pages/index.js"
   },
   "dependencies": {
-    "@docusaurus/core": "^2.0.0-alpha.75",
-    "@docusaurus/plugin-ideal-image": "^2.0.0-alpha.75",
-    "@docusaurus/preset-classic": "^2.0.0-alpha.75",
+    "@docusaurus/core": "^2.0.0-beta.0",
+    "@docusaurus/plugin-ideal-image": "^2.0.0-beta.0",
+    "@docusaurus/preset-classic": "^2.0.0-beta.0",
     "clsx": "^1.1.1",
     "react": "^16.12.0",
     "react-dom": "^16.12.0",
2 changes: 1 addition & 1 deletion docs-website/sidebars.js
@@ -67,7 +67,7 @@ module.exports = {
 // TODO: the titles of these should not be in question form in the sidebar
 "docs/developers",
 "docs/docker/development",
-"metadata-ingestion/README",
+"metadata-ingestion/adding-source",
 "docs/what/graph",
 "docs/what/search-index",
 "docs/how/add-new-aspect",
280 changes: 140 additions & 140 deletions docs-website/yarn.lock

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion docs/how/metadata-modelling.md
@@ -1,4 +1,9 @@
-# How to model metadata ?
+---
+title: "Metadata Modeling"
+---
+
+# How to model metadata?
+
[GMA](../what/gma.md) uses [rest.li](https://rest.li), which is LinkedIn's open source REST framework. All metadata in GMA needs to be modelled using [Pegasus schema (PDL)](https://linkedin.github.io/rest.li/pdl_schema) which is the data schema for [rest.li](https://rest.li).

Conceptually we’re modelling metadata as a hybrid graph of nodes ([entities](../what/entity.md)) and edges ([relationships](../what/relationship.md)), with additional documents ([metadata aspects](../what/aspect.md)) attached to each node. You can also think of it as a modified [Entity-Relationship Model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model).
2 changes: 1 addition & 1 deletion metadata-ingestion/README.md
@@ -814,4 +814,4 @@ In order to use this example, you must first configure the Datahub hook. Like in

## Developing

-See the [developing guide](./developing.md).
+See the [developing guide](./developing.md) or the [adding a source guide](./adding-source.md).
37 changes: 37 additions & 0 deletions metadata-ingestion/adding-source.md
@@ -0,0 +1,37 @@
# Adding a Metadata Ingestion Source

:::note

This guide assumes that you've already followed the metadata ingestion [developing guide](./developing.md) to set up your local environment.

:::

### 1. Set up the configuration model

We use [pydantic](https://pydantic-docs.helpmanual.io/) for configuration, and all models must inherit from `ConfigModel`. The [file source](./src/datahub/ingestion/source/mce_file.py) is a good example.
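
As a rough illustration of the step above, a configuration model might look like the following minimal sketch. It assumes pydantic is installed. The local `ConfigModel` class is only a stand-in for the project's real base class, and `MyFileSourceConfig` with its fields is hypothetical.

```python
from pydantic import BaseModel, ValidationError


class ConfigModel(BaseModel):
    """Stand-in for the project's ConfigModel base class (assumption)."""


class MyFileSourceConfig(ConfigModel):
    """Hypothetical config for an imaginary file-based source."""

    filename: str            # required: pydantic rejects configs without it
    max_records: int = 1000  # optional, with a default


# Recipes arrive as plain dictionaries, so build the model from one.
config = MyFileSourceConfig(**{"filename": "mces.json"})

# A missing required field raises ValidationError instead of failing later.
try:
    MyFileSourceConfig(**{})
    rejected = False
except ValidationError:
    rejected = True
```

Validating at construction time means a misconfigured recipe fails before any ingestion work starts.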

### 2. Set up the reporter

The reporter interface enables the source to report statistics, warnings, failures, and other information about the run. Some sources use the default `SourceReport` class, but others inherit and extend that class.
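
The sketch below shows one way a source-specific report could extend a default one. The `SourceReport` defined here is a simplified stand-in written for this example, not the real class, and `MySourceReport` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceReport:
    """Simplified stand-in for the default report class (assumption)."""

    workunits_produced: int = 0
    warnings: Dict[str, List[str]] = field(default_factory=dict)
    failures: Dict[str, List[str]] = field(default_factory=dict)

    def report_warning(self, key: str, reason: str) -> None:
        # Group warnings by key so repeated problems stay together.
        self.warnings.setdefault(key, []).append(reason)

    def report_failure(self, key: str, reason: str) -> None:
        self.failures.setdefault(key, []).append(reason)


@dataclass
class MySourceReport(SourceReport):
    """Hypothetical extension that also tracks scanned and skipped tables."""

    tables_scanned: int = 0
    filtered: List[str] = field(default_factory=list)

    def report_dropped(self, name: str) -> None:
        self.filtered.append(name)


report = MySourceReport()
report.tables_scanned += 1
report.report_dropped("tmp_scratch")
report.report_warning("orders", "column types could not be resolved")
```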

### 3. Implement the source itself

The core of the source is the `get_workunits` method, which produces a stream of metadata change event (MCE) objects. The [file source](./src/datahub/ingestion/source/mce_file.py) is a good, simple example.

The `MetadataChangeEventClass` is defined in the [metadata models](./src/datahub/metadata/schema_classes.py). There are also some [convenience methods](./src/datahub/emitter/mce_builder.py) for commonly used operations.
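
To make the streaming idea concrete, here is a dependency-free sketch of a `get_workunits` generator. `WorkUnit` and `InMemorySource` are hypothetical stand-ins, and plain dictionaries take the place of real `MetadataChangeEventClass` objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class WorkUnit:
    """Hypothetical stand-in for the framework's work unit wrapper."""

    id: str
    mce: Dict[str, Any]  # placeholder for a MetadataChangeEventClass object


@dataclass
class InMemorySource:
    """Hypothetical source that wraps records already loaded in memory."""

    records: List[Dict[str, Any]]
    produced: List[str] = field(default_factory=list)

    def get_workunits(self) -> Iterable[WorkUnit]:
        # Yield lazily so the caller can process events as a stream
        # instead of waiting for the whole input to be read.
        for index, record in enumerate(self.records):
            workunit = WorkUnit(id=f"in-memory:{index}", mce=record)
            self.produced.append(workunit.id)
            yield workunit


source = InMemorySource(records=[{"name": "table_a"}, {"name": "table_b"}])
workunit_ids = [workunit.id for workunit in source.get_workunits()]
```

Because `get_workunits` is a generator, nothing is read until the caller starts iterating.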

### 4. Set up the dependencies

Declare the source's pip dependencies in the `plugins` variable of the [setup script](./setup.py).

### 5. Enable discoverability

Declare the source under the `entry_points` variable of the [setup script](./setup.py). This enables the source to be listed when running `datahub check plugins`, and sets up the source's shortened alias for use in recipes.
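
Steps 4 and 5 could look roughly like the fragment below. Only the `plugins` and `entry_points` variable names come from this guide. The plugin name `my-source`, its requirement, the module path, and the entry-point group name are all assumptions made for illustration, so check the setup script for the actual values.

```python
from typing import Dict, List, Set

# Step 4: pip dependencies per plugin (hypothetical name and requirement).
plugins: Dict[str, Set[str]] = {
    "my-source": {"requests>=2.25"},
}

# Step 5: map the short alias used in recipes to the source class.
# The group name and module path are assumptions, not confirmed values.
entry_points: Dict[str, List[str]] = {
    "datahub.ingestion.source.plugins": [
        "my-source = datahub.ingestion.source.my_source:MySource",
    ],
}

# One possible way to turn the plugin table into pip extras,
# so the plugin's dependencies can be installed on demand.
extras_require = {name: sorted(deps) for name, deps in plugins.items()}
```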

### 6. Write tests

Tests go in the `tests` directory. We use the [pytest framework](https://pytest.org/).
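
A test module can be as small as the sketch below. pytest collects plain functions whose names start with `test_`, so no pytest import is needed here. The `get_workunits` function is a hypothetical stand-in for the source under test.

```python
from typing import Dict, Iterable, List


def get_workunits(records: List[Dict[str, str]]) -> Iterable[Dict[str, str]]:
    """Hypothetical stand-in for a source's get_workunits method."""
    for record in records:
        yield record


def test_get_workunits_preserves_order() -> None:
    records = [{"name": "a"}, {"name": "b"}]
    assert list(get_workunits(records)) == records


def test_get_workunits_empty_input() -> None:
    assert list(get_workunits([])) == []
```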

### 7. Write docs

Add the plugin to the table at the top of the README file, and add the source's documentation underneath the sources header.
2 changes: 2 additions & 0 deletions metadata-ingestion/developing.md
@@ -83,6 +83,8 @@ The syntax for installing plugins is slightly different in development. For exam

Contributions welcome!

Also take a look at the guide to [adding a source](./adding-source.md).

### Testing

```shell