docs(ingest): clarify setuptools requirement #2177

Merged · 1 commit · Mar 5, 2021
21 changes: 11 additions & 10 deletions metadata-ingestion/README.md
@@ -16,13 +16,6 @@ Before running any metadata ingestion job, you should make sure that DataHub bac

<!-- You can run this ingestion framework by building from source or by running docker images. -->

-### Migrating from the old scripts
-If you were previously using the `mce_cli.py` tool to push metadata into DataHub: the new way to do this is to create a recipe with a file source pointing at your JSON file and a DataHub sink to push that metadata into DataHub.
-This [example recipe](./examples/recipes/example_to_datahub_rest.yml) demonstrates how to ingest the [sample data](./examples/mce_files/bootstrap_mce.json) (previously called `bootstrap_mce.dat`) into DataHub over the REST API.
-Note that we no longer use the `.dat` format, but instead use JSON. The main differences are that JSON uses `null` instead of `None` and uses objects/dictionaries instead of tuples when representing unions.
-
-If you were previously using one of the `sql-etl` scripts: the new way to do this is to use the associated source. See [below](#Sources) for configuration details. Note that the source needs to be paired with a sink - likely `datahub-kafka` or `datahub-rest`, depending on your needs.

### Building from source:

#### Pre-Requisites
@@ -39,7 +32,7 @@ If you were previously using one of the `sql-etl` scripts: the new way to do thi
```sh
python3 -m venv venv
source venv/bin/activate
-pip install --upgrade pip wheel
+pip install --upgrade pip wheel setuptools
pip install -e .
./scripts/codegen.sh
```
@@ -51,7 +44,7 @@ Common issues:

This means Python's `wheel` is not installed. Try running the following commands and then retry.
```sh
-pip install --upgrade pip wheel
+pip install --upgrade pip wheel setuptools
pip cache purge
```
</details>
@@ -68,7 +61,7 @@ Common issues:
The underlying `avro-python3` package is buggy. In particular, it often installs correctly only from a pre-built "wheel", but not when installed from source. Try running the following commands and then retry.
```sh
pip uninstall avro-python3 # sanity check, ok if this fails
-pip install --upgrade pip wheel
+pip install --upgrade pip wheel setuptools
pip cache purge
pip install avro-python3
```
@@ -179,6 +172,7 @@ source:
  config:
    username: user
    password: pass
+    host_port: localhost:1433
    database: DemoDatabase
    table_pattern:
      allow:
@@ -344,6 +338,13 @@ sink:
filename: ./path/to/mce/file.json
```

+## Migrating from the old scripts
+If you were previously using the `mce_cli.py` tool to push metadata into DataHub: the new way to do this is to create a recipe with a file source pointing at your JSON file and a DataHub sink to push that metadata into DataHub.
+This [example recipe](./examples/recipes/example_to_datahub_rest.yml) demonstrates how to ingest the [sample data](./examples/mce_files/bootstrap_mce.json) (previously called `bootstrap_mce.dat`) into DataHub over the REST API.
+Note that we no longer use the `.dat` format, but instead use JSON. The main differences are that JSON uses `null` instead of `None` and uses objects/dictionaries instead of tuples when representing unions.
+
+If you were previously using one of the `sql-etl` scripts: the new way to do this is to use the associated source. See [below](#Sources) for configuration details. Note that the source needs to be paired with a sink - likely `datahub-kafka` or `datahub-rest`, depending on your needs.
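
For illustration, a minimal recipe of this kind might look like the sketch below; the exact keys and the server address are assumptions modeled on the linked example, so treat `example_to_datahub_rest.yml` as the authoritative version.

```yml
# Hypothetical sketch of a file-to-REST recipe; key names and the server URL
# are assumed, not copied from the repository's example recipe.
source:
  type: file
  config:
    # Path to a JSON file of MCEs (the old bootstrap_mce.dat, converted to JSON)
    filename: ./examples/mce_files/bootstrap_mce.json

sink:
  type: datahub-rest
  config:
    # Address of the DataHub REST endpoint
    server: http://localhost:8080
```

Swapping the `file` source for the relevant database source (and, if needed, the sink for `datahub-kafka`) gives the equivalent of the old `sql-etl` scripts.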

## Contributing

Contributions welcome!