Commit

feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service (#4552)
pedro93 authored Apr 5, 2022
1 parent 179fe07 commit a20012f
Showing 5 changed files with 62 additions and 196 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -73,6 +73,10 @@ DataHub is an open-source metadata platform for the modern data stack. Read abou

Please follow the [DataHub Quickstart Guide](https://datahubproject.io/docs/quickstart) to get a copy of DataHub up and running locally using [Docker](https://docker.com). The guide assumes some basic knowledge of Docker, so if Docker is completely foreign to you, we recommend going through the "Hello World" example of [A Docker Tutorial for Beginners](https://docker-curriculum.com) first.

## Development

If you're looking to build and modify DataHub, please take a look at our [Development Guide](https://datahubproject.io/docs/developers).

## Demo and Screenshots

There's a [hosted demo environment](https://datahubproject.io/docs/demo) where you can play around with DataHub before installing.
7 changes: 1 addition & 6 deletions docs/architecture/metadata-serving.md
@@ -34,12 +34,7 @@ To ensure that metadata changes are processed in the correct chronological order

### Metadata Query Serving

Primary-key based reads (e.g. getting schema metadata for a dataset based on the `dataset-urn`) on metadata are routed to the document store. Secondary index based reads on metadata are routed to the search index (or alternately can use the strongly consistent secondary index support described [here]()). Full-text and advanced search queries are routed to the search index. Complex graph queries such as lineage are routed to the graph index.

### Further Reading

Read the [metadata service developer guide](../how/build-metadata-service.md) to understand how to customize the DataHub metadata service tier.

Primary-key based reads (e.g. getting schema metadata for a dataset based on the `dataset-urn`) on metadata are routed to the document store. Secondary index based reads on metadata are routed to the search index (or alternately can use the strongly consistent secondary index support described [here]()). Full-text and advanced search queries are routed to the search index. Complex graph queries such as lineage are routed to the graph index.

[RecordTemplate]: https://github.com/linkedin/rest.li/blob/master/data/src/main/java/com/linkedin/data/template/RecordTemplate.java
[GenericRecord]: https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericRecord.java
56 changes: 56 additions & 0 deletions docs/developers.md
@@ -4,6 +4,12 @@ title: "Local Development"

# DataHub Developer's Guide

## Prerequisites
- [Java 1.8 SDK](https://adoptopenjdk.net/?variant=openjdk8&jvmVariant=hotspot)
- [Docker](https://www.docker.com/)
- [Docker Compose](https://docs.docker.com/compose/)
- Docker engine with at least 8GB of memory to run tests.
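
Before building, it can help to confirm that the tools listed above are available on your `PATH`. The following is a minimal sketch and not part of the DataHub repository: the function name `check_tool` is hypothetical, and it only checks that each command exists, not its version or the memory available to Docker.

```shell
#!/usr/bin/env sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the tools from the list above; keep going even if one is missing.
for tool in java docker docker-compose; do
  check_tool "$tool" || true
done
```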

## Building the Project

Fork and clone the repository if you haven't done so already
@@ -21,6 +27,56 @@ Use [gradle wrapper](https://docs.gradle.org/current/userguide/gradle_wrapper.ht
./gradlew build
```

Note that the above will also run tests and a number of validations, which makes the process considerably slower.

We suggest partially compiling DataHub according to your needs:

- Build DataHub's backend, GMS (Generalized Metadata Service):
```
./gradlew :metadata-service:war:build
```
- Build DataHub's frontend:
```
./gradlew :datahub-frontend:build -x yarnTest -x yarnLint
```
- Build DataHub's command line tool:
```
./gradlew :metadata-ingestion:installDev
```
- Build DataHub's documentation:
```
./gradlew :docs-website:yarnLintFix :docs-website:build -x :metadata-ingestion:runPreFlightScript
# To preview the documentation
./gradlew :docs-website:serve
```
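
The per-module builds above can be wrapped in a small helper. This is a sketch under assumptions: the function names `gradle_args_for` and `build_component` and the short component names (`gms`, `frontend`, `cli`) are hypothetical conveniences, while the Gradle task paths are copied from the list above. It only prints the command unless `RUN=1` is set, and it should be run from the repository root.

```shell
#!/usr/bin/env sh
# Hypothetical mapping from a short component name to the Gradle
# arguments listed above.
gradle_args_for() {
  case "$1" in
    gms)      echo ":metadata-service:war:build" ;;
    frontend) echo ":datahub-frontend:build -x yarnTest -x yarnLint" ;;
    cli)      echo ":metadata-ingestion:installDev" ;;
    *)        echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Print the Gradle command for a component; run it only when RUN=1.
build_component() {
  args="$(gradle_args_for "$1")" || return 1
  echo "./gradlew $args"
  if [ "${RUN:-0}" = "1" ]; then
    # Word splitting of $args is intentional here.
    ./gradlew $args
  fi
}

build_component gms
```

For example, `RUN=1 build_component frontend` would print and then execute the frontend build.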

## Deploying local versions

Run the following once to install the local `datahub` CLI tool on your `$PATH`:
```
cd smoke-test/
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip wheel setuptools
pip install -r requirements.txt
cd ../
```

Once you have compiled & packaged the project or appropriate module you can deploy the entire system via docker-compose by running:
```
datahub docker quickstart --build-locally
```

You can replace any container in the existing deployment.
For example, to replace DataHub's backend (GMS):
```
(cd docker && COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub -f docker-compose-without-neo4j.yml -f docker-compose-without-neo4j.override.yml -f docker-compose.dev.yml up -d --no-deps --force-recreate datahub-gms)
```

To run the local version of the frontend:
```
(cd docker && COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub -f docker-compose-without-neo4j.yml -f docker-compose-without-neo4j.override.yml -f docker-compose.dev.yml up -d --no-deps --force-recreate datahub-frontend-react)
```
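
Since the two commands above differ only in the service name, they can be parameterized. The following sketch uses a hypothetical function name, `compose_up_cmd`; the Compose files and flags are copied from the commands above. It prints the command rather than running it, so it can be inspected first.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the docker-compose command from above
# for a given service name (e.g. datahub-gms, datahub-frontend-react).
compose_up_cmd() {
  service="$1"
  echo "COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub" \
    "-f docker-compose-without-neo4j.yml" \
    "-f docker-compose-without-neo4j.override.yml" \
    "-f docker-compose.dev.yml" \
    "up -d --no-deps --force-recreate $service"
}

compose_up_cmd datahub-gms
```

To execute the printed command, one option is `(cd docker && eval "$(compose_up_cmd datahub-gms)")` from the repository root.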
## IDE Support
The recommended IDE for DataHub development is [IntelliJ IDEA](https://www.jetbrains.com/idea/).
You can run the following command to generate or update the IntelliJ project file
190 changes: 0 additions & 190 deletions docs/how/build-metadata-service.md

This file was deleted.

1 change: 1 addition & 0 deletions metadata-ingestion/sink_docs/datahub.md
@@ -65,6 +65,7 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
| `token` | | | Bearer token used for authentication. |
| `extra_headers` | | | Extra headers which will be added to the request. |
| `max_threads` | | `1` | Experimental: Max parallelism for REST API calls |
| `ca_certificate_path` | | | Path to CA certificate for HTTPS communications |
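
As an illustration of the `ca_certificate_path` option, the following sketch writes the sink section of a recipe that points the REST sink at an HTTPS endpoint with a custom CA bundle. The server URL, the file name `recipe-sink.yml`, and the certificate path are placeholders, not values from this commit; a complete recipe also needs a `source` section.

```shell
#!/usr/bin/env sh
# Write an example sink section; all values below are placeholders.
cat > recipe-sink.yml <<'EOF'
sink:
  type: "datahub-rest"
  config:
    server: "https://datahub.example.com:8080"
    ca_certificate_path: "/path/to/ca-bundle.pem"
EOF

# Show the generated fragment.
cat recipe-sink.yml
```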

## DataHub Kafka

