feat(graph): support using elasticsearch as graph backend. #2726

Merged
merged 28 commits into from
Jun 22, 2021
Changes from 24 commits
7b9cbd4
elasticsearch as graph backend
gabe-lyons Jun 21, 2021
17335de
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 21, 2021
679baa2
fixes
gabe-lyons Jun 21, 2021
0a83f50
adding deprecated mapper back in
gabe-lyons Jun 21, 2021
23fb6ee
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 21, 2021
496727f
search request handler revert
gabe-lyons Jun 21, 2021
342d646
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 21, 2021
7670230
using delete by query
gabe-lyons Jun 21, 2021
c43f5f5
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 21, 2021
8a40ccb
cleanup
gabe-lyons Jun 21, 2021
1ff40d9
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 21, 2021
01edd76
adding new values.yaml
gabe-lyons Jun 21, 2021
db9f0ea
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 21, 2021
9df81ce
updating
gabe-lyons Jun 22, 2021
5d7b008
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
dd713ed
finalizing update
gabe-lyons Jun 22, 2021
255c06d
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
a9117e0
fix import
gabe-lyons Jun 22, 2021
640d902
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
8e305ee
fixing typo
gabe-lyons Jun 22, 2021
21bacdd
removing lines
gabe-lyons Jun 22, 2021
a037e19
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
dd55e9c
fixing depends on
gabe-lyons Jun 22, 2021
63b39da
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
0c85b5a
handling null response
gabe-lyons Jun 22, 2021
33ea5a8
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
d551c26
== null
gabe-lyons Jun 22, 2021
91960f0
Merge remote-tracking branch 'private/ElasticGraphSquashed' into Elas…
gabe-lyons Jun 22, 2021
1 change: 1 addition & 0 deletions datahub-kubernetes/datahub/README.md
Original file line number Diff line number Diff line change
@@ -57,6 +57,7 @@ helm install datahub datahub/
| global.sql.datasource.username | string | `"root"` | SQL user name |
| global.sql.datasource.password.secretRef | string | `"mysql-secrets"` | Secret that contains the MySQL password |
| global.sql.datasource.password.secretKey | string | `"mysql-password"` | Secret key that contains the MySQL password |
| global.graph_service_impl | string | `"neo4j"` | One of `neo4j` or `elasticsearch`. Determines which backend GMS uses for the graph service. Elasticsearch is recommended for a simplified deployment; Neo4j remains the default for now to maintain backward compatibility. |
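
When the backend is switched from the command line, a typo in the value would likely surface only once GMS starts. A minimal sketch of a guard follows, assuming a hypothetical helper `graph_backend_flag` (not part of this PR) that accepts only the two documented values and prints the matching `--set` argument:

```shell
# Hypothetical helper (not part of this PR): validate the backend name and
# print the Helm --set argument for global.graph_service_impl.
graph_backend_flag() {
  case "$1" in
    neo4j|elasticsearch)
      printf '%s\n' "--set global.graph_service_impl=$1"
      ;;
    *)
      echo "unsupported graph backend: $1 (expected neo4j or elasticsearch)" >&2
      return 1
      ;;
  esac
}

# Usage (requires helm and the chart checkout). The substitution is left
# unquoted so that it splits into the flag and its value:
#   helm install datahub datahub/ $(graph_backend_flag elasticsearch)
```

The `helm install datahub datahub/` invocation is the one from this README. Setting `global.graph_service_impl` in a values file is an equally valid way to choose the backend.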

## Optional Chart Values

1 change: 1 addition & 0 deletions datahub-kubernetes/datahub/charts/datahub-gms/README.md
@@ -36,6 +36,7 @@ Current chart version is `0.2.0`
| global.sql.datasource.username | string | `"datahub"` | |
| global.sql.datasource.password.secretRef | string | `"mysql-secrets"` | |
| global.sql.datasource.password.secretKey | string | `"mysql-password"` | |
| global.graph_service_impl | string | `"neo4j"` | One of `neo4j` or `elasticsearch`. Determines which backend GMS uses for the graph service. Elasticsearch is recommended for a simplified deployment; Neo4j remains the default for now to maintain backward compatibility. |
| image.pullPolicy | string | `"IfNotPresent"` | |
| image.repository | string | `"linkedin/datahub-gms"` | |
| image.tag | string | `"v0.8.3"` | |
Original file line number Diff line number Diff line change
@@ -115,6 +115,9 @@ spec:
name: "{{ .password.secretRef }}"
key: "{{ .password.secretKey }}"
{{- end }}
- name: GRAPH_SERVICE_IMPL
value: {{ .Values.global.graph_service_impl }}
{{- if eq .Values.global.graph_service_impl "neo4j" }}
- name: NEO4J_HOST
value: "{{ .Values.global.neo4j.host }}"
- name: NEO4J_URI
@@ -126,6 +129,7 @@ spec:
secretKeyRef:
name: "{{ .Values.global.neo4j.password.secretRef }}"
key: "{{ .Values.global.neo4j.password.secretKey }}"
{{- end }}
{{- if .Values.global.springKafkaConfigurationOverrides }}
{{- range $configName, $configValue := .Values.global.springKafkaConfigurationOverrides }}
- name: SPRING_KAFKA_PROPERTIES_{{ $configName | replace "." "_" | upper }}
3 changes: 2 additions & 1 deletion datahub-kubernetes/datahub/charts/datahub-gms/values.yaml
@@ -151,6 +151,7 @@ readinessProbe:
# helm install datahub-gms datahub-gms/
global:
datahub_analytics_enabled: true
graph_service_impl: neo4j

elasticsearch:
host: "elasticsearch"
@@ -191,4 +192,4 @@ global:
- "broker"
- "mysql"
- "elasticsearch"
- "neo4j"
- "neo4j"
Original file line number Diff line number Diff line change
@@ -29,6 +29,7 @@ Current chart version is `0.2.0`
| global.hostAliases[0].hostnames[2] | string | `"elasticsearch"` | |
| global.hostAliases[0].hostnames[3] | string | `"neo4j"` | |
| global.hostAliases[0].ip | string | `"192.168.0.104"` | |
| global.graph_service_impl | string | `"neo4j"` | One of `neo4j` or `elasticsearch`. Determines which backend GMS uses for the graph service. Elasticsearch is recommended for a simplified deployment; Neo4j remains the default for now to maintain backward compatibility. |
| image.pullPolicy | string | `"IfNotPresent"` | |
| image.repository | string | `"linkedin/datahub-mae-consumer"` | |
| image.tag | string | `"v0.8.3"` | |
Original file line number Diff line number Diff line change
@@ -100,6 +100,9 @@ spec:
name: "{{ .password.secretRef }}"
key: "{{ .password.secretKey }}"
{{- end }}
- name: GRAPH_SERVICE_IMPL
value: {{ .Values.global.graph_service_impl }}
{{- if eq .Values.global.graph_service_impl "neo4j" }}
- name: NEO4J_HOST
value: "{{ .Values.global.neo4j.host }}"
- name: NEO4J_URI
@@ -111,6 +114,7 @@ spec:
secretKeyRef:
name: "{{ .Values.global.neo4j.password.secretRef }}"
key: "{{ .Values.global.neo4j.password.secretKey }}"
{{- end }}
- name: DATAHUB_ANALYTICS_ENABLED
value: "{{ .Values.global.datahub_analytics_enabled }}"

Review comment (Contributor, on `DATAHUB_ANALYTICS_ENABLED`): I think this needs to be added back outside the if statement.
shirshanka marked this conversation as resolved.

{{- if .Values.global.springKafkaConfigurationOverrides }}
Original file line number Diff line number Diff line change
@@ -145,6 +145,7 @@ readinessProbe:
failureThreshold: 8

global:
graph_service_impl: neo4j
datahub_analytics_enabled: true

elasticsearch:
@@ -175,4 +176,4 @@ global:
- "broker"
- "mysql"
- "elasticsearch"
- "neo4j"
- "neo4j"
78 changes: 78 additions & 0 deletions datahub-kubernetes/datahub/quickstart-values-without-neo4j.yaml
@@ -0,0 +1,78 @@
# Values to start up datahub after starting up the datahub-prerequisites chart with "prerequisites" release name

Review comment (@jjoyce0510, Collaborator, Jun 21, 2021): Alternate possible name: `datahub-kubernetes/datahub/quickstart-values-elastic-graph-service.yaml`

# Copy this chart and change configuration as needed.
datahub-gms:
enabled: true
image:
repository: linkedin/datahub-gms
tag: "v0.8.1"

datahub-frontend:
enabled: true
image:
repository: linkedin/datahub-frontend-react
tag: "v0.8.1"
# Set up ingress to expose react front-end
ingress:
enabled: false

elasticsearchSetupJob:
enabled: true
image:
repository: linkedin/datahub-elasticsearch-setup
tag: "v0.8.1"

kafkaSetupJob:
enabled: true
image:
repository: linkedin/datahub-kafka-setup
tag: "v0.8.1"

mysqlSetupJob:
enabled: true
image:
repository: acryldata/datahub-mysql-setup
tag: "v0.8.1"

datahubUpgrade:
enabled: true
image:
repository: acryldata/datahub-upgrade
tag: "v0.8.1"

datahub-ingestion-cron:
enabled: false

global:
graph_service_impl: elasticsearch

elasticsearch:
host: "elasticsearch-master"
port: "9200"
indexPrefix: demo

kafka:
bootstrap:
server: "prerequisites-kafka:9092"
zookeeper:
server: "prerequisites-zookeeper:2181"
schemaregistry:
url: "http://prerequisites-cp-schema-registry:8081"

sql:
datasource:
host: "prerequisites-mysql:3306"
hostForMysqlClient: "prerequisites-mysql"
port: "3306"
url: "jdbc:mysql://prerequisites-mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8"
driver: "com.mysql.jdbc.Driver"
username: "root"
password:
secretRef: mysql-secrets
secretKey: mysql-root-password

datahub:
gms:
port: "8080"
mae_consumer:
port: "9091"
appVersion: "1.0"
2 changes: 2 additions & 0 deletions datahub-kubernetes/datahub/quickstart-values.yaml
@@ -43,6 +43,8 @@ datahub-ingestion-cron:
enabled: false

global:
graph_service_impl: neo4j

elasticsearch:
host: "elasticsearch-master"
port: "9200"
2 changes: 1 addition & 1 deletion datahub-kubernetes/datahub/values.yaml
@@ -58,7 +58,7 @@ datahubUpgrade:
tag: "v0.8.3"

global:

graph_service_impl: neo4j
datahub_analytics_enabled: true
datahub_standalone_consumers_enabled: false

39 changes: 39 additions & 0 deletions docker/datahub-gms/env/docker-without-neo4j.env
@@ -0,0 +1,39 @@
DATASET_ENABLE_SCSI=false
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mysql:3306
EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
GRAPH_SERVICE_IMPL=elasticsearch

MAE_CONSUMER_ENABLED=true
MCE_CONSUMER_ENABLED=true

# Uncomment to disable persistence of client-side analytics events
# DATAHUB_ANALYTICS_ENABLED=false

# Uncomment to configure kafka topic names
# Make sure these names are consistent across the whole deployment
# METADATA_AUDIT_EVENT_NAME=MetadataAuditEvent_v4
# METADATA_CHANGE_EVENT_NAME=MetadataChangeEvent_v4
# FAILED_METADATA_CHANGE_EVENT_NAME=FailedMetadataChangeEvent_v4

# Uncomment and set these to support SSL connection to Elasticsearch
# ELASTICSEARCH_USE_SSL=true
# ELASTICSEARCH_SSL_PROTOCOL=TLSv1.2
# ELASTICSEARCH_SSL_SECURE_RANDOM_IMPL=
# ELASTICSEARCH_SSL_TRUSTSTORE_FILE=
# ELASTICSEARCH_SSL_TRUSTSTORE_TYPE=
# ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD=
# ELASTICSEARCH_SSL_KEYSTORE_FILE=
# ELASTICSEARCH_SSL_KEYSTORE_TYPE=
# ELASTICSEARCH_SSL_KEYSTORE_PASSWORD=

# To use simple username/password authentication to Elasticsearch over HTTPS
# set ELASTICSEARCH_USE_SSL=true and uncomment:
# ELASTICSEARCH_USERNAME=
# ELASTICSEARCH_PASSWORD=
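
To confirm which backend an env file selects before starting the container, the value can be read straight from the file. This is a small sketch, assuming a hypothetical helper `graph_impl_from_env_file` (not part of this PR) and plain `KEY=value` lines as in the file above. Printing the last match when the key is repeated is also an assumption, not documented behaviour:

```shell
# Hypothetical helper (not part of this PR): print the GRAPH_SERVICE_IMPL value
# found in an env file. Commented-out lines are skipped because the pattern is
# anchored at the start of the line. If the key appears more than once, the
# last assignment is printed (an assumption about how duplicates should resolve).
graph_impl_from_env_file() {
  sed -n 's/^GRAPH_SERVICE_IMPL=//p' "$1" | tail -n 1
}

# Usage:
#   graph_impl_from_env_file docker/datahub-gms/env/docker-without-neo4j.env
```

An empty result means the file does not set the variable at all.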
1 change: 1 addition & 0 deletions docker/datahub-gms/env/docker.env
@@ -12,6 +12,7 @@ NEO4J_HOST=http://neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
GRAPH_SERVICE_IMPL=neo4j

MAE_CONSUMER_ENABLED=true
MCE_CONSUMER_ENABLED=true
1 change: 0 additions & 1 deletion docker/datahub-gms/start.sh
@@ -30,7 +30,6 @@ dockerize \
-wait tcp://$EBEAN_DATASOURCE_HOST \
-wait tcp://$(echo $KAFKA_BOOTSTRAP_SERVER | sed 's/,/ -wait tcp:\/\//g') \
-wait $ELASTICSEARCH_PROTOCOL://$ELASTICSEARCH_HOST_URL:$ELASTICSEARCH_PORT -wait-http-header "$ELASTICSEARCH_AUTH_HEADER" \
-wait $NEO4J_HOST \
-timeout 240s \
java $JAVA_OPTS $JMX_OPTS \
-jar /jetty-runner.jar \
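
The hunk above drops the `-wait $NEO4J_HOST` argument unconditionally. An alternative, shown here only as a sketch and not what this PR does, would be to keep the wait when the Neo4j backend is selected. `neo4j_wait_args` is a hypothetical helper, and treating an unset `GRAPH_SERVICE_IMPL` as `neo4j` is an assumption based on the chart default:

```shell
# Hypothetical helper (not part of this PR): print the dockerize arguments that
# wait for Neo4j, or nothing when another graph backend is selected.
# Assumption: an unset GRAPH_SERVICE_IMPL means neo4j.
neo4j_wait_args() {
  if [ "${GRAPH_SERVICE_IMPL:-neo4j}" = "neo4j" ] && [ -n "${NEO4J_HOST:-}" ]; then
    printf '%s %s\n' "-wait" "$NEO4J_HOST"
  fi
}

# Usage inside start.sh (sketch). The substitution is left unquoted on purpose
# so that it splits into two arguments, or disappears when empty:
#   dockerize ... $(neo4j_wait_args) -timeout 240s ...
```

With this approach the same script could serve both backends, at the cost of one more branch in the startup path.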
33 changes: 33 additions & 0 deletions docker/docker-compose-without-neo4j.override.yml
@@ -0,0 +1,33 @@
---
version: '3.8'
services:
mysql:
container_name: mysql
hostname: mysql
image: mysql:5.7
env_file: mysql/env/docker.env
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
ports:
- "3306:3306"
volumes:
- ./mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
- mysqldata:/var/lib/mysql

mysql-setup:
build:
context: ../
dockerfile: docker/mysql-setup/Dockerfile
image: acryldata/datahub-mysql-setup:head
env_file: mysql-setup/env/docker.env
hostname: mysql-setup
container_name: mysql-setup
depends_on:
- mysql

datahub-gms:
env_file: datahub-gms/env/docker-without-neo4j.env
depends_on:
- mysql

volumes:
mysqldata: