Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 2592 neo4j connection options #2632

Merged
merged 2 commits into from
Jun 11, 2021
Merged

Fix 2592 neo4j connection options #2632

merged 2 commits into from
Jun 11, 2021

Conversation

RickardCardell
Copy link
Contributor

Makes it possible for a user to fix the issue described in #2592
This change might affect some users since a property got a new name.

I added another Neo4j connection property and changed the name of an existing property.
I changed the name of NEO4j_MAX_CONNECTION_LIFETIME_IN_HOURS since hours isn't granular enough and that seconds are used by all the other connection properties.

New: NEO4J_CONNECTION_LIVENESS_CHECK_TIMEOUT_IN_SECONDS
Changed:
old name: NEO4j_MAX_CONNECTION_LIFETIME_IN_HOURS
new name: NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

To prevent the neo4j driver getting a pooled connection that has closed.

Default value is -1, i.e disabled. Same as neo4j java-driver default value.
Set to 0 to always check that the pooled connection is open.

https://neo4j.com/docs/api/java-driver/current/org/neo4j/driver/Config.ConfigBuilder.html
Comment on lines 30 to 31
@Value("${NEO4j_MAX_CONNECTION_LIFETIME_IN_HOURS:1}")
@Value("${NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS:3600}")
private Long neo4jMaxConnectionLifetime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to not break anyone who may already be providing this value, should we accept both but default to using NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS unless NEO4j_MAX_CONNECTION_LIFETIME_IN_HOURS is set elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into it. That approach would be more customer oriented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done another attempt, and has now kept the old setting. It will take precedence of the new setting so that current users won't have any issues.

Comment on lines +36 to +41
@Value("${NEO4J_CONNECTION_LIVENESS_CHECK_TIMEOUT_IN_SECONDS:-1}")
private Long neo4jConnectionLivenessCheckTimeout;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would benefit other folks to give this a reasonable default- I imagine if you're running into liveness issues others may be as well. What value have you found the be successful here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 has worked well under testing conditions, however it stresses Neo4j more. From the docs:

If this option is set too low, an additional network call will be incurred when acquiring a connection, which causes a performance hit.

So I thought, better to have it as the default in the java-driver, i.e disabled, and only tune it when needed.
Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 0 is a bit aggressive as a default- where you able to find success with a slightly higher timeout? Checking every 5 minutes, for example, seems like a reasonable default. If that doesn't work in your case, I agree -1 seems like a safer default for now until we can get more data points.

Copy link
Contributor Author

@RickardCardell RickardCardell Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking every 5 minutes, for example, seems like a reasonable default.

Just to clarify, this is not a keep-alive feature that one would expect, but rather a feature that verifies that the connection is alive upon use.
So not as useful as NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS perhaps, but it still adds value, when the connection to Neo4j is flaky or as an extra check to avoid exceptions in the GUI.

If this setting is used and the connection is closed, it will still generate an exception on the GMS, but the connection will be refreshed and the frontend won't notice anything, other than perhaps a delay.

So, yes, I would prefer if -1 is kept as default, since it's more of a power-user setting.

…o seconds (#2592)

To allow for setting values less than 1 hour. E.g Some load balancers in AWS has an idle timeout for only 350s before reseting the connections.
The default value hasn't changed (1hr to 3600s). Also, the old setting is kept for sake of backwards compatability.
Copy link
Contributor

@gabe-lyons gabe-lyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm- thanks @RickardCardell

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @RickardCardell!

@shirshanka shirshanka merged commit 2cb0618 into datahub-project:master Jun 11, 2021
wan54 pushed a commit to wan54/datahub that referenced this pull request Jul 15, 2021
* fix(react): update ispartofbuilderfromdataflow, update ui in datajob header (datahub-project#2619)

* fix(ingestion): improve robustness of glue ingestion source (datahub-project#2626)

fixes: datahub-project#2625

Co-authored-by: thomas.larsson <[email protected]>

* feat(react): custom properties are now sortable by name in the UI (datahub-project#2622)

* fix(ci): update trigger to always generate docker images (datahub-project#2636)

* fix(ingest): include urn as key for kafka emitter (datahub-project#2634)

* fix(react): url encoding urns and tag profile fix (datahub-project#2623)

* feat(react): replace user urn with username (datahub-project#2599)

Co-authored-by: shubham.garg <[email protected]>

* fix(ingest): improve redshift ingestion performance (datahub-project#2635)

* docs: update roadmap with accomplished items (datahub-project#2617)

* feat: No Code Metadata Modeling (datahub-project#2629)

Co-authored-by: Dexter Lee <[email protected]>
Co-authored-by: Gabe Lyons <[email protected]>
Co-authored-by: Shirshanka Das <[email protected]>

* docs(no-code): removing unused snapshot annotations (datahub-project#2642)

* feat(datahub-upgrade): Use the "head" image tag.  (datahub-project#2641)

* fix(datahub-upgrade): remove debug flag (datahub-project#2640)

* fix(build): don't purge ingestion codegen files on gradle clean (datahub-project#2645)

* fix(analytics): fix index names (datahub-project#2647)

* fix(nocode): Fix docker image tag for run_upgrade script (datahub-project#2648)

* docs(nocode): Update migration docs with workaround for stuck upgrade(datahub-project#2649)

* fix: use head tag by default in quickstart (datahub-project#2650)

* fix(datahub-upgrade): fixing mis-spelled schema registry url (datahub-project#2652)

* fix(gms): Return empty snapshots when one does not exist (datahub-project#2646)

* docs(nocode): fix links to datahub-upgrade (datahub-project#2651)

* feat(datahub-upgrade): improve no code upgrade logging (datahub-project#2653)

* fix(docker): pin containers to golden hash for release (datahub-project#2654)

* feat(nocode): Add datahub-upgrade job to helm chart and set version to v0.8.1 (datahub-project#2643)

* docs(nocode): Adding documentation for no-migration upgrade option (datahub-project#2656)

* revert: "fix(docker): pin containers to golden hash for release (datahub-project#2654)" (datahub-project#2659)

This reverts commit a483933 and moves
us back to using HEAD in quickstart on the master branch.

* fix(docker): use debug tag in local dev images (datahub-project#2658)

* fix(ingest): support mssql encryption via ODBC (datahub-project#2657)

* fix(NoCode): Update snapshot json to latest (datahub-project#2655)

* fix(ingest): exclude mssql-odbc from "all" extra (datahub-project#2660)

* docs(nocode): adding documentation to handle backwards incompatibility issues (datahub-project#2662)

* docs(nocode): fix typo in autocomplete annotation (datahub-project#2664)

* build(docker): test docker builds in pull request CI (datahub-project#2661)

* fix(ingest): fix MyPy stubs (datahub-project#2666)

* feat(ingest): Feast ingestion integration (datahub-project#2605)

* Add feast testing setup

* Init Feast test script

* Add feast to dependencies

* Update feast descriptors

* Sort integrations

* Working feast pytest

* Clean up feast docker-compose file

* Expand Feast tests

* Setup feast classes

* Add continuous and bytes data to feature types

* Update field type mapping

* Add PDLs

* Add MLFeatureSetUrn.java

* Comment out feast setup

* Add snapshot file and update inits

* Init Feast golden files generation

* Clean up Feast ingest

* Feast testing comments

* Yield Feature snapshots

* Fix Feature URN naming

* Update feast MCE

* Update Feature URN prefix

* Add MLEntity

* Update golden files with entities

* Specify feast sources

* Add feast source configs

* Working feast docker ingestion

* List entities and features before adding tables

* Add featureset names

* Remove unused

* Rename feast image

* Update README

* Add env to feast URNs

* Fix URN naming

* Remove redundant URN names

* Fix enum backcompatibility

* Move feast testing to docker

* Move URN generators to mce_builder

* Add source for features

* Switch TypeClass -> enum_type

* Rename source -> sourceDataset

* Add local Feast ingest image builds

* Rename Entity -> MLPrimaryKey

* Restore features and keys for each featureset

* Do not json encode source configs

* Remove old source properties from feature sets

* Regenerate golden file

* Fix race condition with Feast tests

* Exclude unknown source

* Update feature datatype enum

* Update README and fix typos

* Fix Entity typo

* Fix path to local docker image

* Specify feast config and version

* Fix feast env variables

* PR fixes

* Refactor feast ingest constants

* Make feature sources optional for back-compatibility

* Remove unused GCP files

* adding docker publish workflow

* Simplify name+namespace in PrimaryKeys

* adding docker publish workflow

* debug

* final attempt

* final final attempt

* final final final commit

* Switch to published ingestion image

* Update name and namespace in java files

* Rename FeatureSet -> FeatureTable

* Regenerate codegen

* Fix initial generation errors

* Update snapshot jsons

* Regenerated schemas

* Fix URN formats

* Revise builds

* Clean up feast URN builders

* Fix naming typos

* Fix Feature Set -> Feature Table

* Fix comments

* PR fixes

* All you need is Urn

* Regenerate snapshots and update validation

* Add UNKNOWN data type

* URNs for source types

* Add note on docker requirement

* Fix typo

* Reorder aspect unions

* Refactor feast ingest functions

* Update snapshot jsons

* Rebuild

Co-authored-by: Shirshanka Das <[email protected]>

* fix(no-code): Adding Chart input relationship annotations (datahub-project#2669)

* chart input relationship pdl fix

* commiting schema.avsc changes

* Add get_identifier to hive source in metadata ingestion (datahub-project#2667)

* feat(k8s): Add imagePullSecrets to all K8's jobs (datahub-project#2671)

* fix(analytics): Support sasl authentication to kafka (datahub-project#2675)

* feat(ingest): headers for codegen Python scripts (datahub-project#2637)

* fix(ingest): pin to new mypy version (datahub-project#2670)

* Fix 2592 neo4j connection options (datahub-project#2632)

* fix(ingest): upgrade acryl-pyhive to use sasl3 instead of sasl (datahub-project#2684)

* feat(ingest): support Oracle service names (datahub-project#2676)

* feat(sql_views): added views as datasets for SQLAlchemy DBs (datahub-project#2663)

* removing whitespace from service aspect (datahub-project#2688)

* Only work with dbt catalog data if load_catalog is False (datahub-project#2686)

* feat(docs): Docs for S3 ingestion with AWS Glue (datahub-project#2672)

* feat(datahub cli): DataHub CLI Quickstart  (datahub-project#2689)

* feat(gms): Merge MAE, MCE consumers into GMS (datahub-project#2690)

* fix(NoCode): Fix product analytics (datahub-project#2697)

* feat(gms):  support basic auth header when connecting to elasticSearch (datahub-project#2701)

* Add support for different oidc client authentication methods (datahub-project#2691)

* feat(aspects): support fetching of versioned aspects (datahub-project#2677)

* feat(entities): add markdown description update/viewer feature in dataset, datajob, dataflow, chart and dashboard, update ui/ux (datahub-project#2707)

* feat(ingest): Add test case and docs for SQL view ingestion (datahub-project#2709)

* fix(ingest): use `looker` data platform (datahub-project#2708)

* test(ingest): simplify docker cleanup commands (datahub-project#2699)

* feat: Adding annotation option to frontend service (datahub-project#2674)

* feat(ingest): expose additional types to Python via codegen (datahub-project#2712)

* Removed key not in catalog filter from dbt and added node filter using AllowDenyPattern (datahub-project#2702)

* added node_type_pattern in dbt yaml file (datahub-project#2705)

* fix(editable descriptions): adding indexing for editable descriptions (datahub-project#2710)

* feat(quickstart): use versioned docker images when on a release tag (datahub-project#2715)

* fix(noCode): Improving efficiency of EntityService "listLatestAspects" API  (datahub-project#2711)

* fix(react): update schema description edit behavior (datahub-project#2718)

* Update version for helm (datahub-project#2719)

* fixing docs sidebar (datahub-project#2721)

* fix(docs): update extending-the-metadata-model.md (datahub-project#2727)

* fix(docker): update default tags to head (datahub-project#2730)

* fix(looker): fix invalid URN syntax error (datahub-project#2737)

* fix(ci): increase Feast docker setup timeout (datahub-project#2722)

* fix(ingest): types for dbt (datahub-project#2716)

* feat(ingest): add support for Glue ETL jobs (datahub-project#2687)

* fix(ingest): fix lookml platform URN (datahub-project#2742)

* fix(ingest): update lookml test (datahub-project#2741)

* fix(gms): fixes for version aspect fetching (datahub-project#2723)

* feat(graph): support using elasticsearch as graph backend. (datahub-project#2726)

* feat(ingest): print docker logs on timeout (datahub-project#2743)

* fix(ci): increase wait-for-it timeout to fix flaky feast test (datahub-project#2747)

* feat(docker): reduce quickstart footprint (datahub-project#2744)

* feat(quickstart): remove orphaned docker containers on quickstart through cli (datahub-project#2748)

* fix(gms): add rest.li validation in gms (datahub-project#2745)

* fix(datahub-upgrade): add support for postgres migration (datahub-project#2740)

* fix(docker): modernize docker images and fix vulnerabilities (datahub-project#2746)

* feat(ingest): ingest last-modified from dbt sources.json (datahub-project#2729)

* feat(ingest): add option to specify source platform database in lookml ingestion (datahub-project#2749)

* fix(gms): make get return entity type (datahub-project#2739)

* fix(elastic-as-graph): adding elasticsearch setup back in (datahub-project#2752)

* fix(browse): sort by doc count descending (datahub-project#2751)

* Revert "fix(gms): add rest.li validation in gms (datahub-project#2745)" (datahub-project#2754)

This reverts commit f0b4adc.

* fix(docker): use head tag for datahub-ingestion (datahub-project#2760)

* feat(elastic-as-graph): defaulting to elastic in quickstart (datahub-project#2753)

* fix(react): reverse result of topological sort, autofocus in add tag modal (datahub-project#2759)

* feat: usage stats (part 1) (datahub-project#2750)

Co-authored-by: Gabe Lyons <[email protected]>

* feat: usage stats (part 2) (datahub-project#2762)

Co-authored-by: Gabe Lyons <[email protected]>

* docs: update for Jun townhall (datahub-project#2764)

* fix(frontend): auth session ttl specified in hours instead of days. M… (datahub-project#2695)

Fixes: datahub-project#2680

Co-authored-by: thomas.larsson <[email protected]>

* fix(react): move percent sign after number and update meta tag (datahub-project#2765)

* fix(docker): Fix dependency vulnerability (datahub-project#2763)

* fix(datahub-upgrade): removing the CleanupStep from datahub upgrade (datahub-project#2728)

* docs(ingest): move usage stats docs into the "sources" section (datahub-project#2766)

* fix(react): update the query frequency text label (datahub-project#2767)

* feat(k8s): add GCP deploy recipe (datahub-project#2768)

* fix(react): fix graphql apollo cache update issue cause of usagestats (datahub-project#2769)

* feat(logs): improve logging in GMS and datahub-frontend (datahub-project#2761)

* fix(nocode): removing service PDL (datahub-project#2772)

* fix(react): update platform text in dataset profile header (datahub-project#2771)

* feat(logs): add thresholding, misc cleanup (datahub-project#2773)

* docs: upgrade docusaurus, minor ingestion updates (datahub-project#2774)

* feat(ingest): add non-random sampling for mongo (datahub-project#2778)

* feat(react-graphql-mock): mock graphql queries and mutations with miragejs

* feat(react): configure Cypress + MirageJS + GraphQL mock for functional testing plus a couple of example tests

* fix(react): fix mutation helper for updateentitylink

* feat(react): update graphql issues with mock data

* fix(docker): Upgrade to 4.1.44 netty-all library (datahub-project#2784)

* refactor(ingest): use common get_sys_time method (datahub-project#2782)

* fix(docs): links to Feast entities (datahub-project#2780)

* feat(k8s): Upgrade to v0.8.4 (datahub-project#2792)

* fix(frontend): making nested SSL configs optional (datahub-project#2785)

* fix(ingest): use correct platform for MongoDB ingestion (datahub-project#2783)

* Update quickstart to include system prune (datahub-project#2794)

* docs(website): add releases page (datahub-project#2776)

* refactor(ingest): remove deprecated methods and warn on deprecated import (datahub-project#2797)

* fix(usage-stats): add indices target for usage stats query (datahub-project#2798)

* fix(ingest): handle case when view definition handler is not implemented (datahub-project#2796)

* fix(ingest): quote table names in hive (datahub-project#2801)

* fix(datahub-upgrade): add runtime dependency on logbackClassic (datahub-project#2799)

* fix(search): Filter out "removed" entities from autocomplete and analytics (datahub-project#2781)

* feat(ingest): Improve lookml sql derived tables detection, add cascading derived tables to lineage (datahub-project#2770)

* feat(ingest): SageMaker feature store ingestion (datahub-project#2758)

* fix(ingest): better warnings and error handling for rest sink (datahub-project#2800)

* fix(usage-stats): Add usage stats factory to mae processor config (datahub-project#2803)

* refactor(ingest): extract sqlalchemy uri generation logic (datahub-project#2786)

* fix(quickstart): Fixing manual mysql quickstart (datahub-project#2811)

* feat(ingest): support ingesting from multiple snowflake dbs (datahub-project#2793)

* feat(backup): Add restore indices and restore backup tasks (datahub-project#2779)

* docs(website): hide outdated FAQs page (datahub-project#2809)

* feat(ingest): refactor mce comparison and add pytest update golden files option (datahub-project#2812)

* fix(k8s): upgrade helm version (datahub-project#2810)

* feat(ingest): basic support for complex hive types (datahub-project#2804)

* fix(datahub-upgrade): fix vulnerabilities (datahub-project#2813)

* chore(helm): upgrade version to v0.8.5 (datahub-project#2814)

* feat(datahub-frontend): Adding basic file-based authentication to datahub-frontend (datahub-project#2818)

* fix(ingest): view handling resilience for redshift (datahub-project#2816)

* docs(ingest): add extra info for Redshift behind a proxy (datahub-project#2817)

* fix(docs): update OIDC docs  (datahub-project#2823)

* docs(docker): fix check command reference (datahub-project#2822)

Fixes datahub-project#2820.

* docs(elastic-for-graph): Add migrating from neo4j to elastic instructions (datahub-project#2826)

* fix(ingest): convert superset timestamps to micros (datahub-project#2827)

* Previous values were in seconds, which rendered incorrectly in the UI.

* feat(ci): separate metadata-ingestion into a separate workflow (datahub-project#2828)

* fix(cli): change docker nuke to also remove stopped containers (datahub-project#2825)

* fix(mae-consumer): support standalone mae consumer without neo4j (datahub-project#2824)

* docs: update H2 2021 roadmap (datahub-project#2829)

* docs(ingest): update links to Kafka docs (datahub-project#2834)

* fix(ingest): mask password in info-level logs (datahub-project#2835)

* fix(ingest): various BigQuery source fixes (datahub-project#2836)

* docs(ingest): clarify that the Kafka options are pass-through (datahub-project#2837)

* fix(ingest): do not fail dbt ingestion when encountering missing nodes  (datahub-project#2833)

* feat (helm datahub-gms): add ingress template to datahub-gms helm chart (datahub-project#2832)

Co-authored-by: christopher.coulson <[email protected]>

* feat(react): fix apollo client cache issues in entities profile pages (datahub-project#2841)

* fix(ingest): avoid setting timestamps unless source system provides it (datahub-project#2843)

* fix(docs): fix some broken links in the docs (datahub-project#2846)

* fix(ingest): Fix glob pattern and handle possible recursion in lookml (datahub-project#2851)

* feat(ingest): prettify stack traces in CLI (datahub-project#2845)

* fix(analytics): Fix SSL issue with analytics on frontend (datahub-project#2840)

* fix(ingest): check for dbt materialization before proceeding (datahub-project#2842)

* feat(docs): throttle and retry requests in doc generation (datahub-project#2847)

* feat(ingest): SageMaker jobs and models (datahub-project#2830)

* docs(ingest): remove hanging sentence from docs (datahub-project#2853)

* fix(ingest): delete pycache files when running clean (datahub-project#2852)

* fix(ingest): handle 'fields' list missing in bigquery-usage (datahub-project#2844)

* feat(k8s): Extract helm charts into a separate repo (datahub-project#2848)

* fix(react): fix searchbar width issue in small screen (datahub-project#2856)

* feat(react): implement visualizing historical versions of dataset schema (datahub-project#2855)

* feat(docs): carousels for videos and articles (datahub-project#2857)

* fix(frontend): handling null aspects in AspectType (datahub-project#2860)

* fix: fixing lint issue (datahub-project#2861)

* feat(ingest): support dynamic imports for transfomer methods (datahub-project#2858)

* feat(docs): swap Medium and videos sections (datahub-project#2859)

* feat(ingest): add browse paths + dataplatform for Feast features (datahub-project#2849)

* fix(build): increase retries for dependency fetches (datahub-project#2867)

* build(ingest): separate metadata-ingestion build workflow fully (datahub-project#2862)

* fix(quickstart): update compose spec version (datahub-project#2873)

Also update the defaults to point at the head tag instead of latest.

* docs(quickstart): add default password to quickstart (datahub-project#2875)

* feat(ingest): extract dbt meta fields (datahub-project#2876)

* feat(ingest): update golden files only when diff fails (datahub-project#2869)

* build(ingestion): add version prompt to release script (datahub-project#2866)

* fix(react): fix bug in description update modal (datahub-project#2874)

* fix(react): update dataset graphql mock

Co-authored-by: Thomas Larsson <[email protected]>
Co-authored-by: thomas.larsson <[email protected]>
Co-authored-by: Rickard Cardell <[email protected]>
Co-authored-by: Dexter Lee <[email protected]>
Co-authored-by: Harshal Sheth <[email protected]>
Co-authored-by: Gabe Lyons <[email protected]>
Co-authored-by: shubham garg <[email protected]>
Co-authored-by: shubham.garg <[email protected]>
Co-authored-by: Shirshanka Das <[email protected]>
Co-authored-by: John Joyce <[email protected]>
Co-authored-by: Kevin Hu <[email protected]>
Co-authored-by: zack3241 <[email protected]>
Co-authored-by: Vincenzo Lavorini <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: JeffSkinner <[email protected]>
Co-authored-by: Peter Mortier <[email protected]>
Co-authored-by: kuntalkumarbasu <[email protected]>
Co-authored-by: vijayan-nallasami-curve <[email protected]>
Co-authored-by: wan <[email protected]>
Co-authored-by: datascienceChris <[email protected]>
Co-authored-by: christopher.coulson <[email protected]>
Co-authored-by: Fredrik Sannholm <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants