feat(dbt): add sibling association logic to associate dbt elements with their target systems #5190

Merged: 21 commits, Jun 22, 2022

gabe-lyons
Contributor

Introduces a new aspect, Siblings.pdl. The aspect is currently used only to associate dbt elements with their target systems, but the aspect (or the pattern behind it) is intended to support other kinds of associations in the future.

The aspect is ingested via the SiblingAssociationHook class. This hook listens for events that could indicate a dbt<>target pair and, when it finds a pair, emits a Siblings aspect to both siblings. It infers the relationship by inspecting the UpstreamLineage and Subtypes aspects: a dbt entity and a target entity are siblings in only a few cases, and those cases can be determined from these two aspects alone. It also listens for DatasetKey aspects in an attempt to reconstruct the index should one of the siblings be deleted via the delete API.
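To make the inference step concrete, here is a minimal Python sketch of the idea. It is not the code in SiblingAssociationHook (which is Java). The pairing rule (a dbt dataset with exactly one upstream, on a non-dbt platform), the `Siblings` dataclass with its `siblings` and `primary` fields, the choice of dbt as the primary, and the helper names `platform_of` and `infer_siblings` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DBT_PLATFORM = "urn:li:dataPlatform:dbt"
DATASET_PREFIX = "urn:li:dataset:("


def platform_of(dataset_urn: str) -> str:
    """Return the platform URN embedded in a dataset URN of the form
    urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    inner = dataset_urn[len(DATASET_PREFIX):-1]
    return inner.split(",", 1)[0]


@dataclass(frozen=True)
class Siblings:
    # Hypothetical stand-in for the Siblings aspect payload.
    siblings: Tuple[str, ...]
    primary: bool


def infer_siblings(urn: str, upstreams: List[str]) -> Dict[str, Siblings]:
    """Return the aspects to emit, keyed by entity URN.

    Assumed rule: a dbt dataset whose only upstream lives on a non-dbt
    platform is paired with that upstream. Anything else yields no pair.
    """
    if platform_of(urn) != DBT_PLATFORM or len(upstreams) != 1:
        return {}
    target = upstreams[0]
    if platform_of(target) == DBT_PLATFORM:
        return {}
    # Emit one aspect per sibling so each side points at the other.
    return {
        urn: Siblings(siblings=(target,), primary=True),
        target: Siblings(siblings=(urn,), primary=False),
    }


dbt_urn = "urn:li:dataset:(urn:li:dataPlatform:dbt,analytics.orders,PROD)"
target_urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.orders,PROD)"
for entity, aspect in infer_siblings(dbt_urn, [target_urn]).items():
    print(entity, aspect)
```

The point of the sketch is that both entities receive an aspect, so either side can be resolved to its sibling without a second lookup.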

The aspect comes into play in Search, Entity pages, and the Lineage API. In Search and on Entity pages, DataHub merges the metadata of the two siblings so that they appear as one entity. In the Lineage API, the lineage of an entity and its sibling is merged, and siblings within that result set are then deduplicated.
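The lineage behavior can be sketched roughly as follows. This is an illustration only, not the Lineage API implementation: the function name `merged_lineage`, the plain-dict representation of lineage edges and sibling links, the rule of keeping whichever sibling is seen first, and the dropping of the edge between the two siblings themselves are all assumptions.

```python
from typing import Dict, List


def merged_lineage(
    urn: str,
    lineage: Dict[str, List[str]],
    sibling_of: Dict[str, str],
) -> List[str]:
    """Union the lineage of `urn` and its sibling, then collapse sibling
    pairs in the result so each pair is represented once."""
    sources = [urn]
    sibling = sibling_of.get(urn)
    if sibling is not None:
        sources.append(sibling)

    result: List[str] = []
    seen = set()
    for source in sources:
        for neighbor in lineage.get(source, []):
            if neighbor in sources:
                # Assumed: the edge between the two siblings is not shown.
                continue
            if neighbor in seen or sibling_of.get(neighbor) in seen:
                # Already present, directly or via its sibling.
                continue
            seen.add(neighbor)
            result.append(neighbor)
    return result


# Hypothetical short names stand in for full dataset URNs.
lineage = {"dbt.a": ["dbt.b"], "wh.a": ["wh.b", "wh.c"]}
sibling_of = {"dbt.a": "wh.a", "wh.a": "dbt.a", "dbt.b": "wh.b", "wh.b": "dbt.b"}
print(merged_lineage("dbt.a", lineage, sibling_of))  # ['dbt.b', 'wh.c']
```

In the example, `wh.b` is dropped because its sibling `dbt.b` is already in the result, which is the deduplication step described above.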

Here are screenshots of those three experiences:
[Screenshots: Search, Entity page, Lineage]

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot commented Jun 16, 2022

Unit Test Results (build & test)

389 tests  +4   389 ✔️ +10   10m 18s ⏱️ -22s
  92 suites +1       0 💤 ±  0 
  92 files   +1       0  -   6 

Results for commit 48fd97d. ± Comparison against base commit 0949613.

♻️ This comment has been updated with latest results.

github-actions bot commented Jun 16, 2022

Unit Test Results (metadata ingestion)

       8 files  ±0         8 suites  ±0   1h 20m 1s ⏱️ -3s
   565 tests ±0     562 ✔️ ±0    3 💤 ±0  0 ±0 
1 068 runs  ±0  1 025 ✔️ ±0  43 💤 ±0  0 ±0 

Results for commit 48fd97d. ± Comparison against base commit 0949613.

♻️ This comment has been updated with latest results.

Contributor

@shirshanka shirshanka left a comment

LGTM

@shirshanka shirshanka merged commit baf3f3f into datahub-project:master Jun 22, 2022
alexey-kravtsov pushed a commit to infobip/datahub that referenced this pull request Jul 8, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022