fix(ingest/snowflake): missing view downstream cll if platform instance is set #8966

mayurinehate
Collaborator

  • commit 1 - add a test case to reproduce the behavior
  • commit 2 - update the code and the test case to fix it

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Oct 6, 2023
@mayurinehate changed the title from "Snowflake view downstream cll for platforminstance" to "fix(ingest/snowflake): missing view downstream cll if platform instance is set" Oct 6, 2023
upstream_column_info.table
).get_dataset_name()
if self.config.platform_instance and upstream_table_id.startswith(
Collaborator

How does this happen? Perhaps we should separate platform instance out in the sql parser instead? cc @hsheth2

Collaborator

imo the ideal outcome would be for the sql parser to respect platform_instance if one is passed in to the schema resolver

Collaborator

I thought it already does that, but might be wrong

Collaborator Author

@mayurinehate mayurinehate Oct 11, 2023

So the SQL lineage returned has:

  1. dataset URN, with platform_instance
  2. column-to-column lineage, where each column is represented by (dataset urn, column_name)

What's missing here is the raw dataset name (or fully qualified name) without the platform instance in the upstream result.
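
To illustrate the point above, here is a minimal sketch in plain Python. It assumes the usual DataHub URN layout, where a configured platform instance is prepended to the dataset name inside the URN. The helper names (`make_dataset_urn`, `dataset_name_from_urn`, `strip_platform_instance`) and the sample values are hypothetical stand-ins written for this example, not the functions used in the PR.

```python
from typing import Optional


def make_dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN; a platform instance, if set, is prepended to the name."""
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


def dataset_name_from_urn(urn: str) -> str:
    """Return the name part of a dataset URN (it still carries any instance prefix)."""
    inner = urn[len("urn:li:dataset:("):-1]
    _platform, name, _env = inner.split(",")
    return name


def strip_platform_instance(name: str, platform_instance: Optional[str]) -> str:
    """Recover the raw dataset name by dropping the '<instance>.' prefix, if present."""
    prefix = f"{platform_instance}." if platform_instance else ""
    if prefix and name.startswith(prefix):
        return name[len(prefix):]
    return name


# Hypothetical sample values, chosen only for illustration.
urn = make_dataset_urn("snowflake", "db.schema.orders", platform_instance="instance1")
name_in_urn = dataset_name_from_urn(urn)
raw_name = strip_platform_instance(name_in_urn, "instance1")
```

Here `name_in_urn` still includes the `instance1.` prefix, so a lookup keyed on the raw Snowflake name would miss it unless the prefix is stripped first, which is the kind of check the `startswith` condition in the reviewed snippet performs.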

Collaborator Author

@hsheth2 @asikowitz any thoughts on how to proceed here?

Collaborator Author

I think this bug in Snowflake view lineage can also be solved by converting the SQL parsing result directly to fine-grained lineage, instead of going through the raw Snowflake data models again. Let me attempt that.
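
A rough sketch of that idea, assuming the parser result exposes each column as a (dataset URN, column name) pair as described earlier in this thread. `ColumnRef`, `ColumnLineage`, and `to_fine_grained_lineage` are hypothetical names for this example and are not DataHub's actual classes. Because the dataset URNs from the parser already include the platform instance, the schema field URNs can be built from them directly, with no need to recover the raw table name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnRef:
    """A column as reported by the SQL parser: (dataset URN, column name)."""
    dataset_urn: str
    column: str


@dataclass
class ColumnLineage:
    """One downstream column and the upstream columns it is derived from."""
    downstream: ColumnRef
    upstreams: List[ColumnRef]


def schema_field_urn(ref: ColumnRef) -> str:
    """Build a schemaField URN by wrapping the dataset URN and the column name."""
    return f"urn:li:schemaField:({ref.dataset_urn},{ref.column})"


def to_fine_grained_lineage(
    column_lineage: List[ColumnLineage],
) -> List[Dict[str, List[str]]]:
    """Map parser column lineage to fine-grained entries, skipping empty upstreams."""
    return [
        {
            "upstreams": [schema_field_urn(u) for u in cl.upstreams],
            "downstreams": [schema_field_urn(cl.downstream)],
        }
        for cl in column_lineage
        if cl.upstreams
    ]


# Hypothetical sample data: a view column derived from one table column.
table = "urn:li:dataset:(urn:li:dataPlatform:snowflake,instance1.db.schema.orders,PROD)"
view = "urn:li:dataset:(urn:li:dataPlatform:snowflake,instance1.db.schema.orders_v,PROD)"
parsed = [
    ColumnLineage(ColumnRef(view, "amount"), [ColumnRef(table, "amount")]),
    ColumnLineage(ColumnRef(view, "loaded_at"), []),
]
fine_grained = to_fine_grained_lineage(parsed)
```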

Collaborator Author

Done. @hsheth2 please check now.

@maggiehays added the hacktoberfest-accepted label (Acceptance for hacktoberfest https://hacktoberfest.com/participation/) Oct 26, 2023
@mayurinehate force-pushed the snowflake_view_downstream_cll_for_platforminstance branch from 9dcd287 to dab2eed October 26, 2023 07:15
@mayurinehate requested a review from hsheth2 October 26, 2023 07:17
@hsheth2 added the merge-pending-ci label (A PR that has passed review and should be merged once CI is green.) Oct 27, 2023
Collaborator

hsheth2 commented Oct 27, 2023

@mayurinehate looks like there's a conflict on this PR

@mayurinehate
Collaborator Author

> @mayurinehate looks like there's a conflict on this PR

Resolved.

@hsheth2 hsheth2 merged commit e02b909 into datahub-project:master Oct 27, 2023
53 checks passed