-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/lookml): ingest field tags #10792
feat(ingest/lookml): ingest field tags #10792
Conversation
metadata-ingestion/tests/integration/looker/looker_mces_golden_deleted_stateful.json
Show resolved
Hide resolved
WalkthroughThe changes primarily involve enhancing the Looker integration by adding tag processing capabilities to the Changes
Poem
Tip Early access features: enabledWe are currently testing the following features in early access:
Note:
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (3)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (1)
Line range hint
1209-1209
: Consider monitoring the memory usage of cached data.The use of
functools.lru_cache
onget_explore
could potentially lead to memory leaks if not managed correctly. Consider implementing monitoring to track the size of the cache and its impact on memory, especially if the cache size grows significantly.metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (2)
Line range hint
491-498
: Refactor nestedif
statements for clarity.Consider combining the nested
if
statements into a singleif
statement to improve code readability and maintainability.- if isinstance(conn_map[key], str): - platform = conn_map[key] - if "." in platform: - platform_db_split = conn_map[key].split(".") - connection = LookerConnectionDefinition( - platform=platform_db_split[0], - default_db=platform_db_split[1], - default_schema="", - ) - conn_map[key] = connection + if isinstance(conn_map[key], str) and "." in conn_map[key]: + platform_db_split = conn_map[key].split(".") + conn_map[key] = LookerConnectionDefinition( + platform=platform_db_split[0], + default_db=platform_db_split[1], + default_schema="", + )
Line range hint
1531-1533
: Improve exception handling for clarity and debugging.Utilize
raise ... from err
to provide better context in exception handling, distinguishing original exceptions from those raised during handling.- except SDKError: - raise ValueError( - "Failed to retrieve connections from looker client. Please check to ensure that you have manage_models permission enabled on this API key." - ) + except SDKError as e: + raise ValueError( + "Failed to retrieve connections from looker client. Please check to ensure that you have manage_models permission enabled on this API key." + ) from eAlso applies to: 1847-1849
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (23)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1 hunks)
- metadata-ingestion/tests/integration/looker/golden_looker_mces.json (4 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_allow_ingest.json (2 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_external_project_view_mces.json (2 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_file_path_ingest.json (2 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_independent_look_ingest.json (2 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_ingest.json (2 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_ingest_joins.json (2 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_ingest_unaliased_joins.json (2 hunks)
- metadata-ingestion/tests/integration/looker/looker_mces_golden_deleted_stateful.json (6 hunks)
- metadata-ingestion/tests/integration/looker/looker_mces_usage_history.json (2 hunks)
- metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/expected_output.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (1 hunks)
- metadata-ingestion/tests/integration/lookml/lkml_samples_duplicate_field/dataset_lineages.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json (3 hunks)
- metadata-ingestion/tests/integration/lookml/test_lookml.py (2 hunks)
Files skipped from review due to trivial changes (16)
- metadata-ingestion/tests/integration/looker/golden_looker_mces.json
- metadata-ingestion/tests/integration/looker/golden_test_allow_ingest.json
- metadata-ingestion/tests/integration/looker/golden_test_external_project_view_mces.json
- metadata-ingestion/tests/integration/looker/golden_test_file_path_ingest.json
- metadata-ingestion/tests/integration/looker/golden_test_independent_look_ingest.json
- metadata-ingestion/tests/integration/looker/golden_test_ingest.json
- metadata-ingestion/tests/integration/looker/golden_test_ingest_joins.json
- metadata-ingestion/tests/integration/looker/golden_test_ingest_unaliased_joins.json
- metadata-ingestion/tests/integration/looker/looker_mces_usage_history.json
- metadata-ingestion/tests/integration/lookml/expected_output.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_bigquery.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_api_hive2.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_offline_platform_instance.json
- metadata-ingestion/tests/integration/lookml/lookml_mces_with_external_urls.json
- metadata-ingestion/tests/integration/lookml/refinements_ingestion_golden.json
Additional context used
Ruff
metadata-ingestion/tests/integration/lookml/test_lookml.py
715-716: Use a single
if
statement instead of nestedif
statements(SIM102)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
1209-1209: Use of
functools.lru_cache
orfunctools.cache
on methods can lead to memory leaks(B019)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
491-498: Use a single
if
statement instead of nestedif
statements(SIM102)
1531-1533: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
1847-1849: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
Additional comments not posted (7)
metadata-ingestion/tests/integration/lookml/lkml_samples_duplicate_field/dataset_lineages.view.lkml (1)
43-43
: Tag Addition ApprovedThe addition of the
softVersion
tag to theversion
dimension is correctly implemented according to LookML syntax.metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.json (1)
119-119
: View Logic Representation ApprovedThe addition of the view logic in
viewProperties
is correctly serialized. Ensure that this JSON structure aligns with the expected schema in your integration tests.Verification successful
View Logic Representation Approved
The addition of the view logic in
viewProperties
is correctly serialized. The JSON structure aligns with the expected schema in your integration tests.
- The
viewLogic
contains the necessary fields and dimensions as expected.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the JSON structure aligns with the expected schema. echo "Run integration tests to ensure the JSON structure is correct."Length of output: 134
Script:
#!/bin/bash # Run the integration tests and capture their output to verify the JSON structure. pytest metadata-ingestion/tests/integration/lookml/Length of output: 127
Script:
#!/bin/bash # Verify that the JSON structure aligns with the expected schema using jq. jq empty metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.jsonLength of output: 92
Script:
#!/bin/bash # Search for the schema definition in the codebase. rg 'schema' metadata-ingestion/tests/integration/lookml/Length of output: 148875
Script:
#!/bin/bash # Extract schema fields from the duplicate_field_ingestion_golden.json file and compare them with the expected structure. jq '.[] | .schemaName, .schemaField' metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.jsonLength of output: 260
Script:
#!/bin/bash # Dump the entire JSON structure to understand its layout and locate the schema fields. jq '.' metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.jsonLength of output: 16349
Script:
#!/bin/bash # Extract schema fields from the aspect.json.fields within the SchemaMetadata aspect. jq '[.[] | select(.aspectName == "viewProperties") | .aspect.json.viewLogic]' metadata-ingestion/tests/integration/lookml/duplicate_field_ingestion_golden.jsonLength of output: 1024
metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.json (1)
553-559
: New Tag Entity Addition ApprovedThe addition of the
softVersion
tag entity is correctly implemented. Ensure that the tag data aligns with the expected attributes and values.Verification successful
New Tag Entity Addition Approved
The addition of the
softVersion
tag entity is correctly implemented, and the tag data aligns with the expected attributes and values.
entityType
: "tag"entityUrn
: "urn:li:tag:softVersion"changeType
: "UPSERT"aspectName
: "tagKey"aspect
:
json
:
name
: "softVersion"systemMetadata
:
lastObserved
: 1586847600000runId
: "lookml-test"lastRunId
: "no-run-id-provided"Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the tag data aligns with the expected attributes and values. echo "Run validation checks on the tag data."Length of output: 86
Script:
#!/bin/bash # Extract and display the tag data for `softVersion` from the `field_tag_ingestion_golden.json` file for verification. jq '.[] | select(.entityUrn == "urn:li:tag:softVersion")' metadata-ingestion/tests/integration/lookml/field_tag_ingestion_golden.jsonLength of output: 449
metadata-ingestion/tests/integration/looker/looker_mces_golden_deleted_stateful.json (1)
780-780
: Deletion of Dashboard ApprovedThe deletion of the dashboard with
urn:li:dashboard:(looker,dashboards.11)
is noted. Ensure that this deletion does not affect any dependent systems or integrations.Verification successful
Deletion of Dashboard Approved
The deletion of the dashboard with
urn:li:dashboard:(looker,dashboards.11)
is noted. The references found are within test files, indicating that the deletion is being tested. There is no impact on dependent systems or integrations.
metadata-ingestion/tests/integration/looker/looker_mces_golden_deleted_stateful.json
metadata-ingestion/tests/integration/looker/golden_test_allow_ingest.json
metadata-ingestion/tests/integration/looker/golden_looker_mces.json
metadata-ingestion/tests/integration/looker/test_looker.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the impact of the dashboard deletion on dependent systems. echo "Check for any integrations or systems that depend on this dashboard."Length of output: 146
Script:
#!/bin/bash # Search for references to the specific dashboard URN in the codebase rg 'urn:li:dashboard:\(looker,dashboards\.11\)'Length of output: 1116
metadata-ingestion/tests/integration/lookml/test_lookml.py (1)
Line range hint
871-891
: Well-structured test for duplicate field ingestion.This test function is well-designed to check the ingestion process for handling duplicate fields. It follows good practices by comparing the output with a golden file to ensure consistency.
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (1)
247-247
: Approved the addition of thetags
attribute to theViewField
class.This change effectively supports the ingestion of field-level tags from LookML into DataHub, which is aligned with the PR's objectives.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1)
1093-1093
: Ensure correct handling of tags in_get_fields
method.The implementation correctly checks for the presence of
tags
infield_dict
and assigns an empty list if it is absent. This is crucial for avoidingKeyError
whentags
is not provided.
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- metadata-ingestion/tests/integration/looker/golden_looker_mces.json (4 hunks)
- metadata-ingestion/tests/integration/looker/golden_test_independent_look_ingest.json (2 hunks)
- metadata-ingestion/tests/integration/looker/looker_mces_golden_deleted_stateful.json (6 hunks)
Files skipped from review as they are similar to previous changes (3)
- metadata-ingestion/tests/integration/looker/golden_looker_mces.json
- metadata-ingestion/tests/integration/looker/golden_test_independent_look_ingest.json
- metadata-ingestion/tests/integration/looker/looker_mces_golden_deleted_stateful.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly looks good, just two small things
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
Outdated
Show resolved
Hide resolved
…source.py Co-authored-by: Harshal Sheth <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
1204-1204: Use of
functools.lru_cache
orfunctools.cache
on methods can lead to memory leaks(B019)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py
491-498: Use a single
if
statement instead of nestedif
statements(SIM102)
1531-1533: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
1847-1849: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
Additional comments not posted (2)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (1)
247-247
: LGTM! The addition of thetags
attribute toViewField
is correct.The default value of an empty list ensures backward compatibility.
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1)
1093-1093
: LGTM! But verify the function usage in the codebase.The code changes are approved.
However, ensure that all function calls to
_get_fields
are correctly handled with the newtags
attribute.Verification successful
LGTM! The function calls to
_get_fields
handle the newtags
attribute correctly.The code changes are verified to be correctly integrated, ensuring the
tags
attribute is always present in thefield_dict
.
lookml_source.py
:
- Lines where
_get_fields
is called withlooker_view
dictionary keys.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all function calls to `_get_fields` are correctly handled with the new `tags` attribute. # Test: Search for the function usage. Expect: Only occurances of the new signature. rg --type python $'_get_fields'Length of output: 61
Script:
#!/bin/bash # Description: Verify all function calls to `_get_fields` are correctly handled with the new `tags` attribute. # Test: Search for the function usage. Expect: Only occurrences of the new signature. rg '_get_fields'Length of output: 983
Script:
#!/bin/bash # Description: Extract lines around the calls to `_get_fields` in `lookml_source.py` to review the handling of the `tags` attribute. # Extract 5 lines before and after the calls to `_get_fields`. rg -C 5 '_get_fields' metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.pyLength of output: 1315
field: ViewField, reporter: SourceReport | ||
) -> Optional[GlobalTagsClass]: | ||
if field_type in LookerUtil.type_to_tag_map: | ||
return GlobalTagsClass( | ||
tags=[ | ||
schema_field_tags: List[TagAssociationClass] = [ | ||
TagAssociationClass(tag=builder.make_tag_urn(tag_name)) | ||
for tag_name in field.tags | ||
] | ||
|
||
if field.field_type in LookerUtil.type_to_tag_map: | ||
schema_field_tags.extend( | ||
[ | ||
TagAssociationClass(tag=tag_name) | ||
for tag_name in LookerUtil.type_to_tag_map[field_type] | ||
for tag_name in LookerUtil.type_to_tag_map[field.field_type] | ||
] | ||
) | ||
else: | ||
reporter.report_warning( | ||
"lookml", | ||
f"Failed to map view field type {field_type}. Won't emit tags for it", | ||
f"Failed to map view field type {field.field_type}. Won't emit tags for measure and dimension", | ||
) | ||
return None | ||
|
||
if schema_field_tags: | ||
return GlobalTagsClass(tags=schema_field_tags) | ||
|
||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify tag extraction logic.
The tag extraction logic can be simplified by combining the user-defined and predefined tags in a single list comprehension.
@staticmethod
def _get_tags_from_field_type(
field: ViewField, reporter: SourceReport
) -> Optional[GlobalTagsClass]:
schema_field_tags: List[TagAssociationClass] = [
TagAssociationClass(tag=builder.make_tag_urn(tag_name))
for tag_name in field.tags
]
if field.field_type in LookerUtil.type_to_tag_map:
schema_field_tags.extend(
TagAssociationClass(tag=tag_name)
for tag_name in LookerUtil.type_to_tag_map[field.field_type]
)
else:
reporter.report_warning(
"lookml",
f"Failed to map view field type {field.field_type}. Won't emit tags for measure and dimension",
)
if schema_field_tags:
return GlobalTagsClass(tags=schema_field_tags)
return None
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
field: ViewField, reporter: SourceReport | |
) -> Optional[GlobalTagsClass]: | |
if field_type in LookerUtil.type_to_tag_map: | |
return GlobalTagsClass( | |
tags=[ | |
schema_field_tags: List[TagAssociationClass] = [ | |
TagAssociationClass(tag=builder.make_tag_urn(tag_name)) | |
for tag_name in field.tags | |
] | |
if field.field_type in LookerUtil.type_to_tag_map: | |
schema_field_tags.extend( | |
[ | |
TagAssociationClass(tag=tag_name) | |
for tag_name in LookerUtil.type_to_tag_map[field_type] | |
for tag_name in LookerUtil.type_to_tag_map[field.field_type] | |
] | |
) | |
else: | |
reporter.report_warning( | |
"lookml", | |
f"Failed to map view field type {field_type}. Won't emit tags for it", | |
f"Failed to map view field type {field.field_type}. Won't emit tags for measure and dimension", | |
) | |
return None | |
if schema_field_tags: | |
return GlobalTagsClass(tags=schema_field_tags) | |
return None | |
@staticmethod | |
def _get_tags_from_field_type( | |
field: ViewField, reporter: SourceReport | |
) -> Optional[GlobalTagsClass]: | |
schema_field_tags: List[TagAssociationClass] = [ | |
TagAssociationClass(tag=builder.make_tag_urn(tag_name)) | |
for tag_name in field.tags | |
] | |
if field.field_type in LookerUtil.type_to_tag_map: | |
schema_field_tags.extend( | |
TagAssociationClass(tag=tag_name) | |
for tag_name in LookerUtil.type_to_tag_map[field.field_type] | |
) | |
else: | |
reporter.report_warning( | |
"lookml", | |
f"Failed to map view field type {field.field_type}. Won't emit tags for measure and dimension", | |
) | |
if schema_field_tags: | |
return GlobalTagsClass(tags=schema_field_tags) | |
return None |
@@ -602,7 +612,7 @@ | |||
else f"{field.field_type.value}. {field.description}" | |||
), | |||
globalTags=( | |||
LookerUtil._get_tags_from_field_type(field.field_type, reporter) | |||
LookerUtil._get_tags_from_field_type(field, reporter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve readability with ternary operators.
Use ternary operators to improve readability for the description
and globalTags
fields.
@staticmethod
def view_field_to_schema_field(
field: ViewField,
reporter: SourceReport,
tag_measures_and_dimensions: bool = True,
) -> SchemaField:
return SchemaField(
fieldPath=field.name,
type=LookerUtil._get_field_type(field.type, reporter),
nativeDataType=field.type,
label=field.label,
description=(
field.description
if tag_measures_and_dimensions
else f"{field.field_type.value}. {field.description}"
),
globalTags=(
LookerUtil._get_tags_from_field_type(field, reporter)
if tag_measures_and_dimensions
else None
),
isPartOfKey=field.is_primary_key,
)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
LookerUtil._get_tags_from_field_type(field, reporter) | |
LookerUtil._get_tags_from_field_type(field, reporter) | |
@staticmethod | |
def view_field_to_schema_field( | |
field: ViewField, | |
reporter: SourceReport, | |
tag_measures_and_dimensions: bool = True, | |
) -> SchemaField: | |
return SchemaField( | |
fieldPath=field.name, | |
type=LookerUtil._get_field_type(field.type, reporter), | |
nativeDataType=field.type, | |
label=field.label, | |
description=( | |
field.description | |
if tag_measures_and_dimensions | |
else f"{field.field_type.value}. {field.description}" | |
), | |
globalTags=( | |
LookerUtil._get_tags_from_field_type(field, reporter) | |
if tag_measures_and_dimensions | |
else None | |
), | |
isPartOfKey=field.is_primary_key, | |
) |
Co-authored-by: Harshal Sheth <[email protected]>
Co-authored-by: Harshal Sheth <[email protected]>
Tags defined on lookml view (https://cloud.google.com/looker/docs/reference/param-field-tags) will be ingested as DataHub tag on schema field.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
globalTags
fields from various test JSON files to clean up data structure.