-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/lookml): support looker -- if
comments
#11113
feat(ingestion/lookml): support looker -- if
comments
#11113
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes made to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Looker
participant DataHub
User->>Looker: Request data
Looker->>DataHub: Fetch metadata
DataHub->>Looker: Return enriched metadata
Looker->>User: Serve requested data
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
IncompleteSqlTransformer( | ||
source_config=source_config | ||
), # complete any incomplete sql | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting - so we're pushing some of our transformation logic into the jinja / lkml parse layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1)
33-39
: Occurrences ofliquid_variable
found in source files.The transition from
liquid_variable
tosource_config
is not fully complete, as there are still references toliquid_variable
in the source code. Please review the following locations to ensure that all necessary updates have been made:
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
metadata-ingestion/src/datahub/ingestion/source/looker/looker_liquid_tag.py
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py
Consider checking if these references are intentional or need to be updated to use
source_config
.Analysis chain
Encapsulation of configuration details.
The constructor now accepts
source_config
instead ofliquid_variable
, encapsulating configuration details more effectively. This change should improve code clarity and maintainability. Verify that all dependencies onliquid_variable
have been updated to usesource_config
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the update from `liquid_variable` to `source_config`. # Test: Search for any remaining usage of `liquid_variable`. Expect: No occurrences. rg --type python $'liquid_variable'Length of output: 65
Script:
#!/bin/bash # Description: Verify the update from `liquid_variable` to `source_config`. # Test: Search for any remaining usage of `liquid_variable`. Expect: No occurrences. rg --glob '*.py' 'liquid_variable'Length of output: 4831
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (1)
177-182
: Add tests forlooker_environment
functionalityThe
looker_environment
field is used in the codebase but does not appear to have direct test coverage. It's important to add tests to ensure that the environment-specific logic behaves correctly in both "prod" and "dev" settings.
- Consider adding tests in relevant test files to cover scenarios for both environments.
Analysis chain
Addition of
looker_environment
enhances configuration flexibility.The
looker_environment
field allows for specifying the Looker environment, enabling environment-specific logic in LookML files. This addition is well-aligned with the PR's objectives and should be thoroughly tested to ensure correct behavior in both "prod" and "dev" environments.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage and testing of `looker_environment`. # Test: Search for usage and tests of `looker_environment`. Expect: Correct application and coverage in tests. rg --type python $'looker_environment'Length of output: 68
Script:
#!/bin/bash # Description: Search for the usage of `looker_environment` across the codebase. # Search for occurrences of `looker_environment` in Python files. rg 'looker_environment' --glob '*.py' -A 5Length of output: 2969
Script:
#!/bin/bash # Description: Search for tests related to `looker_environment` across the codebase. # Search for occurrences of `looker_environment` in test files. rg 'looker_environment' --glob '*test*.py' -A 5Length of output: 47
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- metadata-ingestion/setup.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_constant.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_config.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml (2 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source_as_per_env.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/environment_activity_logs.view.lkml (1 hunks)
- metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (1 hunks)
Additional comments not posted (18)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_constant.py (1)
2-10
: LGTM! Verify the usage of new constants.The introduction of these constants improves the standardization of SQL configurations. Ensure that these constants are utilized correctly throughout the codebase.
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/environment_activity_logs.view.lkml (1)
1-12
: LGTM! Ensure environment-specific configurations are tested.The use of conditional SQL table names for different environments is a useful feature. Ensure that these configurations are thoroughly tested to avoid issues in different environments.
Verification successful
Environment-specific configurations are present in LookML files. The
-- if prod --
and-- if dev --
patterns are correctly used for SQL table names and other configurations. Ensure these configurations are thoroughly tested in their respective environments to prevent issues during deployment.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the correctness of environment-specific configurations in LookML files. # Test: Check for the presence of environment-specific configurations. Expect: Correct configurations for each environment. rg --type lookml '-- if prod' -A 2 rg --type lookml '-- if dev' -A 2Length of output: 897
Script:
#!/bin/bash # Description: Verify the correctness of environment-specific configurations in LookML files. # Test: Check for the presence of environment-specific configurations. Expect: Correct configurations for each environment. rg -e '-- if prod' -A 2 --type lookml rg -e '-- if dev' -A 2 --type lookmlLength of output: 140
Script:
#!/bin/bash # Description: Verify the correctness of environment-specific configurations in LookML files. # Find all .lkml files and search for environment-specific configurations. fd --extension lkml --exec rg -e '-- if prod' -A 2 {} fd --extension lkml --exec rg -e '-- if dev' -A 2 {}Length of output: 884
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/data.model.lkml (1)
Line range hint
9-36
:
LGTM! Verify the integration of new datasets.The inclusion of new datasets and explore blocks enhances the data model. Ensure that these datasets are correctly integrated and utilized in the LookML model.
Verification successful
Integration of New Datasets Verified Successfully
The
include
andexplore
statements for the new datasetsenvironment_activity_logs
andemployee_income_source_as_per_env
are correctly integrated in the LookML model.
include: "environment_activity_logs.view.lkml"
include: "employee_income_source_as_per_env.view.lkml"
explore: environment_activity_logs
explore: employee_income_source_as_per_env
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of new datasets in the LookML model. # Test: Check for the inclusion and exploration of new datasets. Expect: Correct integration of each dataset. rg --type lookml 'include: "environment_activity_logs.view.lkml"' rg --type lookml 'include: "employee_income_source_as_per_env.view.lkml"' rg --type lookml 'explore: environment_activity_logs {' rg --type lookml 'explore: employee_income_source_as_per_env {'Length of output: 617
Script:
#!/bin/bash # Description: Verify the integration of new datasets in the LookML model. # Test: Check for the inclusion and exploration of new datasets. Expect: Correct integration of each dataset. rg 'include: "environment_activity_logs.view.lkml"' --glob '*.lkml' rg 'include: "employee_income_source_as_per_env.view.lkml"' --glob '*.lkml' rg 'explore: environment_activity_logs \{' --glob '*.lkml' rg 'explore: employee_income_source_as_per_env \{' --glob '*.lkml'Length of output: 829
metadata-ingestion/tests/integration/lookml/vv-lineage-and-liquid-templates/employee_income_source_as_per_env.view.lkml (2)
14-14
: Clarify the source table selection logic.The comment
-- if dev -- dev_income_source -- if prod -- prod_income_source
suggests conditional logic for selecting the source table. Ensure that this logic is implemented correctly in the Looker environment, possibly using the newly introducedlooker_environment
configuration.
6-12
: Ensure correct handling of environment-specific logic.The use of
{% if %}
and{% elsif %}
for selecting tables based on environment-specific flags is a good approach for dynamic SQL generation. Verify thatdw_eff_dt_date
,finance_dw_eff_dt_date
,dw_eff_dt_week
, andfinance_dw_eff_dt_week
are correctly defined in the Looker environment and that their_is_selected
attributes are properly set.Verification successful
Environment-specific logic is correctly implemented and tested.
The flags
dw_eff_dt_date
,finance_dw_eff_dt_date
,dw_eff_dt_week
, andfinance_dw_eff_dt_week
are defined and their_is_selected
attributes are used appropriately in the codebase, as evidenced by their presence in LookML files and the test filetest_lookml.py
.
metadata-ingestion/tests/integration/lookml/test_lookml.py
: Contains definitions and tests for_is_selected
attributes.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the definition and usage of environment-specific flags. # Test: Search for the flags in the codebase. Expect: Definitions and usage of `_is_selected` attributes. rg --type lkml $'dw_eff_dt_date|finance_dw_eff_dt_date|dw_eff_dt_week|finance_dw_eff_dt_week'Length of output: 123
Script:
#!/bin/bash # Description: Verify the definition and usage of environment-specific flags. # Test: Search for the flags in the codebase. Expect: Definitions and usage of `_is_selected` attributes. rg 'dw_eff_dt_date|finance_dw_eff_dt_date|dw_eff_dt_week|finance_dw_eff_dt_week'Length of output: 4843
metadata-ingestion/src/datahub/ingestion/source/looker/looker_file_loader.py (1)
10-10
: Refactor aligns with improved template processing.The replacement of
resolve_liquid_variable_in_view_dict
withprocess_lookml_template_language
suggests a more robust approach to handling LookML templates. Ensure that the new function covers all necessary template processing scenarios.Verification successful
Refactor aligns with improved template processing.
The
process_lookml_template_language
function is implemented inmetadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
. It uses a series of transformers to process LookML templates, indicating a comprehensive approach to handling template processing scenarios. This aligns with the review comment's suggestion of a more robust method.
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the implementation and coverage of `process_lookml_template_language`. # Test: Search for the function implementation. Expect: Comprehensive handling of LookML templates. ast-grep --lang python --pattern $'def process_lookml_template_language($_, $_):\n $$$'Length of output: 90
Script:
#!/bin/bash # Description: Search for the function `process_lookml_template_language` without specifying parameters to find any implementation. rg 'def process_lookml_template_language' --glob '*.py' -A 10Length of output: 1353
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py (7)
111-164
: Good use of abstract base class pattern.The
LookMLViewTransformer
class is well-structured, effectively using the abstract base class pattern to enforce implementation of the_apply_transformation
method in subclasses. Thetransform
method is central to the class's functionality and is well-implemented.
166-175
: LGTM!The
LiquidVariableTransformer
class correctly implements the transformation logic for resolving liquid variables.
178-204
: Well-implemented SQL completion logic.The
IncompleteSqlTransformer
class effectively handles incomplete SQL fragments by adding necessarySELECT
andFROM
clauses.
207-219
: Efficient pattern removal.The
DropDerivedViewPatternTransformer
class uses regular expressions efficiently to remove${}
patterns from SQL table names.
222-254
: Well-constructed regex for conditional comments.The
LookMlIfCommentTransformer
class effectively evaluates Looker-- if --
comments using well-constructed regular expressions.
257-286
: Effective transformation management.The
TransformedLookMlView
class is well-structured and effectively manages the transformation process using a sequence of transformers.
289-318
: Modular and extensible transformation process.The
process_lookml_template_language
function is well-designed, applying transformations in a prioritized manner. The use of transformers enhances modularity and extensibility.metadata-ingestion/src/datahub/ingestion/source/looker/lookml_concept_context.py (1)
318-318
: Improved string handling.The modification to the
datahub_transformed_sql_table_name
method enhances robustness by ensuring the output is free from excess whitespace and surrounding spaces.metadata-ingestion/setup.py (1)
174-174
: New dependency added:deepmerge
.The addition of the
deepmerge>=1.1.1
dependency enhances the project's capability to handle deep merging of data structures.metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (1)
672-672
: Verify usage ofself.source_config
inget_internal_workunits
.The change from
self.source_config.liquid_variable
toself.source_config
allows broader access to configuration properties. Ensure that the method logic is updated to handle this broader scope and that any specific logic previously relying onliquid_variable
is correctly adjusted.Verification successful
Usage of
self.source_config
inget_internal_workunits
is correctly handled.The method now accesses multiple properties of
self.source_config
, such asbase_folder
,project_name
, andmodel_pattern
. This indicates that the broader configuration scope is being utilized effectively, and the logic appears consistent with this change. No issues found.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `self.source_config` within the `get_internal_workunits` method. # Test: Search for the usage of `self.source_config` in the method. Expect: Correct handling of the broader configuration scope. ast-grep --lang python --pattern $'def get_internal_workunits(self) -> Iterable[MetadataWorkUnit]:\n $$$\n self.source_config\n $$$'Length of output: 133
Script:
#!/bin/bash # Description: Use ripgrep to search for `self.source_config` usage within the `get_internal_workunits` method. # Test: Search for `self.source_config` in the file to understand its usage context. rg --context 5 'def get_internal_workunits' -A 50 metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py | rg 'self\.source_config'Length of output: 426
metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json (2)
1583-1768
: Verify conditional comments inviewProperties
forenvironment_activity_logs
.The
viewProperties
aspect uses conditional comments to define SQL logic. Verify that these comments are correctly interpreted and executed in the Looker environment.
1769-2048
: Verify conditional logic inviewProperties
foremployee_income_source_as_per_env
.The
viewProperties
aspect uses conditional logic to dynamically select data sources. Verify that this logic is correctly implemented and functions as intended in different environments.Verification successful
Conditional logic in
viewProperties
is correctly implemented.The
viewLogic
aspect foremployee_income_source_as_per_env
includes conditional logic to dynamically select data sources, confirming its correct implementation. No further action is required.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the correctness of conditional logic in `viewProperties`. # Test: Search for the usage of conditional logic in the dataset entry. Expect: Correct implementation and functionality. rg --json -A 5 '"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.employee_income_source_as_per_env,PROD)"' | rg '"viewLogic":'Length of output: 154
Script:
#!/bin/bash # Description: Re-verify the correctness of conditional logic in `viewProperties`. # Test: Search for the `viewLogic` aspect within the dataset entry more effectively. rg '"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:looker,lkml_samples.view.employee_income_source_as_per_env,PROD)"' -A 50 metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.json | rg '"viewLogic":'Length of output: 231
Script:
#!/bin/bash # Description: Search for any instance of `viewLogic` in the entire JSON file to verify its existence and format. # Test: Search for the `viewLogic` aspect in the entire file. rg '"viewLogic":' metadata-ingestion/tests/integration/lookml/vv_lineage_liquid_template_golden.jsonLength of output: 4472
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/looker/looker_template_language.py
Outdated
Show resolved
Hide resolved
…-fork into ing-664-if-prod-lookml
-- if
comments
There are many transformations that we need to perform on the LookML view to make it suitable for metadata ingestion.
These transformations include:
The Python module looker_template_language.py handles all these transformations. To keep the code readable and extensible, we have added a transformer for each of the operations mentioned above. If we need to perform any additional transformations in the future, we can easily add a new transformer to handle that scenario.
Each transformer works on specific attributes of the LookML view. For example, the #4 transformation is only applicable to the view.derived.sql attribute, while the other transformations apply to both the view.sql_table_name and view.derived.sql attributes.
The class LookMLViewTransformer contains the logic to ensure that the transformer is applied to specific attributes and returns a dictionary containing the transformed data. For example, in cases #1 and #2, it returns:
transformed derived_table:
Where as original was:
Each transformation generates a section of the transformed dictionary with a new attribute named datahub_transformed_<original-attribute-name>.
The class TransformedLookMLView is collecting all such outputs to create a new transformed LookML view. It creates a copy of the original view dictionary and updates the copy with the transformed output. The deepmerge library is used because Python's dict.update function doesn't merge nested fields. The transformed LookML view will contain the following attributes: