Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingest): add ingestion source for SAP Analytics Cloud #10958

Merged
merged 9 commits into from
Aug 26, 2024

Conversation

Masterchen09
Copy link
Contributor

@Masterchen09 Masterchen09 commented Jul 21, 2024

@hsheth2 As discussed, here is the ingestion source for the SAP Analytics Cloud with everything included (docs, tests, configuration in the UI, ...).

Two remarks regarding two changes which are also included in this PR, but have nothing to do with the SAP Analytics Cloud source:

  • It seems the getLogoFromPlatform function is not used anymore, therefore I have removed it...
  • I have noticed that the ingestion sources are currently not sorted when they are selected in the UI...therefore I have added a sort function in SelectTemplateStep to sort the sources alphabetically, while making sure, that the "Custom" source is always displayed last...;-)

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Enhanced integration with SAP Analytics Cloud (SAC) for data ingestion.
    • Support for various SAC connection parameters, including tenant URL and token URL.
    • New source configuration for SAC integration introduced, expanding platform capabilities.
  • Bug Fixes

    • Improved sorting logic for displayed sources in the UI to prioritize non-custom entries.
  • Documentation

    • New documentation detailing SAC integration setup, configuration notes, and limitations.
  • Tests

    • Introduced integration tests for the SAC API to ensure proper interaction and validation.

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment community-contribution PR or Issue raised by member(s) of DataHub Community labels Jul 21, 2024
Copy link
Contributor

coderabbitai bot commented Jul 21, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The recent updates significantly enhance the integration of SAP Analytics Cloud (SAC) into the application, focusing on improved data ingestion and user configuration capabilities. Key changes include the introduction of SAC-related constants, a dedicated source connector for SAC, and comprehensive documentation. These enhancements empower users to seamlessly connect to SAC, manage their data sources effectively, and streamline workflows, ultimately boosting productivity.

Changes

Files Change Summary
.../RecipeForm/constants.ts
.../RecipeForm/sac.ts
.../constants.ts
Introduced SAC-related constants, updated RECIPE_FIELDS for SAC, and modified logo mappings for SAC.
.../SelectTemplateStep.tsx Added a custom sorting mechanism for filteredSources, prioritizing non-custom sources.
.../sources.json Added a new entry for SAP Analytics Cloud with essential metadata and configuration parameters.
.../sac/sac.py Implemented the SACSource connector for managing SAC resources and extracting metadata.
.../tests/integration/sac/test_sac.py Created integration tests for the SAC API using mock requests to validate interactions with the service.
.../sources/sac/sac_pre.md Provided detailed configuration and setup documentation for integrating SAC with DataHub.
.../data_platforms.json Added a new entry for SAP Analytics Cloud in the data platforms configuration.
.../logging_manager.py Adjusted logging configuration to include settings for the pyodata library, enhancing logging control.
.../setup.py Introduced new dependencies for the sac plugin and ensured registration within the system's entry points.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant App
    participant SACSource
    participant SACAPI

    User->>App: Configure SAC
    App->>SACSource: Initialize with config
    SACSource->>SACAPI: Authenticate
    SACAPI-->>SACSource: Access Token
    SACSource->>SACAPI: Retrieve Resources
    SACAPI-->>SACSource: Resource Metadata
    SACSource->>App: Return Resources
    App->>User: Display Resources
Loading

🐇 In fields of data, I hop and play,
With SAC now in tow, we’ll dance all day!
New constants, tests, and docs galore,
Let’s feast on insights, oh what a score!
Integrate with joy, let metrics soar!
Hoppy times ahead, who could ask for more? 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 20574cf and cd38fef.

Files ignored due to path filters (1)
  • datahub-web-react/src/images/saclogo.svg is excluded by !**/*.svg
Files selected for processing (19)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
  • metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
  • metadata-ingestion/setup.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/init.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
  • metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
  • metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
  • metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
  • metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/sac/__init__.py

1-1: datahub.ingestion.source.sac.sac.SACSource imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

yamllint
metadata-ingestion/docs/sources/sac/sac_recipe.yml

[error] 11-11: syntax error: found unknown escape character '$'

(syntax)

LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)


[formatting] ~40-~40: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...Story or an Application will be ingested, because there is no dedicated API to retrieve m...

(COMMA_BEFORE_BECAUSE)


[formatting] ~41-~41: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rowse Paths for models cannot be created, because the folder where the models are saved i...

(COMMA_BEFORE_BECAUSE)


[formatting] ~42-~42: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ... is only ingested for Import Data Models, because there is no possibility to get the sche...

(COMMA_BEFORE_BECAUSE)


[formatting] ~43-~43: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...or Import Data Models cannot be ingested, because the API is not providing any informatio...

(COMMA_BEFORE_BECAUSE)


[uncategorized] ~44-~44: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...t providing any information about it. - Currently only SAP BW and SAP HANA are supported ...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)


[typographical] ~45-~45: The word “therefore” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...the models are Live Data or Import Data Models, therefore these models will be ingested only with...

(THUS_SENTENCE)

Additional comments not posted (68)
datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1)

1-26: LGTM!

The configuration for the new SAP Analytics Cloud ingestion source is well-structured and follows best practices. The placeholder recipe provides clear guidance on configuring the source.

metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1)

1-45: LGTM!

The data models are well-defined and follow best practices. The use of dataclass(frozen=True) ensures immutability, and optional fields are correctly annotated.

metadata-ingestion/docs/sources/sac/sac_recipe.yml (5)

1-5: LGTM!

The source type and stateful ingestion configuration are correctly specified.


7-8: LGTM!

The tenant and token URL configuration are correctly specified with placeholders and comments.


14-18: LGTM!

The ingestion configuration for stories and applications is correctly specified.


20-30: LGTM!

The resource ID, name, and folder pattern configuration are correctly specified to allow all.


32-40: LGTM!

The connection mapping configuration is correctly specified for SAP Analytics Cloud connections.

datahub-web-react/src/app/ingest/source/conf/sources.tsx (2)

20-20: LGTM!

The import statement for sacConfig is correctly specified.


52-52: LGTM!

The sacConfig is correctly added to the SOURCE_TEMPLATE_CONFIGS array.

metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2)

21-23: LGTM!

The new constants SAC_MODEL, SAC_IMPORT_DATA_MODEL, and SAC_LIVE_DATA_MODEL are correctly specified.


79-80: LGTM!

The new constants SAC_STORY and SAC_APPLICATION are correctly specified.

metadata-ingestion/docs/sources/sac/sac_pre.md (3)

28-37: LGTM!

The concept mapping section is clear and accurate.


38-45: LGTM!

The limitations section is clear and accurate.

Tools
LanguageTool

[formatting] ~40-~40: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...Story or an Application will be ingested, because there is no dedicated API to retrieve m...

(COMMA_BEFORE_BECAUSE)


[formatting] ~41-~41: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rowse Paths for models cannot be created, because the folder where the models are saved i...

(COMMA_BEFORE_BECAUSE)


[formatting] ~42-~42: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ... is only ingested for Import Data Models, because there is no possibility to get the sche...

(COMMA_BEFORE_BECAUSE)


[formatting] ~43-~43: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...or Import Data Models cannot be ingested, because the API is not providing any informatio...

(COMMA_BEFORE_BECAUSE)


[uncategorized] ~44-~44: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...t providing any information about it. - Currently only SAP BW and SAP HANA are supported ...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)


[typographical] ~45-~45: The word “therefore” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...the models are Live Data or Import Data Models, therefore these models will be ingested only with...

(THUS_SENTENCE)


5-5: Possible typo: repeated word

The word "Access" is repeated.

-    - Access:
-        - Data Import Service
+    - Access: Data Import Service

Likely invalid or redundant comment.

Tools
LanguageTool

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)

datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1)

107-117: LGTM!

The sorting logic is correct and efficiently places "custom" at the end while sorting the remaining items alphabetically.

datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (12)

1-12: LGTM!

The SAC_TENANT_URL field is correctly defined and the configuration is accurate.


14-23: LGTM!

The SAC_TOKEN_URL field is correctly defined and the configuration is accurate.


25-34: LGTM!

The SAC_CLIENT_ID field is correctly defined and the configuration is accurate.


36-45: LGTM!

The SAC_CLIENT_SECRET field is correctly defined and the configuration is accurate.


47-55: LGTM!

The INGEST_STORIES field is correctly defined and the configuration is accurate.


57-65: LGTM!

The INGEST_APPLICATIONS field is correctly defined and the configuration is accurate.


67-81: LGTM!

The RESOURCE_ID_ALLOW field is correctly defined and the configuration is accurate.


83-97: LGTM!

The RESOURCE_ID_DENY field is correctly defined and the configuration is accurate.


99-113: LGTM!

The RESOURCE_NAME_ALLOW field is correctly defined and the configuration is accurate.


115-129: LGTM!

The RESOURCE_NAME_DENY field is correctly defined and the configuration is accurate.


131-145: LGTM!

The FOLDER_ALLOW field is correctly defined and the configuration is accurate.


147-161: LGTM!

The FOLDER_DENY field is correctly defined and the configuration is accurate.

datahub-web-react/src/app/ingest/source/builder/constants.ts (4)

37-37: Import statement for sacLogo looks good.

The import path is consistent with other logo imports.


126-126: Constant declaration for SAC looks good.

The constant follows the existing pattern and is correctly defined.


127-127: URN declaration for SAC looks good.

The URN follows the existing pattern and is correctly defined.


167-167: Mapping of SAC_URN to sacLogo looks good.

The mapping is correctly added and follows the existing pattern.

metadata-ingestion/src/datahub/utilities/logging_manager.py (1)

281-283: Logging configurations for pyodata look good.

The logging levels are correctly set to WARNING for the specified components.

metadata-ingestion/tests/integration/sac/test_sac.py (10)

1-18: Imports and mock constants look good.

The imports and constants are correctly defined and necessary for the tests.


20-51: test_sac function and mock token URL matching look good.

The function and mock request are correctly defined and necessary for the tests.


53-65: Metadata mock request and matching function look good.

The mock request and matching function are correctly defined and necessary for the tests.


67-120: Resources mock request and matching function look good.

The mock request and matching function are correctly defined and necessary for the tests.


122-178: Resource mock request and matching function look good.

The mock request and matching function are correctly defined and necessary for the tests.


180-199: Models mock request and matching function look good.

The mock request and matching function are correctly defined and necessary for the tests.


201-273: Model metadata mock request and matching function look good.

The mock request and matching function are correctly defined and necessary for the tests.


275-296: Pipeline creation and execution look good.

The pipeline creation and execution are correctly defined and necessary for the tests.


297-302: Golden file check looks good.

The golden file check is correctly defined and necessary for the tests.


305-310: Authorization check function looks good.

The authorization check function is correctly defined and necessary for the tests.

metadata-service/war/src/main/resources/boot/data_platforms.json (1)

668-676: Ensure the new SAP Analytics Cloud entry follows the correct format.

The new entry for SAP Analytics Cloud seems correctly formatted and consistent with other entries in the JSON structure. Ensure that the logoUrl points to a valid location and that the urn and other attributes are accurate and necessary for integration.

datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (4)

86-86: New import for SAP Analytics Cloud constants.

The new import statement for SAP Analytics Cloud constants is correctly added. Ensure that the imported constants are defined and used appropriately in the rest of the file.


174-187: New constants for SAP Analytics Cloud.

The new constants related to SAP Analytics Cloud are correctly imported from the sac module. Ensure that these constants are defined in the sac module and are used appropriately in the RECIPE_FIELDS object.


536-549: New entry in RECIPE_FIELDS for SAP Analytics Cloud.

The new entry for SAP Analytics Cloud in the RECIPE_FIELDS object is correctly added. Ensure that the fields, filterFields, and advancedFields are correctly defined and consistent with other entries.


554-561: Update to CONNECTORS_WITH_TEST_CONNECTION to include SAP Analytics Cloud.

The update to include SAP Analytics Cloud in the CONNECTORS_WITH_TEST_CONNECTION set is correctly added. Ensure that the test connection functionality for SAP Analytics Cloud is implemented and works as expected.

metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1)

1-662: New test data for SAP Analytics Cloud integration.

The new JSON file contains comprehensive test data for SAP Analytics Cloud integration. Ensure that the test data covers all necessary aspects and scenarios for SAP Analytics Cloud integration, including dashboards, datasets, and upstream lineage.

The structure and content of the test data appear to be consistent and well-formed. Verify that the entityUrn, aspectName, and other attributes are accurate and necessary for the intended tests.

datahub-web-react/src/app/ingest/source/builder/sources.json (6)

291-291: LGTM! The urn attribute is correctly formatted.

The urn uniquely identifies the data platform as "urn:li:dataPlatform:sac".


292-292: LGTM! The name attribute is correctly formatted.

The name attribute is set to "sac".


293-293: LGTM! The displayName attribute is correctly formatted.

The displayName attribute is set to "SAP Analytics Cloud".


294-294: LGTM! The description attribute is clear and informative.

The description provides an overview of the functionality, indicating that it allows for the importation of stories, applications, and models from SAP Analytics Cloud.


295-295: LGTM! The docsUrl attribute is correctly formatted.

The docsUrl points to the relevant documentation for this data source.


296-296: LGTM! The recipe attribute is clear and informative.

The recipe outlines the configuration parameters required for integration, including tenant_url, token_url, client_id, and client_secret, with placeholders for sensitive information.

metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (8)

1-11: LGTM! The imports are necessary and relevant.

The imports include standard libraries, third-party libraries, and internal modules.


92-105: LGTM! The ConnectionMappingConfig class is well-defined.

The class includes fields for platform, platform_instance, and env, providing the necessary configuration for connection mapping.


108-161: LGTM! The SACSourceConfig class is comprehensive.

The class includes various fields for configuration, such as tenant_url, token_url, client_id, client_secret, and patterns for selecting resources.


164-166: LGTM! The SACSourceReport class is well-defined.

The class is a dataclass that extends StaleEntityRemovalSourceReport.


169-221: LGTM! The SACSource class is comprehensive and well-implemented.

The class includes various methods for handling data ingestion, such as __init__, create, test_connection, get_workunit_processors, get_workunits_internal, get_resources, and others.


421-630: LGTM! The get_model_workunits method is well-implemented.

The method generates work units for datasets based on the provided model, handling various aspects of dataset ingestion, such as dataset properties, schema metadata, and upstream lineage.


632-733: LGTM! The get_resources method is well-implemented.

The method retrieves resources from the SAP Analytics Cloud tenant and generates Resource objects, handling various aspects of resource retrieval, such as filtering, pattern matching, and parsing resource models.


735-805: LGTM! The utility methods are well-implemented.

The methods include get_import_data_model_ids, get_import_data_model_columns, get_query_name, get_view_name, get_schema_field_data_type, and _add_sap_sac_custom_auth_header, providing necessary functionality for the SAC source connector.

metadata-ingestion/tests/integration/sac/metadata.xml (4)

1-2: LGTM! The XML header and root element are correctly formatted.

The XML header and root element provide the necessary metadata for the XML document.


3-402: LGTM! The Schema element is well-defined.

The Schema element defines the namespace and includes various EntityType and Association elements, providing the necessary structure for the XML document.


300-401: LGTM! The EntityContainer element is well-defined.

The EntityContainer element defines various EntitySet and AssociationSet elements, providing the necessary structure for the XML document.


403-404: LGTM! The closing tags are correctly formatted.

The closing tags for the Schema and root elements provide the necessary structure for the XML document.

metadata-ingestion/setup.py (2)

306-310: Add SAC dependency group.

The sac dependency group is correctly added with the necessary dependencies.


728-728: Register SAC entry point.

The sac entry point is correctly registered, linking to the SACSource class.

metadata-ingestion/docs/sources/sac/sac_recipe.yml Outdated Show resolved Hide resolved
metadata-ingestion/docs/sources/sac/sac_pre.md Outdated Show resolved Hide resolved
metadata-ingestion/docs/sources/sac/sac_pre.md Outdated Show resolved Hide resolved
metadata-ingestion/docs/sources/sac/sac_pre.md Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (5)
metadata-ingestion/docs/sources/sac/sac_pre.md (5)

3-3: Add comma after 'Currently'.

A comma is needed after "Currently".

- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
+ Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.

12-12: Add missing comma.

A comma is needed after "environments".

- To map individual connections in SAP Analytics Cloud to platforms, platform instances or environments the `connection_mapping` configuration can be used within the recipe:
+ To map individual connections in SAP Analytics Cloud to platforms, platform instances, or environments, the `connection_mapping` configuration can be used within the recipe:
Tools
LanguageTool

[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments the connection_mapping configuration ...

(AI_HYDRA_LEO_MISSING_COMMA)


40-43: Remove comma before 'because'.

The comma before "because" is unnecessary and should be removed.

- Only models which are used in a Story or an Application will be ingested, because there is no dedicated API to retrieve models (only for Stories and Applications).
- Browse Paths for models cannot be created, because the folder where the models are saved is not returned by the API.
- Schema metadata is only ingested for Import Data Models, because there is no possibility to get the schema metadata of the other model types.
- Lineages for Import Data Models cannot be ingested, because the API is not providing any information about it.
+ Only models which are used in a Story or an Application will be ingested because there is no dedicated API to retrieve models (only for Stories and Applications).
+ Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API.
+ Schema metadata is only ingested for Import Data Models because there is no possibility to get the schema metadata of the other model types.
+ Lineages for Import Data Models cannot be ingested because the API is not providing any information about it.

45-45: Separate 'therefore' from the sentence.

The word "therefore" should be separated from the sentence.

- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models, therefore these models will be ingested only with the `Story` subtype.
+ For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the `Story` subtype.

5-5: Possible typo: repeated word.

The word "Access" is repeated.

- Purpose: API Access
- Access:
- Data Import Service
+ Purpose: API Access
+ Data Import Service
Tools
LanguageTool

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between cd38fef and 6befa6a.

Files ignored due to path filters (1)
  • datahub-web-react/src/images/saclogo.svg is excluded by !**/*.svg
Files selected for processing (18)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
  • metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
  • metadata-ingestion/setup.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
  • metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
  • metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
  • metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
  • metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (1)
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml
Files skipped from review as they are similar to previous changes (15)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
  • datahub-web-react/src/app/ingest/source/builder/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/sources.json
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx
  • metadata-ingestion/setup.py
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
  • metadata-ingestion/src/datahub/utilities/logging_manager.py
  • metadata-ingestion/tests/integration/sac/metadata.xml
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json
  • metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments the connection_mapping configuration ...

(AI_HYDRA_LEO_MISSING_COMMA)

Additional comments not posted (1)
metadata-service/war/src/main/resources/boot/data_platforms.json (1)

667-676: Ensure logo URL is correct.

Verify that the logo URL /assets/platforms/saclogo.svg is correct and the file exists in the specified location.

@Masterchen09 Masterchen09 force-pushed the feat-sap-analytics-cloud branch from 6befa6a to 3beb17a Compare July 26, 2024 19:35
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (5)
metadata-ingestion/docs/sources/sac/sac_pre.md (5)

1-1: Add a comma after 'Currently'.

A comma is needed after "Currently" for better readability.

- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
+ Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.

12-12: Add a comma after 'environments'.

A comma is needed after "environments" for better readability.

- To map individual connections in SAP Analytics Cloud to platforms, platform instances or environments the `connection_mapping` configuration can be used within the recipe:
+ To map individual connections in SAP Analytics Cloud to platforms, platform instances or environments, the `connection_mapping` configuration can be used within the recipe:
Tools
LanguageTool

[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments the connection_mapping configuration ...

(AI_HYDRA_LEO_MISSING_COMMA)


40-43: Remove comma before 'because'.

The comma before "because" is unnecessary and should be removed.

- Only models which are used in a Story or an Application will be ingested, because there is no dedicated API to retrieve models (only for Stories and Applications).
- Browse Paths for models cannot be created, because the folder where the models are saved is not returned by the API.
- Schema metadata is only ingested for Import Data Models, because there is no possibility to get the schema metadata of the other model types.
- Lineages for Import Data Models cannot be ingested, because the API is not providing any information about it.
+ Only models which are used in a Story or an Application will be ingested because there is no dedicated API to retrieve models (only for Stories and Applications).
+ Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API.
+ Schema metadata is only ingested for Import Data Models because there is no possibility to get the schema metadata of the other model types.
+ Lineages for Import Data Models cannot be ingested because the API is not providing any information about it.

44-44: Add comma after 'Currently'.

A comma is needed after "Currently" for better readability.

- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
+ Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.

45-45: Separate 'therefore' from the sentence.

The word "therefore" should be separated from the sentence.

- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models, therefore these models will be ingested only with the `Story` subtype.
+ For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the `Story` subtype.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6befa6a and 3beb17a.

Files ignored due to path filters (1)
  • datahub-web-react/src/images/saclogo.svg is excluded by !**/*.svg
Files selected for processing (18)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
  • metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
  • metadata-ingestion/setup.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
  • metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
  • metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
  • metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
  • metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (2)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml
Files skipped from review as they are similar to previous changes (14)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
  • datahub-web-react/src/app/ingest/source/builder/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/sources.json
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx
  • metadata-ingestion/setup.py
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
  • metadata-ingestion/src/datahub/utilities/logging_manager.py
  • metadata-ingestion/tests/integration/sac/metadata.xml
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json
  • metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments the connection_mapping configuration ...

(AI_HYDRA_LEO_MISSING_COMMA)

Additional comments not posted (1)
metadata-service/war/src/main/resources/boot/data_platforms.json (1)

669-676: LGTM!

The new entry for SAP Analytics Cloud is correctly added and consistent with other entries.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3beb17a and b56983f.

Files ignored due to path filters (1)
  • datahub-web-react/src/images/saclogo.svg is excluded by !**/*.svg
Files selected for processing (18)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
  • metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
  • metadata-ingestion/setup.py (3 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
  • metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
  • metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
  • metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
  • metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (2)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
  • metadata-ingestion/src/datahub/utilities/logging_manager.py
Files skipped from review as they are similar to previous changes (15)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
  • datahub-web-react/src/app/ingest/source/builder/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/sources.json
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml
  • metadata-ingestion/setup.py
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
  • metadata-ingestion/tests/integration/sac/metadata.xml
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json
  • metadata-ingestion/tests/integration/sac/test_sac.py
  • metadata-service/war/src/main/resources/boot/data_platforms.json
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)

Additional comments not posted (7)
metadata-ingestion/docs/sources/sac/sac_pre.md (7)

40-40: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.

- Only models which are used in a Story or an Application will be ingested, because there is no dedicated API to retrieve models (only for Stories and Applications).
+ Only models which are used in a Story or an Application will be ingested because there is no dedicated API to retrieve models (only for Stories and Applications).

41-41: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.

- Browse Paths for models cannot be created, because the folder where the models are saved is not returned by the API.
+ Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API.

42-42: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.

- Schema metadata is only ingested for Import Data Models, because there is no possibility to get the schema metadata of the other model types.
+ Schema metadata is only ingested for Import Data Models because there is no possibility to get the schema metadata of the other model types.

43-43: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.

- Lineages for Import Data Models cannot be ingested, because the API is not providing any information about it.
+ Lineages for Import Data Models cannot be ingested because the API is not providing any information about it.

44-44: Add comma after 'Currently'

A comma is needed after "Currently".

- Currently only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.
+ Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineages of Live Data Models - a warning is logged for all other connection types, please feel free to open an [issue on GitHub](https://github.com/datahub-project/datahub/issues/new/choose) with the warning message to have this fixed.

45-45: Separate 'therefore' from the sentence

The word "therefore" should be separated from the sentence.

- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models, therefore these models will be ingested only with the `Story` subtype.
+ For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the `Story` subtype.

5-5: Possible typo: repeated word "Access"

The word "Access" is repeated, which might be a typo.

-    - Access:
+    - Data Import Service

Likely invalid or redundant comment.

Tools
LanguageTool

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)

@Masterchen09 Masterchen09 force-pushed the feat-sap-analytics-cloud branch from b56983f to 7d5ee53 Compare July 28, 2024 12:33
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b56983f and 7d5ee53.

Files ignored due to path filters (1)
  • datahub-web-react/src/images/saclogo.svg is excluded by !**/*.svg
Files selected for processing (18)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx (1 hunks)
  • datahub-web-react/src/app/ingest/source/builder/constants.ts (3 hunks)
  • datahub-web-react/src/app/ingest/source/builder/sources.json (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts (1 hunks)
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx (2 hunks)
  • metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml (1 hunks)
  • metadata-ingestion/setup.py (4 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py (1 hunks)
  • metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
  • metadata-ingestion/tests/integration/sac/metadata.xml (1 hunks)
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json (1 hunks)
  • metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
  • metadata-service/war/src/main/resources/boot/data_platforms.json (1 hunks)
Files skipped from review due to trivial changes (1)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/sac.ts
Files skipped from review as they are similar to previous changes (14)
  • datahub-web-react/src/app/ingest/source/builder/RecipeForm/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/SelectTemplateStep.tsx
  • datahub-web-react/src/app/ingest/source/builder/constants.ts
  • datahub-web-react/src/app/ingest/source/builder/sources.json
  • datahub-web-react/src/app/ingest/source/conf/sac/sac.ts
  • datahub-web-react/src/app/ingest/source/conf/sources.tsx
  • metadata-ingestion/docs/sources/sac/sac_recipe.yml
  • metadata-ingestion/setup.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac_common.py
  • metadata-ingestion/src/datahub/utilities/logging_manager.py
  • metadata-ingestion/tests/integration/sac/metadata.xml
  • metadata-ingestion/tests/integration/sac/sac_mces_golden.json
  • metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)

Additional comments not posted (9)
metadata-ingestion/src/datahub/ingestion/source/common/subtypes.py (2)

21-23: New constants for SAP Analytics Cloud added to DatasetSubTypes.

The new constants SAC_MODEL, SAC_IMPORT_DATA_MODEL, and SAC_LIVE_DATA_MODEL are correctly defined and consistent with existing constants.


77-80: New constants for SAP Analytics Cloud added to BIAssetSubTypes.

The new constants SAC_STORY and SAC_APPLICATION are correctly defined and consistent with existing constants.

metadata-ingestion/docs/sources/sac/sac_pre.md (6)

30-36: Concept mapping is accurate.

The mappings between SAP Analytics Cloud and DataHub concepts are accurate and consistent.


40-40: Remove comma before 'because'.

The comma before "because" is unnecessary and should be removed.


44-44: Add comma after 'Currently'.

A comma is needed after "Currently".


45-45: Separate 'therefore' from the sentence.

The word "therefore" should be separated from the sentence.


1-1: Possible typo: repeated word "Access".

There seems to be a repeated word "Access" in the configuration notes.

-    - Access:
+    - 

Likely invalid or redundant comment.


5-5: Possible typo: repeated word "Access".

There seems to be a repeated word "Access" in the configuration notes.

-    - Access:
+    - 

Likely invalid or redundant comment.

Tools
LanguageTool

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)

metadata-service/war/src/main/resources/boot/data_platforms.json (1)

669-676: New entry for SAP Analytics Cloud added.

The new entry for SAP Analytics Cloud is correctly defined and consistent with existing entries.

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly focused on the python code in this pass - this is looking pretty good

@@ -471,6 +477,7 @@
"fivetran": snowflake_common | bigquery_common | sqlglot_lib,
"qlik-sense": sqlglot_lib | {"requests", "websocket-client"},
"sigma": sqlglot_lib | {"requests"},
"sac": sac,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo sac isn't a sufficiently well-know abbreviation - let's name the source sap-analytics-cloud or sap-sac

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in context of SAP, SAC ist a common abbreviation for SAP Analytics Cloud...if I search for SAC in Google or Bing in an incognito window, besides the Swiss Alpine Club (😂), the next results are for the SAP Analytics Cloud (but of course there is still some kind of geo localization and the results are probably specific for Germany).

-> sap-sac is definitely wrong (the S in SAC stands for SAP) and sap-analytics-cloud is too long...

As the abbreviation would be also used for the platform urn, I would like to stick with SAC...;-)


# SAP Analytics Cloud
SAC_STORY = "Story"
SAC_APPLICATION = "Application"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

metadata-ingestion/tests/integration/sac/test_sac.py Outdated Show resolved Hide resolved
env: PROD
```

The key in the connection mapping is the technical name of the connection resp. its id.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit confused by this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it a bit, not sure if it is less confusing then...

You have to maintain the connections to the different systems in the SAC...there you have to specify a name: https://developers.sap.com/tutorials/hana-cloud-analytics-cloud-consume-2/jcr:content.github-proxy.1633841405.file/step7-I7.png

The key in the connection mapping dictionary is this name and is used to find the connection in the dictionary...

platform_instance = connection.platform_instance
env = connection.env
else:
platform = model.system_type.lower()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to print some debug logs here? e.g. no connection_mapping found, doing best-effort inference

or are the logs below sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a log with info level (I think debug is not enough?) - besides the log message you would notice that the best-effort approach was used, because there will be platform instances which are named after the connections in SAC...

elif data_type in ("decimal", "int32"):
return SchemaFieldDataTypeClass(type=NumberTypeClass())
else:
logger.warning(f"Unknown data type {data_type} found")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.warning(f"Unknown data type {data_type} found")
logger.info(f"Unknown data type {data_type} found")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replaced the log with report_warning -> I think it is better to have this as a warning...usually there are not many possibilities for the data types in import data models - in case there will be ever an unknown data type, we should know it to enhance the method accordingly...I am affraid a log with info level will be missed easily...

if property_type == "DATE":
return SchemaFieldDataTypeClass(type=DateTypeClass())
else:
if data_type == "string":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so property_type is capitalized and data_type is lower case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we shouldn't really question things like this...SAP and REST APIs is a "complicated" topic (at least in context of SAC)...

metadata-ingestion/src/datahub/ingestion/source/sac/sac.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7d5ee53 and 4ad9e23.

Files selected for processing (4)
  • metadata-ingestion/docs/sources/sac/sac_pre.md (1 hunks)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
  • metadata-ingestion/src/datahub/utilities/logging_manager.py (1 hunks)
  • metadata-ingestion/tests/integration/sac/test_sac.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py
  • metadata-ingestion/src/datahub/utilities/logging_manager.py
  • metadata-ingestion/tests/integration/sac/test_sac.py
Additional context used
LanguageTool
metadata-ingestion/docs/sources/sac/sac_pre.md

[duplication] ~5-~5: Possible typo: you repeated a word
Context: ...llowing properties: - Purpose: API Access - Access: - Data Import Service - Au...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~12-~12: Possible missing comma found.
Context: ...oud to platforms, platform instances or environments the connection_mapping configuration ...

(AI_HYDRA_LEO_MISSING_COMMA)

Additional comments not posted (7)
metadata-ingestion/docs/sources/sac/sac_pre.md (7)

40-40: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.


41-41: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.


42-42: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.


43-43: Remove comma before 'because'

The comma before "because" is unnecessary and should be removed.


44-44: Add comma after 'Currently'

A comma is needed after "Currently".


45-45: Separate 'therefore' from the sentence

The word "therefore" should be separated from the sentence.


3-3: Possible typo: repeated word

The word "the" might be repeated here.

- The OAuth client is required to have the following properties:
+ The OAuth client is required to have the following properties:

Likely invalid or redundant comment.

metadata-ingestion/docs/sources/sac/sac_pre.md Outdated Show resolved Hide resolved
@Masterchen09
Copy link
Contributor Author

@hsheth2 I have noticed, that the ingestion of the schema metadata of Import Data Models is extremely slow (20 seconds vs. 15 minutes with our productive SAC tenant), because the Import Data Service of the SAC is extremely slow...therefore I have added a config (by default it is enabled) to be able to disable the ingestion of schema metadata.

Additionally the Import Data Service of the SAC is frequently returning an 500 Internal Server Error...therefore I have also added a HTTPAdapter with a Retry to the Session object...ingesting the schema metadata is relatively new (I have implemented it before creating the PR, while we are using the rest of the source for some time): In the last two weeks the Internal Server Error already occurred three times, not sure if the Retry will help to solve this issue, but we will see...

@Masterchen09 Masterchen09 force-pushed the feat-sap-analytics-cloud branch from b43b6e0 to 74e0c7f Compare July 31, 2024 21:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b43b6e0 and 74e0c7f.

Files selected for processing (1)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py

aspect=BrowsePathsV2Class(
path=[
BrowsePathEntryClass(id=folder_name)
for folder_name in resource.ancestor_path.split("/")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any danger of folder names having / in them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ are replaced with %2F, when the folders which are returned by the API are evaluated:

ancestors = json.loads(entity.ancestorPath)
ancestor_path = "/".join(
ancestor.replace("/", "%2F") for ancestor in ancestors
)

The replacement was originally in there because of the browse paths v1, which are still ingested along the browse paths v2 - do we need the v1 browse paths anymore or should I remove them? - ...in the old browse paths the slashes were used to separate the levels, I don't think the replacement is needed for v2 anymore...? However for the config.folder_pattern it's probably easier if the folders are joined using the slash inkl. the replacement of slashes in folder names...:/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really need v1 browse paths anymore, but I agree that having this around makes folder_pattern simpler so it's fine to leave it in


if (b.name === 'custom') {
return -1;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just to push the "custom" button to the bottom, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@hsheth2 hsheth2 added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Aug 7, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Outside diff range, codebase verification and nitpick comments (4)
metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (4)

282-374: Consider adding debug logs for traceability.

Adding debug logs in the get_resource_workunits method can help trace the generation of workunits and diagnose issues more effectively.

logger.debug(f"Generating workunits for resource: {resource.resource_id}")

585-686: Add error handling for API calls.

Consider adding error handling for API calls in the get_resources method to enhance robustness and manage potential exceptions.

try:
    entities = self.client.entity_sets.Resources.get_entities()...
except Exception as e:
    logger.error(f"Failed to retrieve resources: {e}")

688-697: Add error handling for API call.

Consider adding error handling for the API call in get_import_data_model_ids to manage potential exceptions and improve robustness.

try:
    response = self.session.get(...)
    response.raise_for_status()
except requests.RequestException as e:
    logger.error(f"Failed to retrieve import data model IDs: {e}")

699-724: Add error handling for API call.

Consider adding error handling for the API call in get_import_data_model_columns to enhance robustness and manage potential exceptions.

try:
    response = self.session.get(...)
    response.raise_for_status()
except requests.RequestException as e:
    logger.error(f"Failed to retrieve columns for model {model_id}: {e}")
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 74e0c7f and 3dd7c06.

Files selected for processing (1)
  • metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (1 hunks)
Additional comments not posted (10)
metadata-ingestion/src/datahub/ingestion/source/sac/sac.py (10)

93-106: LGTM!

The ConnectionMappingConfig class is well-defined and uses Pydantic's Field appropriately.


109-167: LGTM!

The SACSourceConfig class is well-structured, with comprehensive configuration options and a useful validator for URLs.


200-205: LGTM!

The __init__ method correctly initializes the SACSource with configuration and context.


207-209: LGTM!

The close method properly manages resource cleanup by closing the session and calling the superclass's close method.


211-214: LGTM!

The create method is well-implemented, using a factory pattern to instantiate SACSource.


217-238: LGTM!

The test_connection method is robust, with proper error handling and connectivity checks.


240-250: LGTM!

The get_workunit_processors method effectively extends the base processors with additional functionality for incremental lineage and stale entity removal.


279-280: LGTM!

The get_report method correctly returns the source report.


544-583: LGTM!

The get_sac_connection method is well-implemented, with robust retry logic and proper session configuration.


771-775: LGTM!

The _add_sap_sac_custom_auth_header function is simple and correctly adds the custom authentication header.

Comment on lines +189 to +205
class SACSource(StatefulIngestionSourceBase, TestableSource):
config: SACSourceConfig
report: SACSourceReport
platform = "sac"

session: OAuth2Session
client: pyodata.Client

ingested_dataset_entities: Set[str] = set()
ingested_upstream_dataset_keys: Set[str] = set()

def __init__(self, config: SACSourceConfig, ctx: PipelineContext):
super().__init__(config, ctx)
self.config = config
self.report = SACSourceReport()

self.session, self.client = SACSource.get_sac_connection(self.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider refactoring lengthy methods.

The SACSource class is well-organized, but some methods, such as get_workunits_internal, are lengthy and could be broken down into smaller helper methods for better readability and maintainability.

Comment on lines +252 to +278
def get_workunits_internal(self) -> Iterable[MetadataWorkUnit]:
if self.config.ingest_stories or self.config.ingest_applications:
resources = self.get_resources()

for resource in resources:
datasets = []

for resource_model in resource.resource_models:
dataset_urn = make_dataset_urn_with_platform_instance(
platform=self.platform,
name=f"{resource_model.namespace}:{resource_model.model_id}",
platform_instance=self.config.platform_instance,
env=self.config.env,
)

if dataset_urn not in datasets:
datasets.append(dataset_urn)

if dataset_urn in self.ingested_dataset_entities:
continue

self.ingested_dataset_entities.add(dataset_urn)

yield from self.get_model_workunits(dataset_urn, resource_model)

yield from self.get_resource_workunits(resource, datasets)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor get_workunits_internal for clarity.

The get_workunits_internal method is lengthy and could be refactored into smaller helper methods to enhance readability and maintainability.

Comment on lines +376 to +542
namespace: Optional[str] = None
if len(schema) < len(namespace_with_schema):
namespace = namespace_with_schema[len(f"{schema}.") :]
view = view[:-1]
upstream_dataset_name = self.get_view_name(schema, namespace, view)

if upstream_dataset_name is not None:
if model.connection_id in self.config.connection_mapping:
connection = self.config.connection_mapping[model.connection_id]
platform = (
connection.platform
if connection.platform
else model.system_type.lower()
)
platform_instance = connection.platform_instance
env = connection.env
else:
platform = model.system_type.lower()
platform_instance = model.connection_id
env = DEFAULT_ENV

logger.info(
f"No connection mapping found for connection with id {model.connection_id}, connection id will be used as platform instance"
)

upstream_dataset_urn = make_dataset_urn_with_platform_instance(
platform=platform,
name=upstream_dataset_name,
platform_instance=platform_instance,
env=env,
)

if upstream_dataset_urn not in self.ingested_upstream_dataset_keys:
mcp = MetadataChangeProposalWrapper(
entityUrn=upstream_dataset_urn,
aspect=dataset_urn_to_key(upstream_dataset_urn),
)

yield mcp.as_workunit(is_primary_source=False)

self.ingested_upstream_dataset_keys.add(upstream_dataset_urn)

mcp = MetadataChangeProposalWrapper(
entityUrn=dataset_urn,
aspect=UpstreamLineageClass(
upstreams=[
UpstreamClass(
dataset=upstream_dataset_urn,
type=DatasetLineageTypeClass.COPY,
),
],
),
)

yield mcp.as_workunit()
else:
self.report.report_warning(
"unknown-upstream-dataset",
f"Unknown upstream dataset for model with id {model.namespace}:{model.model_id} and external id {model.external_id}",
)
elif model.system_type is not None:
self.report.report_warning(
"unknown-system-type",
f"Unknown system type {model.system_type} for model with id {model.namespace}:{model.model_id} and external id {model.external_id}",
)

mcp = MetadataChangeProposalWrapper(
entityUrn=dataset_urn,
aspect=StatusClass(
removed=False,
),
)

yield mcp.as_workunit()

if model.external_id and model.connection_id and model.system_type:
type_name = DatasetSubTypes.SAC_LIVE_DATA_MODEL
elif model.is_import:
type_name = DatasetSubTypes.SAC_IMPORT_DATA_MODEL
else:
type_name = DatasetSubTypes.SAC_MODEL

mcp = MetadataChangeProposalWrapper(
entityUrn=dataset_urn,
aspect=SubTypesClass(
typeNames=[type_name],
),
)

yield mcp.as_workunit()

mcp = MetadataChangeProposalWrapper(
entityUrn=dataset_urn,
aspect=DataPlatformInstanceClass(
platform=make_data_platform_urn(self.platform),
instance=self.config.platform_instance,
),
)

yield mcp.as_workunit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor get_model_workunits for readability.

The get_model_workunits method is comprehensive but lengthy. Consider breaking it into smaller helper methods to improve readability and maintainability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3dd7c06 and 1db83a3.

Files selected for processing (1)
  • metadata-ingestion/setup.py (4 hunks)
Additional comments not posted (3)
metadata-ingestion/setup.py (3)

311-315: Verify the necessity and compatibility of sac dependencies.

Ensure that the added dependencies "requests", "pyodata>=1.11.1", and "Authlib" are necessary for the SAC integration and compatible with existing dependencies.

Verification successful

Dependencies for sac are used and necessary

The sac dependencies "requests", "pyodata>=1.11.1", and "Authlib" are utilized in various parts of the codebase, confirming their necessity for the SAC integration. Specifically, pyodata and Authlib are used in the sac source implementation, while requests is widely used across the codebase. No conflicts were identified.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the necessity and compatibility of `sac` dependencies.

# Test: Check for any conflicting dependencies in the project.
# Expect: No conflicting dependencies.
pip check

Length of output: 82


Script:

#!/bin/bash
# Description: Search for usage of "requests", "pyodata", and "Authlib" in the codebase.

# Search for "requests" usage
rg "requests" -A 3

# Search for "pyodata" usage
rg "pyodata" -A 3

# Search for "Authlib" usage
rg "Authlib" -A 3

Length of output: 258418


734-734: Verify the correctness of the sac entry point definition.

Ensure that the sac entry point is correctly defined and points to the intended module and class SACSource.

Verification successful

The sac entry point is correctly defined.

The SACSource class is present in the expected module datahub.ingestion.source.sac.sac, confirming that the entry point in setup.py is accurate. No further action is needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the `sac` entry point definition.

# Test: Check for the existence and correctness of the `SACSource` class in the specified module.
# Expect: The `SACSource` class should be defined in the `datahub.ingestion.source.sac.sac` module.
ast-grep --lang python --pattern $'class SACSource { $$$ }'

Length of output: 506


Script:

#!/bin/bash
# Description: Verify the correctness of the `sac` entry point definition.

# Test: Correctly check for the existence of the `SACSource` class in the specified module.
# Expect: The `SACSource` class should be defined in the `datahub.ingestion.source.sac.sac` module.
ast-grep --lang python --pattern 'class SACSource:'

Length of output: 53


Script:

#!/bin/bash
# Description: Search for the `SACSource` class across the entire codebase.

# Test: Use `rg` to find any occurrence of the `SACSource` class.
# Expect: Locate the definition of `SACSource` if it exists anywhere in the repository.
rg 'class SACSource' --type py

Length of output: 349


480-480: Verify the integration and naming conventions of the sac plugin.

Ensure that the sac plugin is correctly integrated into the plugins dictionary and follows the project's naming conventions.

@hsheth2
Copy link
Collaborator

hsheth2 commented Aug 12, 2024

@Masterchen09 looks like the cypress tests that are failing are mutations/ingestion_source.js and mutations/managing_secrets.js, which worries me a bit given the changes here. Not sure if it's a flaky test or a real failure

@Masterchen09
Copy link
Contributor Author

@hsheth2 Thanks for mentioning it, I didn't noticed that some tests are failing.

For mutations/ingestion_source.js it sounds logical, that the test is failing: Because of the sorting Snowflake is not visible anymore without scrolling...I hope this is fixed now.

I have currently no idea regarding mutations/managing_secrets.js- the changes to the UI should have no influence on this...let's see whether the test is passing now or not...

@Masterchen09
Copy link
Contributor Author

@hsheth2 The fix for mutations/ingestion_source.js yesterday solved the issue...the mutations/managing_secrets.js failed again, but I think the error message is just a bit misleading:

AssertionError: Timed out retrying after 10000ms: Expected to find content: 'Successfully created Secret!' but never did.

This is not the reason for the first attempt to fail, only the reason for the second and third attempt...the error message is caused by the failing first attempt, because the secret already exists and will not be created in the second and third attempt - the reason for failing the first attempt is the same reason why mutations/ingestion_source.js failed: Snowflake is not visible without scrolling...I have applied the same fix to mutations/managing_secrets.js which I have added yesterday to mutations/ingestion_source.js...let's see whether the test is successful now.

There is also another fix for the frozen dataclasses...from a functional point view it makes no difference, but is not correct to have a set in a frozen dataclass - now it's a frozenset. ;-)

@hsheth2
Copy link
Collaborator

hsheth2 commented Aug 13, 2024

@Masterchen09 merged master to fix a separate issue with CI. Let's see if the cypress tests pass now

@hsheth2 hsheth2 merged commit ce99bc4 into datahub-project:master Aug 26, 2024
59 of 62 checks passed
@hsheth2
Copy link
Collaborator

hsheth2 commented Aug 26, 2024

CI failures are unrelated (nifi, nested domains), so merging through.

@Masterchen09 Masterchen09 deleted the feat-sap-analytics-cloud branch September 4, 2024 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green. product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants