fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent for a Table #10052


@siladitya2 siladitya2 commented Mar 14, 2024

Issue:
When the optional Type field of a table's PartitionKeys entry (PartitionKey["Type"]) is missing, the Glue connector fails with the exception below.

  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/source/aws/glue.py", line 1168, in get_schema_metadata
    hive_column_type=partition_key["Type"],
KeyError: 'Type'

Fix:
When the optional Type field of a PartitionKey is missing, we map the unknown type to the null type (Avro schema mapping) when producing the MCE.
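
The idea can be illustrated with a minimal, self-contained sketch. It is not the code merged in this PR: the `UNKNOWN_TYPE` placeholder, the abbreviated `HIVE_TO_AVRO` mapping, and the `partition_key_avro_types` helper are hypothetical names chosen for illustration. It only assumes that each partition key is a dict with a "Name" entry and an optional "Type" entry.

```python
from typing import Any, Dict, List

# Hypothetical placeholder used when a partition key has no "Type".
UNKNOWN_TYPE = "unknown"

# Hypothetical, heavily abbreviated Hive-to-Avro type mapping.
HIVE_TO_AVRO: Dict[str, str] = {
    "string": "string",
    "int": "int",
    "bigint": "long",
    UNKNOWN_TYPE: "null",
}


def partition_key_avro_types(partition_keys: List[Dict[str, Any]]) -> Dict[str, str]:
    """Return {partition key name: Avro type}, tolerating a missing "Type"."""
    result: Dict[str, str] = {}
    for key in partition_keys:
        # dict.get() avoids the KeyError that key["Type"] raises.
        hive_type = key.get("Type", UNKNOWN_TYPE)
        # Types missing from the abbreviated mapping also fall back to "null".
        result[key["Name"]] = HIVE_TO_AVRO.get(hive_type.lower(), "null")
    return result


keys = [{"Name": "year", "Type": "int"}, {"Name": "region"}]
avro_types = partition_key_avro_types(keys)
```

Here the "region" key has no "Type", so it is mapped to the Avro null type instead of raising an exception.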

UI screenshot: [image: Screenshot 2024-03-18 at 11 36 31]

Additionally, this PR adds a debug log statement while extracting data from tables, which helps debug failures when the connector cannot process the data for a particular table.
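
As a rough sketch of that kind of per-table debug logging (again not the PR's actual code), the example below logs each table name before processing it. The `process_tables` function and the log message are hypothetical; the table dicts are assumed to carry "DatabaseName" and "Name" entries.

```python
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger(__name__)


def process_tables(tables: Iterable[Dict[str, Any]]) -> List[str]:
    """Log each table before processing it; return the processed names."""
    processed: List[str] = []
    for table in tables:
        full_name = f"{table['DatabaseName']}.{table['Name']}"
        # If processing fails later, the last debug line names the culprit.
        logger.debug("Processing table %s", full_name)
        processed.append(full_name)
    return processed
```

With debug logging enabled, the last "Processing table ..." line before a traceback identifies the table that triggered the failure.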

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@siladitya2 siladitya2 marked this pull request as draft March 14, 2024 16:56
@github-actions github-actions bot added labels Mar 14, 2024: ingestion (PR or Issue related to the ingestion of metadata), community-contribution (PR or Issue raised by member(s) of DataHub Community), datahub-community-champion (PRs authored by DataHub Community Champions)
si-chakraborty added 2 commits March 15, 2024 17:55
@siladitya2 siladitya2 changed the title from "fix(metadata-ingestion)improve resilience and observability of glue-connector" to "fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent" Mar 15, 2024
@siladitya2 siladitya2 changed the title from "fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent" to "fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent of a Table" Mar 15, 2024
@siladitya2 siladitya2 changed the title from "fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent of a Table" to "fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent for a Table" Mar 15, 2024
@siladitya2 siladitya2 marked this pull request as ready for review March 15, 2024 20:49
@hsheth2 hsheth2 added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Mar 18, 2024
@treff7es treff7es merged commit 43ac405 into datahub-project:master Mar 20, 2024
52 of 53 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024