feat(ingest/unity): enable hive metastore ingestion #9416


mayurinehate
Collaborator

Enables Hive metastore ingestion for Unity Catalog-enabled Databricks workspaces.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

metadata-ingestion/setup.py (resolved)
schema_fields.extend(
    get_schema_fields_for_hive_column(
        col.name, col.data_type.lower(), description=col.comment
with patch.object(HiveColumnToAvroConverter, "_STRUCT_TYPE_SEPARATOR", " "):
Collaborator

I'd much rather have `converter = HiveColumnToAvroConverter(struct_type_separator=" "); converter.some_method(...)`

We really shouldn't need to monkeypatch our own code
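
A minimal sketch of the alternative suggested above, for illustration only. `ColumnTypeConverter`, its `split_struct_field` method, and the `":"` default are hypothetical stand-ins, not DataHub's actual `HiveColumnToAvroConverter` API; only the `struct_type_separator` constructor argument and the `_STRUCT_TYPE_SEPARATOR` attribute name come from the discussion.

```python
from unittest.mock import patch


class ColumnTypeConverter:
    """Hypothetical stand-in for a converter; not DataHub's real class."""

    # Class-level default. The ":" value is an assumption for this sketch.
    _STRUCT_TYPE_SEPARATOR = ":"

    def __init__(self, struct_type_separator=None):
        # Use the caller's separator if given, else the class-level default.
        self.struct_type_separator = (
            struct_type_separator
            if struct_type_separator is not None
            else self._STRUCT_TYPE_SEPARATOR
        )

    def split_struct_field(self, field):
        """Split 'name<sep>type' into (name, type) on the instance separator."""
        name, _, type_ = field.partition(self.struct_type_separator)
        return name.strip(), type_.strip()


# Monkeypatching: temporarily mutates shared class state for every caller.
with patch.object(ColumnTypeConverter, "_STRUCT_TYPE_SEPARATOR", " "):
    patched_result = ColumnTypeConverter().split_struct_field("id int")

# Instance configuration: each caller states the separator it needs.
hive_converter = ColumnTypeConverter()
unity_converter = ColumnTypeConverter(struct_type_separator=" ")
hive_result = hive_converter.split_struct_field("id:int")
unity_result = unity_converter.split_struct_field("id int")
```

Both approaches yield the same parsed field here, but the instance-configured version leaves the class default untouched, so code paths that need different separators cannot interfere with each other.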

Collaborator Author

I agree. This change only replaces existing logic, which interfered with the Unity Catalog logic when running unit tests. Let me move the impacted tests to another test batch and add a TODO here about the need for this refactor. I'd prefer to do it in a separate PR.

def schema_pattern_should__always_deny_information_schema(
    cls, v: AllowDenyPattern
) -> AllowDenyPattern:
    v.deny.append(".*\\.information_schema")
Collaborator

This pattern of "there's a few extra db/schemas that we always want to deny" happens in other sources too, and we've been abusing the user-facing allow/deny pattern to set "system" configs.

When we revisit the sql common refactoring, I'd like to think about moving system-level deny patterns to a class variable instead of reusing the user-facing config
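
A rough sketch of what moving system-level deny patterns to a class variable could look like. Every name here (`AllowDeny`, `SourceBase`, `UnitySource`, `SYSTEM_DENY_PATTERNS`, `is_schema_allowed`) is hypothetical and simplified; this is not DataHub's `AllowDenyPattern` or any existing source class, and only the `information_schema` regex is taken from the diff above.

```python
import re
from dataclasses import dataclass, field
from typing import ClassVar, List


@dataclass
class AllowDeny:
    """Simplified, hypothetical stand-in for a user-facing allow/deny config."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Deny patterns take precedence over allow patterns.
        if any(re.match(p, name, re.IGNORECASE) for p in self.deny):
            return False
        return any(re.match(p, name, re.IGNORECASE) for p in self.allow)


class SourceBase:
    # System-level denies live on the class, separate from user config.
    SYSTEM_DENY_PATTERNS: ClassVar[List[str]] = []

    def __init__(self, schema_pattern: AllowDeny):
        self.schema_pattern = schema_pattern

    def is_schema_allowed(self, qualified_name: str) -> bool:
        # Check the system denies first, then the user's own patterns.
        for pattern in self.SYSTEM_DENY_PATTERNS:
            if re.match(pattern, qualified_name, re.IGNORECASE):
                return False
        return self.schema_pattern.allowed(qualified_name)


class UnitySource(SourceBase):
    # Same regex as the validator in the diff, declared once per source.
    SYSTEM_DENY_PATTERNS = [r".*\.information_schema"]


user_pattern = AllowDeny()
source = UnitySource(user_pattern)
```

With this shape, `source.is_schema_allowed("main.information_schema")` is rejected by the system pattern while `user_pattern.deny` stays exactly as the user wrote it, so the validator no longer has to mutate user-facing config.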

hsheth2 merged commit aac1c55 into datahub-project:master on Dec 14, 2023
53 checks passed
Salman-Apptware pushed a commit to Salman-Apptware/datahub that referenced this pull request Dec 15, 2023
Labels
ingestion PR or Issue related to the ingestion of metadata

2 participants