-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion): add option for optimized skipping of schemas #2209
feat(ingestion): add option for optimized skipping of schemas #2209
Conversation
@hsheth2 This is not the prettiest change I know, but I wanted to just put the PR out there to make any discussion about issue #2208 more concrete. For context, I am stumbling on this when trying to ingest from aws athena where we have in our non-production environment a truly polluted metastore with a ridiculous number of garbage schemas, whereas the number of schemas I would be interested in ingesting from is on the order of <10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good! Made some minor suggestions
I ran into this same issue when I was ingesting from BigQuery, so I'm glad we've got a good solution for it
dataset_snapshot.aspects.append(dataset_properties) | ||
schema_metadata = get_schema_metadata( | ||
self.report, dataset_name, platform, columns | ||
if sql_config.schema_pattern.allowed(schema): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use something along these lines:
if not sql_config.schema_pattern.allowed(schema):
self.report.report_dropped(schema)
continue
It will help localize the reporting alongside the if statement and keep the indentation a bit less deep. I was planning on doing this for table_pattern as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree and fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for adding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Fixes: #2208
Checklist