Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingestion): schema, table filtering for redshift-usage #4396

Merged
merged 13 commits into from
Apr 2, 2022

Conversation

Abhiram98
Copy link
Contributor

@Abhiram98 Abhiram98 commented Mar 12, 2022

Filter redshift usage statistics based on schema and table patterns.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@Abhiram98
Copy link
Contributor Author

@Abhiram98 Abhiram98 marked this pull request as ready for review March 14, 2022 09:33
@@ -223,7 +223,12 @@ def _get_redshift_history(
event_dict["endtime"] = event_dict.get("endtime").__str__()

logger.debug(f"event_dict: {event_dict}")
events.append(event_dict)
# filter based on schema and table pattern
if self.config.schema_pattern.allowed(event_dict['schema'])\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any plans to include view_pattern as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rslanka, it seems like the script wasn't ingesting view related usage stats in the first place. I've looked at the code and couldn't find anything. Did a small test run as well; view usage didn't get ingested.

"email_domain": "acryl.io",
"include_views": True,
"include_tables": True,
"schema_pattern": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not include a table_pattern as well to this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will include this as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this test case

@rslanka
Copy link
Contributor

rslanka commented Mar 14, 2022

@Abhiram98, looks great overall! It would be great if you could address the comments above.

@github-actions
Copy link

github-actions bot commented Mar 16, 2022

Unit Test Results (build & test)

  96 files  ±0    96 suites  ±0   19m 41s ⏱️ - 5m 53s
686 tests ±0  627 ✔️ ±0  59 💤 ±0  0 ±0 

Results for commit f4c8480. ± Comparison against base commit 2fc3a48.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Mar 16, 2022

Unit Test Results (metadata ingestion)

       5 files  ±0         5 suites  ±0   58m 36s ⏱️ - 3m 0s
   394 tests +1     394 ✔️ +1    0 💤 ±0  0 ±0 
1 814 runs  +5  1 783 ✔️ +5  31 💤 ±0  0 ±0 

Results for commit f4c8480. ± Comparison against base commit 2fc3a48.

♻️ This comment has been updated with latest results.

):
events.append(event_dict)
else:
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rslanka, I have added this to the reporting as well

Copy link
Contributor

@rslanka rslanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. @Abhiram98 , could you please address the change requested here (and we can merge)?

@rslanka rslanka added the pending-submitter-response Issue/request has been reviewed but requires a response from the submitter label Mar 23, 2022
@rslanka rslanka removed the pending-submitter-response Issue/request has been reviewed but requires a response from the submitter label Mar 31, 2022
Copy link
Contributor

@rslanka rslanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rslanka rslanka merged commit 2674272 into datahub-project:master Apr 2, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
…project#4396)

* Filter based on table/schema pattern + documentation

Co-authored-by: Ravindra Lanka <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants