Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize secondary task recollection #2800

Merged
merged 8 commits into from
Jun 17, 2024
Merged

Conversation

ABrain7710
Copy link
Contributor

Description

Overall

  • Optimized pull_request_files, pull_request_commits, and pull_request_reviews when they are recollected

Implementation Details

  • When building list of repos to collect make the list contain a full_collection boolean flag so start_data_collection knows what to pass
  • In AugurTaskRoutine.start_data_collection iterate through the tuples and pass the repo_git and full_collection flag to each phase
  • Add full_collection flag to every phase's method arguments. This had to be done since AugurTaskRoutine.start_data_collection passes the same arguments for every phase
  • Define database methods to get_secondary_data_last_collected and get prs that have been updated since a date
  • Updated pull_request_files to only get pr numbers for updated prs if full_collection flag is false
  • Updated pull_request_commits to only get pr urls for updated prs if full_collection flag is false
  • Updated pull_request_reviews to only get pr numbers for updated prs if full_collection flag is false

Notes for Reviewers
I have not tested this yet, I will change it from a draft pr to a pr when it is test

Signed commits

  • Yes, I signed my commits.

@ABrain7710 ABrain7710 changed the base branch from main to dev May 17, 2024 02:46
Signed-off-by: Andrew Brain <[email protected]>
@@ -1,7 +1,7 @@
import sqlalchemy as s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0114: Missing module docstring (missing-module-docstring)

@@ -1,7 +1,7 @@
from celery import chain

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0114: Missing module docstring (missing-module-docstring)

@@ -1,7 +1,7 @@
from celery import chain
import logging

def machine_learning_phase(repo_git):
def machine_learning_phase(repo_git, full_collection):
from augur.tasks.data_analysis.clustering_worker.tasks import clustering_task

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0415: Import outside toplevel (augur.tasks.data_analysis.clustering_worker.tasks.clustering_task) (import-outside-toplevel)

@@ -1,7 +1,7 @@
from celery import chain
import logging

def machine_learning_phase(repo_git):
def machine_learning_phase(repo_git, full_collection):
from augur.tasks.data_analysis.clustering_worker.tasks import clustering_task
from augur.tasks.data_analysis.discourse_analysis.tasks import discourse_analysis_task

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0415: Import outside toplevel (augur.tasks.data_analysis.discourse_analysis.tasks.discourse_analysis_task) (import-outside-toplevel)

@@ -1,7 +1,7 @@
from celery import chain
import logging

def machine_learning_phase(repo_git):
def machine_learning_phase(repo_git, full_collection):
from augur.tasks.data_analysis.clustering_worker.tasks import clustering_task
from augur.tasks.data_analysis.discourse_analysis.tasks import discourse_analysis_task
from augur.tasks.data_analysis.insight_worker.tasks import insight_task

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0415: Import outside toplevel (augur.tasks.data_analysis.insight_worker.tasks.insight_task) (import-outside-toplevel)

@@ -166,7 +166,7 @@ def build_primary_repo_collect_request(session,enabled_phase_names, days_until_c
primary_gitlab_enabled_phases.append(primary_repo_collect_phase_gitlab)

#task success is scheduled no matter what the config says.
def core_task_success_util_gen(repo_git):
def core_task_success_util_gen(repo_git, full_collection):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)

@@ -186,7 +186,7 @@ def build_secondary_repo_collect_request(session,enabled_phase_names, days_until

secondary_enabled_phases.append(secondary_repo_collect_phase)

def secondary_task_success_util_gen(repo_git):
def secondary_task_success_util_gen(repo_git, full_collection):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)

@@ -202,12 +202,12 @@ def build_facade_repo_collect_request(session,enabled_phase_names, days_until_co

facade_enabled_phases.append(facade_phase)

def facade_task_success_util_gen(repo_git):
def facade_task_success_util_gen(repo_git, full_collection):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)

return facade_task_success_util.si(repo_git)

facade_enabled_phases.append(facade_task_success_util_gen)

def facade_task_update_weight_util_gen(repo_git):
def facade_task_update_weight_util_gen(repo_git, full_collection):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)

@@ -222,7 +222,7 @@ def build_ml_repo_collect_request(session,enabled_phase_names, days_until_collec

ml_enabled_phases.append(machine_learning_phase)

def ml_task_success_util_gen(repo_git):
def ml_task_success_util_gen(repo_git, full_collection):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)

@sgoggins
Copy link
Member

@ABrain7710 : What's the status of this one with regards to testing? Should we merge it now that we have done a release.

@ABrain7710
Copy link
Contributor Author

This is complete, but I haven't had a chance to test it yet. It is a bit of a difficult test.It is a very small change though so if it was reviewed throughly it could be probably be merged

@@ -53,6 +73,7 @@ def pull_request_commits_model(repo,logger, key_auth):
logger.info(f"{task_name}: Inserting {len(all_data)} rows")
pr_commits_natural_keys = ["pull_request_id", "repo_id", "pr_cmt_sha"]
bulk_insert_dicts(logger, all_data,PullRequestCommit,pr_commits_natural_keys)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
E0602: Undefined variable 'bulk_insert_dicts' (undefined-variable)

@@ -328,7 +326,7 @@ def collect_pull_request_review_comments(repo_git: str) -> None:


@celery.task(base=AugurSecondaryRepoCollectionTask)
def collect_pull_request_reviews(repo_git: str) -> None:
def collect_pull_request_reviews(repo_git: str, full_collection: bool) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
R0912: Too many branches (16/12) (too-many-branches)

@@ -328,7 +326,7 @@ def collect_pull_request_review_comments(repo_git: str) -> None:


@celery.task(base=AugurSecondaryRepoCollectionTask)
def collect_pull_request_reviews(repo_git: str) -> None:
def collect_pull_request_reviews(repo_git: str, full_collection: bool) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
R0915: Too many statements (54/50) (too-many-statements)

Signed-off-by: Andrew Brain <[email protected]>
@ABrain7710 ABrain7710 marked this pull request as ready for review June 17, 2024 04:01
@@ -2,18 +2,18 @@
from augur.tasks.github.pull_requests.commits_model.core import *

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0401: Wildcard import augur.tasks.github.pull_requests.commits_model.core (wildcard-import)

element.decode('utf-8').replace('\x00', ' ') if isinstance(element, bytes) else element
for element in page_data
]
logger.info(f"NUL characters were found in PR Reviews and replaced with spaces.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)

logger.info(f"NUL characters were found in PR Reviews and replaced with spaces.")
elif isinstance(page_data, bytes):
page_data = page_data.decode('utf-8').replace('\x00', ' ')
logger.info(f"NUL characters were found in PR Reviews and replaced with spaces.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)

contributors = []
for pull_request_id in all_pr_reviews.keys():
contributors = []
for pull_request_id in all_pr_reviews.keys():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0206: Consider iterating with .items() (consider-using-dict-items)

contributors = []
for pull_request_id in all_pr_reviews.keys():
contributors = []
for pull_request_id in all_pr_reviews.keys():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0201: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)

pr_reviews = []
for pull_request_id in all_pr_reviews.keys():
pr_reviews = []
for pull_request_id in all_pr_reviews.keys():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0206: Consider iterating with .items() (consider-using-dict-items)

pr_reviews = []
for pull_request_id in all_pr_reviews.keys():
pr_reviews = []
for pull_request_id in all_pr_reviews.keys():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
C0201: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)


pr_reviews = []
pr_reviews_generator = GithubPaginator(pr_review_url, manifest.key_auth, logger)
for page_data, page in pr_reviews_generator.iter_pages():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0612: Unused variable 'page' (unused-variable)

Copy link
Member

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for testing!

@sgoggins sgoggins merged commit 6f931ee into dev Jun 17, 2024
9 checks passed
@ABrain7710 ABrain7710 deleted the optimize-secondary-recollection branch June 25, 2024 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants