Pr file patch into MAIN #2806
Conversation
```bash
Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/tasks.py", line 18, in process_pull_request_files
    pull_request_files_model(repo.repo_id, logger, augur_db, manifest.key_auth)
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in pull_request_files_model
    pr_file_rows += [{
    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in <listcomp>
    pr_file_rows += [{
    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/gh_graphql_entities.py", line 344, in __iter__
    coreData['totalCount']
    ~~~~~~~~^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
```
```python
if coreData is not None:
    self.logger.info(f"... core data obtained")
else:
    self.logger.info(f"Helen, the ghost in our machine, did not get a numerical result for core data (value): {data} \n Zero value assigned.")
    coreData['totalCount'] = 0
except KeyError as e:
    self.logger.error("Could not extract paginate result because there was no data returned")
    self.logger.error(
        ''.join(traceback.format_exception(None, e, e.__traceback__)))
    self.logger.info(f"Graphql paramters: {params}")
    return
```
```python
try:
    if coreData is not None:
        if coreData.get('totalCount') is not None:
            self.logger.info("... core data obtained")
        else:
            self.logger.info(f"Helen, the ghost in our machine, did not get a numerical result for core data (value): {data} \n Zero value assigned.")
            coreData['totalCount'] = 0
    else:
        self.logger.error("Core data is None, cannot proceed with operations on it.")
except KeyError as e:
    self.logger.error("Could not extract paginate result because there was no data returned")
    self.logger.error(''.join(traceback.format_exception(None, e, e.__traceback__)))
```
```python
coreData = self.extract_paginate_result(data)
```

```python
Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/tasks.py", line 18, in process_pull_request_files
    pull_request_files_model(repo.repo_id, logger, augur_db, manifest.key_auth)
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in pull_request_files_model
    pr_file_rows += [{
    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in <listcomp>
    pr_file_rows += [{
    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/gh_graphql_entities.py", line 341, in __iter__
    coreData = self.extract_paginate_result(data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/gh_graphql_entities.py", line 253, in extract_paginate_result
    raise TimeoutError("No data received from endpoint.")
TimeoutError: No data received from endpoint.
```
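For context on both failures above, here is a minimal, hypothetical sketch of how the iterator could tolerate a `None` payload as well as the `TimeoutError` raised by `extract_paginate_result`. The method and attribute names are modeled on the traceback; the surrounding structure and the `edges` key are assumptions, not the actual `gh_graphql_entities.py` code:

```python
# Hypothetical sketch only -- not the actual GraphQlPageCollection.__iter__.
def __iter__(self):
    params = dict(self.params)              # assumed attribute
    data = self.request_graphql_dict(params)  # assumed request helper

    try:
        coreData = self.extract_paginate_result(data)
    except TimeoutError:
        # No data came back from the endpoint; end iteration instead of crashing the task.
        self.logger.error("No data received from endpoint; stopping pagination.")
        return

    if not coreData or coreData.get('totalCount') is None:
        # Treat a missing or partial payload as an empty page rather than subscripting None.
        self.logger.info("Core data missing 'totalCount'; assuming zero records.")
        return

    for edge in coreData.get('edges', []):  # 'edges' key is an assumption
        yield edge
```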
…ncremental inserts.

Error Details:
- OS level OOM error log
- Augur error log
- Database state showing that the killed collection process leaves all subsequent core tasks hanging after the OOM.

```bash
May 11 08:40:26 ip-172-31-43-26 kernel: [196984.540841] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
May 11 08:40:26 ip-172-31-43-26 kernel: [196984.543107] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-330.scope,task=celery,pid=2787413,uid=1000
May 11 08:40:26 ip-172-31-43-26 kernel: [196984.543138] Out of memory: Killed process 2787413 (celery) total-vm:14657984kB, anon-rss:8728064kB, file-rss:5632kB, shmem-rss:0kB, UID:1000 pgtables:17844kB oom_score_adj:0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.229829] Softwar~cThread invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.229874]  oom_kill_process+0x10c/0x1b0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.229884]  __alloc_pages_may_oom+0x114/0x1e0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.230235] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.232557] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-330.scope,task=celery,pid=2787363,uid=1000
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.232580] Out of memory: Killed process 2787363 (celery) total-vm:14415148kB, anon-rss:8500952kB, file-rss:9216kB, shmem-rss:0kB, UID:1000 pgtables:17364kB oom_score_adj:0
```

```bash
augur.tasks.github.pull_requests.tasks.collect_pull_requests cf2337d6-6adb-4b0b-a47d-cfd74c01d86a
Name       augur.tasks.github.pull_requests.tasks.collect_pull_requests
UUID       cf2337d6-6adb-4b0b-a47d-cfd74c01d86a
State      FAILURE
args       ('https://github.com/kubernetes/kubernetes',)
kwargs     {}
Result     None
Received   2024-05-11 02:19:36.542103 UTC
Started    2024-05-11 02:19:36.543418 UTC
Failed     2024-05-11 05:03:50.075814 UTC
Retries    0
Worker     core:c1d207996f614c648319b1c800672fce@ip-172-31-43-26
Exception  WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 35.')
Timestamp  2024-05-11 05:03:50.075814 UTC
Traceback  Traceback (most recent call last):
             File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
               raise WorkerLostError(
           billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 35.
Clock      56697
Root       <Task: augur.tasks.start_tasks.augur_collection_monitor(f93b7451-cfb2-42fc-a7b8-4ea8fe59d647) SUCCESS clock:222>
Root id    f93b7451-cfb2-42fc-a7b8-4ea8fe59d647
Parent     <Task: augur.tasks.github.detect_move.tasks.detect_github_repo_move_core(41327ca8-c735-44c5-a536-38acd8968e42) SUCCESS clock:244>
Parent id  41327ca8-c735-44c5-a536-38acd8968e42
Children
```

Augur thinks we are still collecting, so it will never get to messages:

```bash
repo_id core_data_last_collected core_status core_task_id secondary_data_last_collected secondary_status secondary_task_id event_last_collected facade_status facade_data_last_collected facade_task_id core_weight facade_weight secondary_weight issue_pr_sum commit_sum ml_status ml_data_last_collected ml_task_id ml_weight
123948 2024-02-09 04:43:29 Collecting 9bef29e6-9519-4acb-8469-24c6857c8d92 2022-12-20 00:00:00 Success Success 2024-04-10 06:07:05 -1249186522 121948 -52204734437 203819 121948 Pending -12149126357
```
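For reference, the incremental-insert idea amounts to flushing buffered rows every N records instead of accumulating an entire repository's pull requests inside one worker process. The sketch below is illustrative only; the batch size, function name, and insert helper are assumptions rather than the code in this PR:

```python
# Illustrative sketch: names and batch size are assumptions, not Augur's actual API.
BATCH_SIZE = 500  # flush after this many rows to keep worker memory bounded

def insert_rows_incrementally(augur_db, table_model, rows_iterable, natural_keys):
    buffer = []
    total = 0
    for row in rows_iterable:
        buffer.append(row)
        if len(buffer) >= BATCH_SIZE:
            augur_db.insert_data(buffer, table_model, natural_keys)  # assumed insert helper
            total += len(buffer)
            buffer.clear()  # release the batch before fetching more pages
    if buffer:  # flush the remainder
        augur_db.insert_data(buffer, table_model, natural_keys)
        total += len(buffer)
    return total
```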
…arises from not having any files in our anointed programming languages, which are all the most common ones.

```bash
Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/git/dependency_tasks/tasks.py", line 47, in process_ossf_dependency_metrics
    generate_scorecard(session, repo.repo_id, repo_git)
  File "/home/ubuntu/github/augur/augur/tasks/git/dependency_tasks/core.py", line 75, in generate_scorecard
    required_output = parse_json_from_subprocess_call(session.logger,['./scorecard', command, '--format=json'],cwd=path_to_scorecard)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/util/worker_util.py", line 141, in parse_json_from_subprocess_call
    raise e
  File "/home/ubuntu/github/augur/augur/tasks/util/worker_util.py", line 138, in parse_json_from_subprocess_call
    required_output = json.loads(output)
                      ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
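One way this failure mode could be handled is to treat empty or non-JSON scorecard output as "no result" rather than letting `json.loads` raise. The wrapper below is a hypothetical sketch, not the actual `parse_json_from_subprocess_call` in `worker_util.py`:

```python
import json
import subprocess

def parse_json_or_none(logger, cmd, cwd):
    """Run cmd and return parsed JSON, or None when the tool prints nothing parseable
    (e.g. scorecard against a repo with no files in a supported language)."""
    output = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True).stdout
    if not output.strip():
        logger.warning(f"{cmd[0]} produced no output; skipping this repo")
        return None
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        logger.warning(f"{cmd[0]} output was not valid JSON; skipping this repo")
        return None
```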
Signed-off-by: Sean P. Goggins <[email protected]>
Signed-off-by: Andrew Brain <[email protected]>
Signed-off-by: Andrew Brain <[email protected]>
Signed-off-by: Sean P. Goggins <[email protected]>
Signed-off-by: Andrew Brain <[email protected]>
…t using a materialized view.
Signed-off-by: Sean P. Goggins <[email protected]>
update check for pr_file_patch
```diff
@@ -172,7 +172,7 @@ def determine_worker_processes(ratio,maximum):
         sleep_time += 6
 
     #20% of estimate, Maximum value of 25
-    secondary_num_processes = determine_worker_processes(.25, 25)
+    secondary_num_processes = determine_worker_processes(.25, 45)
     logger.info(f"Starting secondary worker processes with concurrency={secondary_num_processes}")
     secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={secondary_num_processes} -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
     process_list.append(subprocess.Popen(secondary_worker.split(" ")))
```
[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
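For reference, R1732 is pointing at the bare `subprocess.Popen(...)` call. Below is a sketch of the `with`-based form pylint suggests; since this launcher deliberately leaves workers running past the enclosing function, suppressing the check on those lines is the other plausible resolution. The command string mirrors the diff; everything else is assumed:

```python
import subprocess
import uuid

process_list = []
secondary_worker = (
    "celery -A augur.tasks.init.celery_app.celery_app worker -l info "
    f"--concurrency=25 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
)

# What consider-using-with asks for: the context manager waits for the process
# and closes its pipes when the block exits.
with subprocess.Popen(secondary_worker.split(" ")) as proc:
    proc.wait()

# Alternative matching the launcher's intent (workers must outlive this function):
# keep the handle and silence the check on that line.
process_list.append(
    subprocess.Popen(secondary_worker.split(" "))  # pylint: disable=consider-using-with
)
```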
```diff
@@ -132,7 +132,7 @@ def determine_worker_processes(ratio,maximum):
         sleep_time += 6
 
     #20% of estimate, Maximum value of 25
-    secondary_num_processes = determine_worker_processes(.25, 25)
+    secondary_num_processes = determine_worker_processes(.25, 45)
     logger.info(f"Starting secondary worker processes with concurrency={secondary_num_processes}")
     secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={secondary_num_processes} -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
     process_list.append(subprocess.Popen(secondary_worker.split(" ")))
```
[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
```diff
@@ -37,7 +37,7 @@ def start():
 
     scheduling_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=1 -n scheduling:{uuid.uuid4().hex}@%h -Q scheduling"
     core_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n core:{uuid.uuid4().hex}@%h"
-    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=25 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
+    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
 
     scheduling_worker_process = subprocess.Popen(scheduling_worker.split(" "))
```
[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
```diff
@@ -37,7 +37,7 @@ def start():
 
     scheduling_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=1 -n scheduling:{uuid.uuid4().hex}@%h -Q scheduling"
     core_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n core:{uuid.uuid4().hex}@%h"
-    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=25 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
+    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
 
     scheduling_worker_process = subprocess.Popen(scheduling_worker.split(" "))
     core_worker_process = subprocess.Popen(core_worker.split(" "))
```
[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
```python
    else:
        logger.info(f"{owner}/{repo} has no pull requests")
        return 0


# TODO: Rename pull_request_reviewers table to pull_request_requested_reviewers
```
[pylint] reported by reviewdog 🐶
W0511: TODO: Rename pull_request_reviewers table to pull_request_requested_reviewers (fixme)
```python
    else:
        logger.info(f"{owner}/{repo} has no pull requests")
        return 0


# TODO: Rename pull_request_reviewers table to pull_request_requested_reviewers
# TODO: Fix column names in pull request labels table
```
[pylint] reported by reviewdog 🐶
W0511: TODO: Fix column names in pull request labels table (fixme)
```python
        total_count += len(all_data)
        all_data.clear()

    if len(all_data):
```
[pylint] reported by reviewdog 🐶
C1802: Do not use len(SEQUENCE) without comparison to determine if a sequence is empty (use-implicit-booleaness-not-len)
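The form pylint prefers relies on a sequence's implicit truthiness; a minimal before/after sketch (the variable name is borrowed from the diff context, the surrounding logic is assumed):

```python
all_data = []

# Flagged by C1802: calling len() just to test for emptiness.
if len(all_data):
    print("have data")

# Preferred: an empty list is already falsy.
if all_data:
    print("have data")
```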
```python
        return len(pr_data)

    if total_count > 0:
```
[pylint] reported by reviewdog 🐶
R1705: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (no-else-return)
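R1705 flags an `else:` that follows a `return`; a generic before/after sketch under assumed names, not the exact function from this diff:

```python
# Flagged by R1705: the else is redundant because the if-branch already returns.
def count_before(pr_data, total_count):
    if total_count > 0:
        return total_count
    else:
        return len(pr_data)

# De-indented form pylint prefers: same behavior, no else.
def count_after(pr_data, total_count):
    if total_count > 0:
        return total_count
    return len(pr_data)
```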
```python
        return all_data

    yield page_data


def process_pull_requests(pull_requests, task_name, repo_id, logger, augur_db):
```
[pylint] reported by reviewdog 🐶
R0914: Too many local variables (32/30) (too-many-locals)
```sql
select repo_git
from augur_operations.collection_status x, repo y
where x.repo_id = y.repo_id
and {condition_string}
```
[pylint] reported by reviewdog 🐶
E0606: Possibly using variable 'condition_string' before assignment (possibly-used-before-assignment)
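E0606 means pylint cannot prove `condition_string` is bound on every path before it is interpolated into the query. One hedged way to resolve it is to make the branches that assign the variable exhaustive; the phase names and conditions below are placeholders, not the actual ones in this file:

```python
# Hypothetical sketch: phase values and SQL fragments are placeholders.
def build_condition_string(collection_phase):
    if collection_phase == "core":
        condition_string = "x.core_status = 'Pending'"
    elif collection_phase == "secondary":
        condition_string = "x.secondary_status = 'Pending'"
    else:
        # Without an else that assigns or raises, pylint cannot prove the
        # variable is bound before the query below uses it.
        raise ValueError(f"Unknown collection phase: {collection_phase}")
    return condition_string

query = f"""
    select repo_git
    from augur_operations.collection_status x, repo y
    where x.repo_id = y.repo_id
    and {build_condition_string("core")}
"""
```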
This PR was initially created off the `main` branch as a real-time patch. I tested it accordingly.