
Pr file patch into MAIN #2806

Merged: 24 commits into main, May 23, 2024
Conversation

sgoggins (Member)

This PR was initially created off the main branch as a real-time patch, and I tested it accordingly.

sgoggins and others added 24 commits May 10, 2024 14:39
```bash
Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/tasks.py", line 18, in process_pull_request_files
    pull_request_files_model(repo.repo_id, logger, augur_db, manifest.key_auth)
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in pull_request_files_model
    pr_file_rows += [{
                    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in <listcomp>
    pr_file_rows += [{
                    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/gh_graphql_entities.py", line 344, in __iter__
    coreData['totalCount']
    ~~~~~~~~^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
```
```python
            if coreData is not None:
                self.logger.info(f"... core data obtained")
            else:
                self.logger.info(f"Helen, the ghost in our machine, did not get a numerical result for core data (value): {data} \n Zero value assigned.")
                coreData['totalCount'] = 0
        except KeyError as e:
            self.logger.error("Could not extract paginate result because there was no data returned")
            self.logger.error(
                ''.join(traceback.format_exception(None, e, e.__traceback__)))

            self.logger.info(f"Graphql paramters: {params}")
            return
```
```python
try:
    if coreData is not None:
        if coreData.get('totalCount') is not None:
            self.logger.info("... core data obtained")
        else:
            self.logger.info(f"Helen, the ghost in our machine, did not get a numerical result for core data (value): {data} \n Zero value assigned.")
            coreData['totalCount'] = 0
    else:
        self.logger.error("Core data is None, cannot proceed with operations on it.")
except KeyError as e:
    self.logger.error("Could not extract paginate result because there was no data returned")
    self.logger.error(''.join(traceback.format_exception(None, e, e.__traceback__)))
```
```python
            coreData = self.extract_paginate_result(data)
```
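The guard the fix introduces can be distilled into a small helper. A minimal sketch, assuming nothing beyond the shape of the paginate result shown above (`safe_total_count` is a hypothetical name, not part of Augur):

```python
def safe_total_count(core_data):
    """Return totalCount from a paginate result, tolerating failure.

    Guards against both core_data being None and the key being absent,
    which is the combination that produced the TypeError in the
    traceback above.
    """
    if core_data is None:
        return None  # signal the caller to stop iterating
    return core_data.get('totalCount', 0)
```

The point of the design is that the two failure modes are distinguishable: `None` means the endpoint returned nothing, while `0` means the endpoint answered but without a count.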

```python
Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/tasks.py", line 18, in process_pull_request_files
    pull_request_files_model(repo.repo_id, logger, augur_db, manifest.key_auth)
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in pull_request_files_model
    pr_file_rows += [{
                    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/pull_requests/files_model/core.py", line 68, in <listcomp>
    pr_file_rows += [{
                    ^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/gh_graphql_entities.py", line 341, in __iter__
    coreData = self.extract_paginate_result(data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/gh_graphql_entities.py", line 253, in extract_paginate_result
    raise TimeoutError("No data received from endpoint.")
TimeoutError: No data received from endpoint.
```
…ncremental inserts.

Error Details:
- OS Level OOM Error Log
- Augur error log
- Database state showing that the killed collection process leaves all subsequent core tasks hanging after the OOM.

```bash
May 11 08:40:26 ip-172-31-43-26 kernel: [196984.540841] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
May 11 08:40:26 ip-172-31-43-26 kernel: [196984.543107] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-330.scope,task=celery,pid=2787413,uid=1000
May 11 08:40:26 ip-172-31-43-26 kernel: [196984.543138] Out of memory: Killed process 2787413 (celery) total-vm:14657984kB, anon-rss:8728064kB, file-rss:5632kB, shmem-rss:0kB, UID:1000 pgtables:17844kB oom_score_adj:0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.229829] Softwar~cThread invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.229874]  oom_kill_process+0x10c/0x1b0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.229884]  __alloc_pages_may_oom+0x114/0x1e0
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.230235] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.232557] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-330.scope,task=celery,pid=2787363,uid=1000
May 11 11:33:17 ip-172-31-43-26 kernel: [207355.232580] Out of memory: Killed process 2787363 (celery) total-vm:14415148kB, anon-rss:8500952kB, file-rss:9216kB, shmem-rss:0kB, UID:1000 pgtables:17364kB oom_score_adj:0

augur.tasks.github.pull_requests.tasks.collect_pull_requests cf2337d6-6adb-4b0b-a47d-cfd74c01d86a
Name 	augur.tasks.github.pull_requests.tasks.collect_pull_requests
UUID 	cf2337d6-6adb-4b0b-a47d-cfd74c01d86a
State 	FAILURE
args 	('https://github.com/kubernetes/kubernetes',)
kwargs 	{}
Result 	None
Received 	2024-05-11 02:19:36.542103 UTC
Started 	2024-05-11 02:19:36.543418 UTC
Failed 	2024-05-11 05:03:50.075814 UTC
Retries 	0
Worker 	core:c1d207996f614c648319b1c800672fce@ip-172-31-43-26
Exception 	WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 35.')
Timestamp 	2024-05-11 05:03:50.075814 UTC
Traceback

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 35.

Clock 	56697
Root 	<Task: augur.tasks.start_tasks.augur_collection_monitor(f93b7451-cfb2-42fc-a7b8-4ea8fe59d647) SUCCESS clock:222>
Root id 	f93b7451-cfb2-42fc-a7b8-4ea8fe59d647
Parent 	<Task: augur.tasks.github.detect_move.tasks.detect_github_repo_move_core(41327ca8-c735-44c5-a536-38acd8968e42) SUCCESS clock:244>
Parent id 	41327ca8-c735-44c5-a536-38acd8968e42
Children

Augur thinks we are still collecting, so it will never get to messages:
repo_id	core_data_last_collected	core_status	core_task_id	secondary_data_last_collected	secondary_status	secondary_task_id	event_last_collected	facade_status	facade_data_last_collected	facade_task_id	core_weight	facade_weight	secondary_weight	issue_pr_sum	commit_sum	ml_status	ml_data_last_collected	ml_task_id	ml_weight
123948	2024-02-09 04:43:29	Collecting	9bef29e6-9519-4acb-8469-24c6857c8d92	2022-12-20 00:00:00	Success			Success	2024-04-10 06:07:05		-1249186522	121948	-52204734437	203819	121948	Pending			-12149126357
```
…arises from not having any files in our anointed programming languages, which are all the most common ones.

```bash

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/git/dependency_tasks/tasks.py", line 47, in process_ossf_dependency_metrics
    generate_scorecard(session, repo.repo_id, repo_git)
  File "/home/ubuntu/github/augur/augur/tasks/git/dependency_tasks/core.py", line 75, in generate_scorecard
    required_output = parse_json_from_subprocess_call(session.logger,['./scorecard', command, '--format=json'],cwd=path_to_scorecard)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/util/worker_util.py", line 141, in parse_json_from_subprocess_call
    raise e
  File "/home/ubuntu/github/augur/augur/tasks/util/worker_util.py", line 138, in parse_json_from_subprocess_call
    required_output = json.loads(output)
                      ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

```
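The JSONDecodeError above comes from handing an empty scorecard output straight to `json.loads`. A minimal sketch of the defensive pattern, assuming only the standard library (`parse_json_output` is a hypothetical name, not the Augur helper):

```python
import json


def parse_json_output(output, logger=print):
    """Parse subprocess stdout as JSON, treating empty or non-JSON
    output as a recoverable failure instead of raising the unhandled
    JSONDecodeError shown in the traceback above."""
    if not output or not output.strip():
        logger("subprocess produced no output; skipping")
        return None
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        logger(f"subprocess output was not JSON: {exc}")
        return None
```

Returning `None` lets the calling task log and move on to the next repo rather than failing the whole collection run.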
Signed-off-by: Sean P. Goggins <[email protected]>
Signed-off-by: Andrew Brain <[email protected]>
Signed-off-by: Andrew Brain <[email protected]>
Signed-off-by: Sean P. Goggins <[email protected]>
…t using a materialized view.

Signed-off-by: Sean P. Goggins <[email protected]>
update check for pr_file_patch
```diff
@@ -172,7 +172,7 @@ def determine_worker_processes(ratio,maximum):
         sleep_time += 6

     #20% of estimate, Maximum value of 25
-    secondary_num_processes = determine_worker_processes(.25, 25)
+    secondary_num_processes = determine_worker_processes(.25, 45)
     logger.info(f"Starting secondary worker processes with concurrency={secondary_num_processes}")
     secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={secondary_num_processes} -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
     process_list.append(subprocess.Popen(secondary_worker.split(" ")))
```
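The diff raises the secondary-worker ceiling from 25 to 45. For context, `determine_worker_processes` scales a worker count from an estimated total and clamps it to a maximum; a minimal sketch of that behavior (the `estimate` parameter is a hypothetical stand-in for the value the real function derives from config):

```python
def determine_worker_processes(ratio, maximum, estimate=100):
    """Sketch: take a fraction of an estimated workload and clamp it
    to a ceiling, so a huge estimate cannot spawn unbounded workers."""
    return min(round(estimate * ratio), maximum)
```

Under this reading, the change means large workloads can now fan out to 45 secondary processes instead of capping at 25.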


[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

```diff
@@ -132,7 +132,7 @@ def determine_worker_processes(ratio,maximum):
         sleep_time += 6

     #20% of estimate, Maximum value of 25
-    secondary_num_processes = determine_worker_processes(.25, 25)
+    secondary_num_processes = determine_worker_processes(.25, 45)
     logger.info(f"Starting secondary worker processes with concurrency={secondary_num_processes}")
     secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={secondary_num_processes} -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
     process_list.append(subprocess.Popen(secondary_worker.split(" ")))
```


[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

```diff
@@ -37,7 +37,7 @@ def start():

     scheduling_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=1 -n scheduling:{uuid.uuid4().hex}@%h -Q scheduling"
     core_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n core:{uuid.uuid4().hex}@%h"
-    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=25 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
+    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"

     scheduling_worker_process = subprocess.Popen(scheduling_worker.split(" "))
```


[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

```diff
@@ -37,7 +37,7 @@ def start():

     scheduling_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=1 -n scheduling:{uuid.uuid4().hex}@%h -Q scheduling"
     core_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n core:{uuid.uuid4().hex}@%h"
-    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=25 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
+    secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=45 -n secondary:{uuid.uuid4().hex}@%h -Q secondary"

     scheduling_worker_process = subprocess.Popen(scheduling_worker.split(" "))
     core_worker_process = subprocess.Popen(core_worker.split(" "))
```


[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

```python
    else:
        logger.info(f"{owner}/{repo} has no pull requests")
        return 0


# TODO: Rename pull_request_reviewers table to pull_request_requested_reviewers
```


[pylint] reported by reviewdog 🐶
W0511: TODO: Rename pull_request_reviewers table to pull_request_requested_reviewers (fixme)

```python
    else:
        logger.info(f"{owner}/{repo} has no pull requests")
        return 0


# TODO: Rename pull_request_reviewers table to pull_request_requested_reviewers
# TODO: Fix column names in pull request labels table
```


[pylint] reported by reviewdog 🐶
W0511: TODO: Fix column names in pull request labels table (fixme)

```python
    total_count += len(all_data)
    all_data.clear()

    if len(all_data):
```


[pylint] reported by reviewdog 🐶
C1802: Do not use len(SEQUENCE) without comparison to determine if a sequence is empty (use-implicit-booleaness-not-len)
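The C1802 fix is to rely on a sequence's truthiness rather than its length; a minimal illustration (`has_rows` is a hypothetical name used only for this sketch):

```python
def has_rows(seq):
    """Emptiness check the way pylint C1802 recommends: an empty
    sequence is falsy, so no len() comparison is needed."""
    return bool(seq)  # preferred over `len(seq) > 0` or `if len(seq):`
```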

```python
        return len(pr_data)
    if total_count > 0:
```


[pylint] reported by reviewdog 🐶
R1705: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (no-else-return)
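R1705 flags an `else` that follows a `return`: the early return already ends the branch, so the `else` only adds indentation. A minimal before-and-after sketch (`count_or_data` is a hypothetical name echoing the snippet above):

```python
def count_or_data(total_count, pr_data):
    """R1705 fix: after the early return, de-indent the fallback
    instead of wrapping it in a redundant else block."""
    if total_count > 0:
        return total_count
    return len(pr_data)
```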

```python
    return all_data

    yield page_data


def process_pull_requests(pull_requests, task_name, repo_id, logger, augur_db):
```


[pylint] reported by reviewdog 🐶
R0914: Too many local variables (32/30) (too-many-locals)

```sql
select repo_git
from augur_operations.collection_status x, repo y
where x.repo_id = y.repo_id
and {condition_string}
```


[pylint] reported by reviewdog 🐶
E0606: Possibly using variable 'condition_string' before assignment (possibly-used-before-assignment)
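E0606 warns that `condition_string` may be interpolated before any branch assigns it. The usual fix is to bind the variable on every path before use; a minimal sketch (the function name, flags, and condition fragments here are hypothetical, chosen only to mirror the query above):

```python
def build_condition(core=False, secondary=False):
    """E0606 fix sketch: give condition_string a value on every
    path before it is interpolated into the query."""
    condition_string = "1=1"  # safe default so the name is always bound
    if core:
        condition_string = "x.core_status = 'Pending'"
    elif secondary:
        condition_string = "x.secondary_status = 'Pending'"
    return f"select repo_git from repo y where {condition_string}"
```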

@sgoggins sgoggins merged commit d8ea7c8 into main May 23, 2024
8 of 9 checks passed