Improve github paginator #2823

Merged
merged 4 commits into from
Jun 17, 2024

ABrain7710
Contributor

Description

  • Created a more generic GitHub data access class that can paginate over a resource, or fetch a single resource when the GitHub endpoint returns a dict. It handles retries with a library so that we don't have to build our own logic. It calls the httpx response.raise_for_status() method so that an exception is thrown whenever a failure status code is returned (this lets us assume the response is successful if no exception occurs). The make_request() method also throws a rate limit exception so that the caller can decide how to handle the rate limit, while the higher-level methods like paginate resource and get resource handle the rate limit the way the GitHub API docs tell us to.

I did this work to support the core recollection work, since I need a reliable method to get the page count of a resource and the old github paginator was not reliable enough.

So far I have only added it to the pull request task. Over time I will add it to other tasks as we become more confident in it.
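
For readers who have not seen the rate limit guidance in the GitHub REST API docs, the waiting rule it describes can be sketched as a small pure function. This is only an illustration, not the code in this PR: `seconds_to_wait` and its parameters are hypothetical names, and the sketch assumes the response headers are available as a plain dict with lowercase keys.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to pause before retrying a rate-limited request.

    Hypothetical helper, not part of this PR. `headers` is assumed to be a
    plain dict with lowercase keys; `now` is the current Unix time and can
    be injected so the function is easy to test.
    """
    if now is None:
        now = time.time()

    # 1. An explicit retry-after header wins: wait that many seconds.
    if "retry-after" in headers:
        return float(headers["retry-after"])

    # 2. No quota left: wait until the reset time (UTC epoch seconds).
    if headers.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in headers:
        return max(0.0, float(headers["x-ratelimit-reset"]) - now)

    # 3. Otherwise fall back to waiting one minute.
    return 60.0
```

A caller that catches the rate limit exception from make_request() could sleep for the returned number of seconds and then retry.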

Signed commits

  • Yes, I signed my commits.

@@ -4,6 +4,7 @@
from augur.tasks.init.celery_app import celery_app as celery
from augur.tasks.init.celery_app import AugurCoreRepoCollectionTask, AugurSecondaryRepoCollectionTask
from augur.application.db.data_parse import *

[pylint] reported by reviewdog 🐶
W0401: Wildcard import augur.application.db.data_parse (wildcard-import)


return response.json()

# TODO: Handle timeout exceptions better

[pylint] reported by reviewdog 🐶
W0511: TODO: Handle timeout exceptions better (fixme)

@@ -0,0 +1,185 @@
import logging

[pylint] reported by reviewdog 🐶
C0114: Missing module docstring (missing-module-docstring)

from urllib.parse import urlparse, parse_qs, urlencode


class RatelimitException(Exception):

[pylint] reported by reviewdog 🐶
C0115: Missing class docstring (missing-class-docstring)

class RatelimitException(Exception):
    pass

class UrlNotFoundException(Exception):

[pylint] reported by reviewdog 🐶
C0115: Missing class docstring (missing-class-docstring)

class UrlNotFoundException(Exception):
    pass

class GithubDataAccess:

[pylint] reported by reviewdog 🐶
C0115: Missing class docstring (missing-class-docstring)


return (100 * (num_pages -1)) + len(data)

def paginate_resource(self, url):

[pylint] reported by reviewdog 🐶
R1711: Useless return at end of function or method (useless-return)
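
The `return (100 * (num_pages -1)) + len(data)` line in the hunk above appears to count a resource by assuming every page except the last is full. A minimal sketch of that arithmetic follows; `estimate_resource_count` and the `per_page` parameter are hypothetical names chosen for illustration, and the default of 100 simply mirrors the constant in the diff.

```python
def estimate_resource_count(num_pages, last_page_len, per_page=100):
    """Count items, assuming every page before the last one is full.

    Hypothetical helper, not part of this PR.
    """
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")

    # Full pages contribute per_page items each; the last page
    # contributes however many items it actually returned.
    return per_page * (num_pages - 1) + last_page_len


# Five pages where the last one holds 37 items: 4 * 100 + 37
print(estimate_resource_count(5, 37))
```

The benefit of this approach is that only the last page has to be fetched to get a total, at the cost of trusting that the page size stays constant.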


    return int(parse_qs(parsed_url.query)['page'][0])
except (KeyError, ValueError):
    raise Exception(f"Unable to parse 'last' url from response: {response.links['last']}")

[pylint] reported by reviewdog 🐶
W0707: Consider explicitly re-raising using 'except (KeyError, ValueError) as exc' and 'raise Exception(f"Unable to parse 'last' url from response: {response.links['last']}") from exc' (raise-missing-from)
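
One way to address W0707 is to chain the original exception with `from exc`, so the traceback still shows why the parse failed. The sketch below is a standalone approximation rather than the PR's method: `page_number_from_url` is a hypothetical name, it takes the url as a string instead of reading `response.links`, and it raises `ValueError` where the PR raises a bare `Exception`.

```python
from urllib.parse import urlparse, parse_qs


def page_number_from_url(last_url):
    """Extract the `page` query parameter from a pagination url.

    Hypothetical helper, not part of this PR.
    """
    parsed_url = urlparse(last_url)
    try:
        return int(parse_qs(parsed_url.query)["page"][0])
    except (KeyError, ValueError) as exc:
        # `from exc` keeps the original KeyError/ValueError as __cause__.
        raise ValueError(f"Unable to parse page number from url: {last_url}") from exc
```

With the cause attached, a missing `page` parameter and a non-numeric one can be told apart when debugging.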


response = client.request(method=method, url=url, timeout=timeout, follow_redirects=True)

if response.status_code in [403, 429]:

[pylint] reported by reviewdog 🐶
R1720: Unnecessary "elif" after "raise", remove the leading "el" from "elif" (no-else-raise)
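
R1720 fires because a branch that always raises never falls through, so the `elif` after it can be a plain `if`. The sketch below shows that shape in isolation. `check_status` is a hypothetical function, and the 404 branch is an assumption made for illustration, since the diff excerpt does not show what the `elif` actually tests.

```python
class RatelimitException(Exception):
    """Raised when GitHub reports that the rate limit was hit."""


class UrlNotFoundException(Exception):
    """Raised when the requested url does not exist."""


def check_status(status_code):
    """Raise a specific exception for status codes that need handling.

    Hypothetical helper, not part of this PR.
    """
    if status_code in [403, 429]:
        raise RatelimitException(f"Rate limited with status {status_code}")

    # A plain `if` is enough here: the branch above always raises,
    # so this line is only reached when it did not match.
    # (Assumed branch: the real condition is not visible in the diff.)
    if status_code == 404:
        raise UrlNotFoundException("Url not found")
```

Any other status code passes through silently, which leaves room for a later `raise_for_status()` call to handle the remaining failures.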

Member

@sgoggins sgoggins left a comment

Ready to test!

@sgoggins sgoggins merged commit f711042 into dev Jun 17, 2024
8 of 9 checks passed
@ABrain7710 ABrain7710 deleted the redesign-github-paginator branch June 25, 2024 00:57