Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add PullRequestDiffsStream as child to PullRequestsStream #345

Merged

Conversation

angelachenn
Copy link
Contributor

This PR adds a child stream for pull request diffs to the parent pull requests stream. References Parent-Child Streams and Github REST API documentation.

  • also adds pull_id to the PR stream parent context, making {org, repo, repo_id, pull_number, pull_id}. Gives users more flexibility in using or joining their tables.
  • I followed code patterns that I noticed, except for matching error status codes against their respective messages; this is more to avoid a dependency on Github error messages. If they were to change them for whatever reason, that'd be a breaking change.

Some clarity on the error catching: all of them are related to the diff size being too large for to process and/or stream.

  • 406, 422 and 502 are all from the Github REST API's end.
    • 406 being diff too large either because it exceeds the max # of files (300) or lines changed (20000)
    • 422 being server error because diff is taking too long to generate
    • 502 Server Error
  • PR diffs that are small enough to call via the API but still quite large will result in this error:
Screenshot 2025-01-23 at 4 21 08 PM Modifying the buffer size is risky and doesn't really prevent related issues from resurfacing downstream. Luckily intervening at the parsing step worked, I've set the content-size limit to 40MiB to allow for some overhead.

PRs that cause those errors are skipped. I've tested with a pipeline on my end and it all seems to be working successfully.

2025-01-23 08:16:29,000 | INFO     | tap-github.pull_request_diffs | Skipping PR due to 422 error: Server Error: Sorry, this diff is taking too long to generate.
2025-01-23 08:17:59,294 | INFO     | tap-github.pull_request_diffs | Skipping PR due to 406 error: Sorry, the diff exceeded the maximum number of files (300). Consider using 'List pull requests files' API or locally cloning the repository instead.
2025-01-23 08:43:26,939 | INFO     | tap-github.pull_request_diffs | Skipping PR. The diff size (55.72 MiB) exceeded the maximum size limit of 40 MiB.

My initial brute-force approach was just to shove another call to the diff in the post_process method within PullRequestsStreams based on a config setting, but that results in getting instantly rate-limited lol.

Please advise if I missed anything. Thanks!

@edgarrmondragon edgarrmondragon self-requested a review January 23, 2025 21:34
@edgarrmondragon edgarrmondragon self-assigned this Jan 23, 2025
@edgarrmondragon
Copy link
Member

Thanks @angelachenn! Can you fix the linting issues? I'll review after that.

@angelachenn
Copy link
Contributor Author

Thanks for the quick reply!! Looking into that right now

@angelachenn angelachenn force-pushed the include-pull-request-diffs branch from 63d545f to 25f4dc5 Compare January 23, 2025 21:45
@edgarrmondragon
Copy link
Member

pre-commit.ci autofix

@angelachenn angelachenn force-pushed the include-pull-request-diffs branch from 09f843d to cd2e81a Compare January 23, 2025 21:50
@angelachenn angelachenn force-pushed the include-pull-request-diffs branch from 498537b to 43cabe0 Compare January 23, 2025 22:03
@edgarrmondragon edgarrmondragon added the enhancement New feature or request label Jan 25, 2025
Copy link
Member

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @angelachenn!

@edgarrmondragon edgarrmondragon added this pull request to the merge queue Jan 27, 2025
Merged via the queue into MeltanoLabs:main with commit 38c80bd Jan 27, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants