Version syncing: repositories with too many branches/tags fail on syncing #8990

Closed
humitos opened this issue Mar 2, 2022 · 7 comments · Fixed by #10594
Labels
Accepted Accepted issue on our roadmap Bug A bug Good First Issue Good for new contributors Priority: low Low priority Sprintable Small enough to sprint on

Comments

Member

humitos commented Mar 2, 2022

I've seen repositories with too many branches/tags fail when we try to re-sync their versions. One example is jupyterlab, which has 61k versions:

▶ git ls-remote https://github.com/jupyterlab/jupyterlab.git | wc -l
61290

It fails on our side because we truncate the command's output when it is too big: 'Output is too big. Truncated at 4718592 bytes.'. We could remove this restriction, but that would cause a problem in our database when trying to re-sync 61k versions. I'm not sure what the right approach to fix this issue is 😞
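To illustrate why the truncation breaks the sync, here is a minimal sketch, not the Read the Docs implementation. It assumes `git ls-remote` prints one `<sha><TAB><ref>` pair per line; the helper names `truncate` and `parse_ls_remote` and the sample data are made up for illustration. Cutting the output at a fixed byte count can slice the final line in half and silently drops every ref after the cut.

```python
def truncate(output: str, limit: int) -> str:
    """Cut output at `limit` bytes, as a size cap would (hypothetical helper)."""
    return output.encode()[:limit].decode(errors="ignore")


def parse_ls_remote(output: str) -> tuple[list[str], list[str]]:
    """Split `git ls-remote` style output into branch and tag names.

    Only newline-terminated lines are parsed, so a final line that was cut
    in half by truncation is dropped instead of yielding a bogus ref name.
    """
    branches, tags = [], []
    # split("\n")[:-1] keeps only the lines that end with a newline.
    for line in output.split("\n")[:-1]:
        sha, sep, ref = line.partition("\t")
        if not sep:
            continue
        if ref.startswith("refs/heads/"):
            branches.append(ref[len("refs/heads/"):])
        elif ref.startswith("refs/tags/") and not ref.endswith("^{}"):
            # "^{}" entries are peeled annotated tags, not separate versions.
            tags.append(ref[len("refs/tags/"):])
    return branches, tags


# Fake output with made-up SHAs, for illustration only.
SAMPLE = "\n".join([
    "a" * 40 + "\trefs/heads/main",
    "b" * 40 + "\trefs/tags/v1.0",
    "c" * 40 + "\trefs/tags/v1.0^{}",
    "d" * 40 + "\trefs/tags/v2.0",
]) + "\n"
```

Parsing `SAMPLE` in full finds both tags, while parsing `truncate(SAMPLE, 200)` loses `v2.0`, which is the kind of missing version reported in this issue.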

This project in particular has ~1300 versions registered in Read the Docs:

In [1]: p = Project.objects.get(slug="jupyterlab")

In [2]: p.versions.count()
Out[2]: 1321

and they have the Skip syncing tags feature flag enabled.

In [6]: list(p.features.all())
Out[6]: 
...
 <Feature: Use remote listing in VCS (e.g. git ls-remote) if supported for sync versions feature>,
 <Feature: Skip syncing tags feature>,
...

Sentry issue: https://sentry.io/organizations/read-the-docs/issues/3058924977/

humitos added the Bug and Accepted labels on Mar 2, 2022
humitos changed the title from "Version syncing: repositories with too many branch/tags fail on syncing" to "Version syncing: repositories with too many branches/tags fail on syncing" on Mar 2, 2022

aanm commented Mar 4, 2022

I think we are facing this exact issue in https://readthedocs.org/projects/cilium. I can't find a tag that was recently created in https://readthedocs.org/projects/cilium/versions. Any workaround in place?

The only workaround we found was to delete some unused upstream references with git push --delete origin refs/heads/branch_name

humitos self-assigned this on Mar 12, 2022
humitos added the Sprintable and Good First Issue labels on Mar 12, 2022
Member

stsewd commented Apr 19, 2022

Another option is to cut the output only when saving the command.

Member Author

humitos commented Apr 20, 2022

It may be worth checking first whether updating 65k versions on each sync would overwhelm our API / database. I think it could be a problem. In that case, we may want to sort the branches by descending date and keep only the last 200 created, or something similar.
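A rough sketch of the "keep the last 200" idea, assuming each ref's timestamp is already known. Plain `git ls-remote` output does not include dates, so they would have to come from somewhere else. The `Ref` type, the `keep_newest` function, and its default limit are illustrative names, not existing Read the Docs code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Ref:
    """A branch or tag together with the date used for ordering (hypothetical)."""
    name: str
    updated: datetime


def keep_newest(refs: list[Ref], limit: int = 200) -> list[Ref]:
    """Return at most `limit` refs, newest first."""
    return sorted(refs, key=lambda ref: ref.updated, reverse=True)[:limit]


# Made-up data: five branches created on consecutive days.
refs = [
    Ref(f"branch-{i}", datetime(2022, 1, 1 + i, tzinfo=timezone.utc))
    for i in range(5)
]
newest_two = keep_newest(refs, limit=2)
```

With a cap like this, the number of versions written on each sync is bounded by `limit` no matter how many refs the repository has.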

Member Author

humitos commented Apr 20, 2022

> Another option is to cut the output only when saving the command.

In that case, we will be sending a lot of data over the API, which may take some time, and then it will be discarded. I think we could add an argument to the method that runs these particular commands telling it not to truncate the command's output.

Member

stsewd commented Apr 20, 2022

> In that case, we will be sending a lot of data over the API

I meant before using the API to save that command.
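A sketch of what this exchange describes, under assumed names: the full output stays in memory for parsing, and only the copy that gets stored is truncated, before anything is sent over the API. `CommandResult`, `to_payload`, `run`, and `MAX_SAVED_BYTES` are hypothetical; only the 4718592-byte figure and the message text come from the error quoted in the issue description.

```python
import subprocess
from dataclasses import dataclass

# Limit taken from the "Output is too big" message in the issue description.
MAX_SAVED_BYTES = 4718592


@dataclass
class CommandResult:
    """Result of a command, holding the untruncated output (hypothetical type)."""
    command: list[str]
    output: str

    def to_payload(self, limit: int = MAX_SAVED_BYTES) -> dict:
        """Build the record to store, truncating only this copy of the output."""
        raw = self.output.encode()
        if len(raw) > limit:
            saved = raw[:limit].decode(errors="ignore")
            saved += f"\nOutput is too big. Truncated at {limit} bytes."
        else:
            saved = self.output
        return {"command": " ".join(self.command), "output": saved}


def run(command: list[str]) -> CommandResult:
    """Run a command and keep everything it printed to stdout."""
    done = subprocess.run(command, capture_output=True, text=True, check=True)
    return CommandResult(command=command, output=done.stdout)
```

The parsing code would read `result.output`, while the code that saves the command would send `result.to_payload()`, so the large output never crosses the API.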

humitos added the Priority: low label on May 24, 2022
Member

dojutsu-user commented Oct 20, 2022

Hi @humitos @stsewd,

I can take this up, but I'm not entirely sure what needs to be done here.

Member Author

humitos commented Oct 24, 2022

@dojutsu-user I'm not sure exactly what the solution to this problem is. I suggested:

> we may want to be able to sort the branches by descending date and keep the last 200 created only or similar

So, a good starting point is to figure out how to do this with Git and all the other supported VCS (those not expected to be deprecated soon, see #8840).
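For Git, one possible starting point (an assumption, not necessarily the fix that was eventually merged) is `git for-each-ref`, which can sort refs by date and cap how many it prints. It works on refs already present in a local clone, unlike `git ls-remote`. The helper names and the `refs/remotes/origin` default below are hypothetical, and other VCS backends are not covered.

```python
import subprocess


def newest_refs_command(pattern: str = "refs/remotes/origin", limit: int = 200) -> list[str]:
    """Build a `git for-each-ref` call listing the newest `limit` refs."""
    return [
        "git", "for-each-ref",
        "--sort=-creatordate",  # newest first; works for commits and tags
        f"--count={limit}",
        # %09 is a TAB in for-each-ref format strings.
        "--format=%(refname:short)%09%(creatordate:unix)",
        pattern,
    ]


def parse_refs(output: str) -> list[tuple[str, int]]:
    """Turn `<name><TAB><unix timestamp>` lines into (name, timestamp) pairs."""
    refs = []
    for line in output.splitlines():
        name, _, timestamp = line.rpartition("\t")
        if name and timestamp.isdigit():
            refs.append((name, int(timestamp)))
    return refs


def list_newest_refs(repo_path: str, limit: int = 200) -> list[tuple[str, int]]:
    """Run the command inside an existing clone at `repo_path`."""
    done = subprocess.run(
        newest_refs_command(limit=limit),
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_refs(done.stdout)
```

Calling `list_newest_refs("/path/to/clone")` would return at most 200 `(name, timestamp)` pairs; the path is a placeholder.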

4 participants