Goes on a frenzie to download hundreds (all?) of (.whl) versions of boto3, botocore #12274
Comments
I guess it is just a typical case of https://pip.pypa.io/en/stable/topics/dependency-resolution/#possible-ways-to-reduce-backtracking which the INFO messages referred to a number of times during this run. Unfortunately the logging/UI is not really helpful for figuring out what the culprit is ... It would have been nice if, at the end of the installation, there was information about which conflicts were not resolved or which dependencies were installed at versions other than the latest in order to satisfy version constraints, and ideally where those dependencies came from, i.e. which packages specified them. Feel welcome to retitle or close in favor of some already existing issue along those lines. |
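For the "which packages specified them" part, here is a minimal sketch of how one could look that up after the fact with only the standard library's `importlib.metadata`. The helper names (`who_requires`, `installed_requirements`) and the `SAMPLE` data are hypothetical; the selenium entries come from the PKG-INFO quoted in this thread, while the botocore specifier is an assumed value for illustration only. This is not something pip prints today.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a Requires-Dist style string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def who_requires(
    target: str, requires_by_dist: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Map each distribution to the requirement strings that name `target`."""
    wanted = normalize(target)
    found: Dict[str, List[str]] = {}
    for dist_name, requirements in requires_by_dist.items():
        hits = [r for r in requirements if requirement_name(r) == wanted]
        if hits:
            found[dist_name] = hits
    return found


def installed_requirements() -> Dict[str, List[str]]:
    """Collect Requires-Dist entries for every distribution in this environment."""
    result: Dict[str, List[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            result[name] = list(dist.requires or [])
    return result


# Illustrative data only; the botocore specifier is an assumption.
SAMPLE = {
    "selenium": [
        "urllib3[socks]>=1.26,<3",
        "trio~=0.17",
        "trio-websocket~=0.9",
        "certifi>=2021.10.8",
    ],
    "botocore": ["urllib3<1.27,>=1.25.4"],
}

if __name__ == "__main__":
    # Inspect the real environment instead of the sample data.
    for dist_name, reqs in who_requires("urllib3", installed_requirements()).items():
        print(dist_name, "->", reqs)
```

Running it inside the affected virtualenv would list every installed package that declares a requirement on urllib3, which is roughly the information that was hard to dig out of the log.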
This is not uncommon when Pip gets stuck in a backtracking situation with boto3 and botocore. The only way to know whether a candidate can satisfy a requirement is to download the package and analyze its metadata; there is no guaranteed way to know ahead of time what will resolve the conflict, and boto3 and botocore make a new release every day, so hundreds of versions might be downloaded just to determine that they are not the cause of the conflict. Furthermore, dependency resolution is an NP-hard problem, so it can be quite tricky to provide any useful information to a user on why backtracking is happening, especially when it does eventually resolve (if there is no solution, Pip will print the final blocking conflict). One thing that will help performance in these situations is that once PyPI backfills PEP 658 files, Pip will not need to download the full 10 MB of each package but instead a small file (which you saw in the first few downloads, as new packages are enabled for PEP 658 file distribution). I am working on a branch of Pip which is better at reporting errors, and it gives messages like these:
I was initially suspicious of the extra (urllib3[socks]), as there are known performance issues with extras, but I tried the branch from this PR and there weren't any significant improvements. I applied my commits onto that branch and the feedback message changed a little bit (which turns out to be quite helpful in the end):
Do you think this would have been actually helpful for you if it was in the output? It would be useful for me to know whether to continue to pursue that PR. Anyway, let's look at the requirements for each of these projects:
Selenium: https://github.com/SeleniumHQ/selenium/blob/selenium-4.11.2-python/py/requirements.txt
Botocore: https://github.com/boto/botocore/blob/1.31.4/setup.cfg#L5
Requests: https://github.com/psf/requests/blob/v2.31.0/setup.py#L61
urllib3[socks]: (it seems base urllib3 has no requirements) https://github.com/urllib3/urllib3/blob/2.0.2/pyproject.toml#L56
So, as you can see botocore is not compatible with urllib3 2 and the latest version of Selenium is specifying You can handle this in one of two ways, you can pin
And then run:
You should find that adding this as a requirement or constraint will be significantly faster to resolve. Unfortunately I don't think the "frenzie" is actually incorrect behavior, however it would certainly be useful if Pip was able to better express why it was backtracking. |
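To make the effect of such a constraint concrete, here is a toy sketch. It is not pip's or resolvelib's actual algorithm, only a naive depth-first backtracking resolver over a made-up index. The package names `selenium`, `boto` and `urllib`, the integer versions, and the 20 `boto` releases are all hypothetical stand-ins; each `(name, version)` added to `fetched` models one metadata download.

```python
from typing import Dict, List, Optional, Set, Tuple

Requirement = Tuple[str, List[int]]  # (package name, allowed versions)

# Made-up index: package -> version -> dependencies. Not real PyPI data.
TOY_INDEX: Dict[str, Dict[int, Dict[str, List[int]]]] = {
    "selenium": {1: {"urllib": [1, 2]}},                  # accepts old and new urllib
    "urllib": {1: {}, 2: {}},
    "boto": {v: {"urllib": [1]} for v in range(1, 21)},   # every release needs urllib 1
}


def resolve(
    requirements: List[Requirement],
    fetched: Set[Tuple[str, int]],
    pinned: Optional[Dict[str, int]] = None,
) -> Optional[Dict[str, int]]:
    """Depth-first backtracking that always tries the newest version first."""
    pinned = dict(pinned or {})
    if not requirements:
        return pinned
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        if pinned[name] in allowed:
            return resolve(rest, fetched, pinned)
        return None  # conflict: the caller has to try another candidate
    for version in sorted(allowed, reverse=True):
        if version not in TOY_INDEX[name]:
            continue
        # Dependencies are only known after fetching this candidate's metadata.
        fetched.add((name, version))
        deps = list(TOY_INDEX[name][version].items())
        result = resolve(deps + rest, fetched, {**pinned, name: version})
        if result is not None:
            return result
    return None


def run(requirements: List[Requirement]) -> Tuple[Optional[Dict[str, int]], int]:
    """Return the solution and the number of distinct candidates fetched."""
    fetched: Set[Tuple[str, int]] = set()
    return resolve(requirements, fetched), len(fetched)


user_requirements: List[Requirement] = [
    ("selenium", [1]),
    ("boto", list(range(1, 21))),
]
unconstrained = run(user_requirements)
# Same requirements, plus an up-front "urllib < 2" style constraint.
constrained = run([("urllib", [1])] + user_requirements)

if __name__ == "__main__":
    print("without constraint:", unconstrained)
    print("with constraint:   ", constrained)
```

In this toy setup both runs end on the same pins, but the unconstrained run looks at every `boto` release before it revisits its `urllib` choice, while the constrained run fetches only three candidates. That mirrors, in a very simplified way, why the extra requirement or constraint resolves so much faster.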
Alternatively, if you don't want to add artificial constraints, I found that specifying
Pro:
Con:
|
Yes, while not an official feature(?) my understanding is the design of the resolver is that you can give it "hints" by providing dependencies and providing them in a specific order. However my concern in this example is that older versions of botocore might not put an upper bound on urllib3 in their requirements, but they are still not compatible with urllib3 2. So if you're not paying specific attention the resolver could still decide to pick an older version of botocore with a newer version of urllib3. |
Thank you @notatallshaw and @sanderr ! Per @notatallshaw's comment with the Selenium requirements.txt, I thought that Selenium indeed reads them in its setup.py, like some projects do, so I started to compose an issue asking them to fix that practice. Here is the unfinished draft: ATM co-installation of the Selenium via Originally dependencies were not strictly versioned, see e.g. https://github.com/SeleniumHQ/selenium/blob/c8a7cb1896fb8cd80a016b00687d4345f4daf799/py/requirements.txt Then 135b8e291c2cccfafd61974ace25db928b014675 was the first move to hard-fix some of them, and that opened Pandora's box. More and more were fixed, and then requests similar to this one, but in relation to a specific dependency, were filed before, e.g.
but apparently their
❯ grep Req PKG-INFO
Requires-Python: >=3.8
Requires-Dist: urllib3[socks]>=1.26,<3
Requires-Dist: trio~=0.17
Requires-Dist: trio-websocket~=0.9
Requires-Dist: certifi>=2021.10.8
so it seems that we should be all good on that one here since botocore wants
It would be better than nothing (current behavior) for sure! But because those messages are spit out intermixed with many other messages, they might not even be available, due to the limited size of the terminal history, by the time backtracking is done, which could take a while like in this case. Do you think it would be possible/reasonably easy, if backtracking was triggered, to include a similar message at the end of the installation, e.g.
which would then alert the user and point them to those. But I also wonder if further analysis could be done, e.g. going through all installed packages and double-checking that pip installed all desired versions and did not go for some other version, e.g.
or I even wonder if something like the following could be handy
which could hint at which packages did not get their most recent versions installed for some reason, which might be due to those strict dependency specifications etc. WDYT? |
Good catch, the chain of events is a little more complicated. The algorithm the resolver follows is something along the lines of:
So going through the collection process, where we count the user collection as a depth of 1, the next depth as 2, etc. (I'm 1-indexing because it's easier to count the requirements or packages that led to a package being collected). Here is a highly curated series of steps that caused significant backtracking, I'll do requirements in
Hope that makes sense. Previous versions of Pip would spend time backtracking on unrelated packages; that didn't happen here. It was looking at the right packages, it just didn't know that I do wonder if Pip could possibly make the choice that
Well looking at this situation in detail I'm not sure how helpful it was anyway, I will play around with what information can be available and see if I could make a more useful message and look at how to get it into the logs. Adding it at the end might certainly be useful but I don't know enough about the Pip reporter to know how easy it is, I will take a look when I get there. All your other ideas sound great, I however don't have the capacity to add something like those, I'm sure the Pip maintainers would look at a helpful logging message PR if you or someone else is interested in contributing. |
While I have an intuition that there is a possible optimization here, after trying a few times to get this to work I found I ended up breaking more important optimizations, which made backtracking much slower. So for now I am no longer pursuing this. |
Thanks for trying @notatallshaw ! |
Can you do a run of this with That'll include useful information about what the dependency resolver is doing, which can help with diagnosing the underlying issue here. :) |
FYI I'm happy to do that as I can reproduce the issue, but I'm quite confident I've already explained why the backtracking occurs in this case in this comment: #12274 (comment) I personally don't see any obvious optimization available to Pip / resolvelib in this case. One can intuit that the wrong package is being backtracked on by looking at the larger graph, but I don't know how easy that is to express in terms of how the current backtrack algorithm is implemented. |
I realized that if my explanation is correct, it follows that this would be a more minimal reproducible example:
And indeed it exhibits the exact same behavior, but this example produces a log file too big to upload to gist, so here is a version which doesn't backtrack as much:
And here is the raw gist (because even with this greatly reduced version the non-raw gist is truncated): https://gist.githubusercontent.com/notatallshaw/0bd205eab179a09ba57782aab2c784bb/raw/76e71e5d881fc39937109e120f2896e2585e6b42/log.txt |
Seems like the OP's problem is much better now, and this issue can be closed. Pip made a new release which is less likely to go into a "frenzie" when dealing with extras, and in the meantime the ecosystem has improved significantly to support urllib3 2.0. I am only able to reproduce the OP's issue to a much smaller extent now, by carefully choosing upper bounds on package versions or by using pypi-timemachine. I can't reproduce it significantly just by using the given requirements. |
I am going to close this issue as the OP's requirements no longer cause this problem. While it's still possible for the pip resolver to backtrack through many versions, there is a significant improvement on the PyPI index (and any other index that chooses to implement it): pip first checks metadata files for dependency information, meaning the downloads are much smaller. There are still performance and algorithm improvements that can be made, but they are being tracked in other issues. |
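For context on the metadata files mentioned above: under PEP 658 an index serves a distribution's core metadata at the distribution file's own URL with `.metadata` appended. A minimal sketch of that mapping follows; the function name `metadata_url` and the example host are hypothetical, and the sketch assumes the URL carries no query string.

```python
from urllib.parse import urldefrag


def metadata_url(file_url: str) -> str:
    """Return the PEP 658 metadata URL for a distribution file URL.

    Index pages usually attach a hash fragment (``#sha256=...``) to file
    links; the fragment is not part of the served location, so drop it first.
    """
    base, _fragment = urldefrag(file_url)
    return base + ".metadata"


if __name__ == "__main__":
    # Hypothetical host; only the ".metadata" suffix rule comes from PEP 658.
    print(metadata_url(
        "https://files.example.invalid/botocore-1.31.44-py3-none-any.whl#sha256=abc"
    ))
```

Fetching that small file instead of the whole wheel is what keeps backtracking over many botocore versions from turning into many multi-megabyte downloads.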
Description
Original use case -- this https://github.com/dandi/dandi-api-webshots-tools/blob/master/requirements.txt , boiled down to the combination of dandi and selenium, which leads to something like
so you can see it going after all those .patch versions of botocore and boto3 (.144, .143, ...) and then I think other .minor versions and so on...
at the end it does finish with
so with
boto3-1.28.44 botocore-1.31.44
so the most recent ones on PyPI, and I double-checked that only those are installed:
(venv) dandi@drogon:/tmp$ ls -ld venv/lib/python3.9/site-packages/boto*info
drwxr-xr-x 2 dandi dandi 4096 Sep 11 11:45 venv/lib/python3.9/site-packages/boto3-1.28.44.dist-info
drwxr-xr-x 2 dandi dandi 4096 Sep 11 11:45 venv/lib/python3.9/site-packages/botocore-1.31.44.dist-info
Expected behavior
to not query/download all those other versions of boto3 and botocore only to install the most recent one at the end.
pip version
23.2.1
Python version
originally locally 3.11.5 and then on server with 3.9.2
OS
Debian GNU/Linux
How to Reproduce
Output
No response