pip 20.3.1 wheel command downloads multiple versions of the same package but keeps only one #9271
Comments
FWIW this is actually intentionally “fixed” because the behaviour you expected (keep all wheels downloaded and built during resolution) was reported as a bug. |
#8827 was the report, I think. |
OK; I'm not particularly fussed about the download and save behaviour :) However I do think it's not great that this downloaded 7 versions of the whl to only keep the 8th. This is one case with a small wheel but probably does play out in CI in various ways many, many times. I think it has to do with the large upper-constraints.txt? In theory, this should be a list of packages that all should work together and the resolver should essentially have nothing to do. As I mentioned I didn't have much luck breaking that down to replicate with something smaller. Is there some way to dump why pip thinks it needs to download all the intermediate versions? |
Folks hitting "pip's downloading everything" are likely hitting #9284. |
FWIW the original issue still replicates with 20.3.3
Still not sure what about this constraints file makes pytest special |
Same issue.
So when upgrading all packages, I used the second method to avoid downloading all versions of the same package. I think it is an issue.
|
Confirmed the issue with 20.3.3 |
I thought about this for a while, and decided that removing unmatched downloads is the correct behaviour. Additional version downloads happen when pip discovers incompatibilities in the dependency graph and performs backtracking; those downloads are, therefore, unusable in the environment (otherwise pip wouldn’t need to download other things). Both |
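To make the backtracking point above concrete, here is a toy sketch in Python. It is not pip's resolver, and every name and version in it is invented for illustration. It assumes a candidate's dependency metadata can only be read after the file has been fetched, so each rejected candidate costs one download.

```python
def pick_version(candidates, pinned):
    """Toy model of backtracking over candidate versions (not pip's algorithm).

    candidates: list of (version, {dependency: minimum_version}), newest first.
    pinned:     {dependency: version} taken from a constraints file.
    Versions are tuples such as (1, 2) so they compare naturally.
    Returns (chosen_version_or_None, list_of_fetched_versions).
    """
    fetched = []
    for version, requires in candidates:
        # Assumption: metadata is only visible once the file is fetched.
        fetched.append(version)
        compatible = all(
            # An unpinned dependency is treated as satisfiable.
            pinned.get(dep, minimum) >= minimum
            for dep, minimum in requires.items()
        )
        if compatible:
            return version, fetched
        # Otherwise backtrack: this fetch is unusable, try the next candidate.
    return None, fetched


# Hypothetical package data: newer releases need a newer "lib" than the pin allows.
CANDIDATES = [
    ((3, 0), {"lib": (2, 0)}),
    ((2, 0), {"lib": (1, 5)}),
    ((1, 0), {"lib": (1, 0)}),
]
chosen, fetched = pick_version(CANDIDATES, {"lib": (1, 2)})
```

In this made-up case three files are fetched and only the last is usable, which mirrors the pattern reported in this issue on a smaller scale.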
What about CI server builds and similar setups? They do not keep a pip cache; they start from zero. In our case, we build everything from scratch (including installing the latest pip and starting with no cache) to discover installation issues. This is a valid use case: we do not want to roll new versions out to our clusters without testing the installation process. The previous pip resolver did not need nearly as much time to build everything (the new one adds almost an extra hour). Is it possible to postpone this resolver until the metadata can be accessed without downloading gigabytes of package files? Perhaps with the Simple API or some RESTful service? Disclosure: this is a good reason against investing in Python. I do not consider Python mature enough for machine learning problems because of this periodic devops drama. Every new issue that makes things more difficult is another reason to consider alternatives such as Julia or C++. So keep up. :) |
Sorry but I don’t get what you’re trying to say. This issue is about |
I wouldn't argue that discarding the unnecessary downloads is the right thing to do. We were caught by this because of our admittedly obscure wheel-building scripts that were parsing "Downloading ..." lines and assuming those files existed on disk. My main concern now is that it downloads so much only to throw it away. If this is playing out in many other situations among our thousands of CI jobs, that's not good, so anything we can do to understand and mitigate it would probably help. It's not 100% clear to me this is the same issue as mentioned in #9271 (comment) but maybe it is? I'm not sure what about my original report triggered just the To refine it a little more; the wheel building is a bit of a red herring I guess ... the extra downloads happen with just
|
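A script that parses "Downloading ..." lines can avoid the assumption described above by checking the wheel directory before acting on a name. The following is a minimal Python sketch. It assumes log lines look like `Downloading <filename> (<size>)`, which may not hold for every pip version, and the function names are made up for this example.

```python
import re
from pathlib import Path

# Assumed log line shape: "  Downloading <filename-or-URL> (<size>)".
# pip's real output may differ between versions; adjust as needed.
DOWNLOAD_RE = re.compile(r"^\s*Downloading\s+(\S+)")


def downloaded_names(log_text):
    """Return file names from 'Downloading ...' lines, in order, de-duplicated."""
    names = []
    for line in log_text.splitlines():
        match = DOWNLOAD_RE.match(line)
        if match:
            # Keep only the last path component in case the line carries a URL.
            name = match.group(1).rsplit("/", 1)[-1]
            if name not in names:
                names.append(name)
    return names


def split_by_presence(log_text, wheel_dir):
    """Split reported downloads into (kept_on_disk, missing_from_disk)."""
    wheel_dir = Path(wheel_dir)
    kept, missing = [], []
    for name in downloaded_names(log_text):
        if (wheel_dir / name).is_file():
            kept.append(name)
        else:
            missing.append(name)
    return kept, missing
```

With this split, only names in the first list are safe to treat as files on disk; names in the second list were reported as downloaded but were not kept.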
Is the issue here “pip downloads too many unnecessary things” or “pip throws away downloads it deems unnecessary”? The topic in this thread is changing from message to message, and I no longer follow what you really think is the issue, and what you think pip should do to fix things for you. pip downloading multiple versions of a package is an infrastructure issue, and has been heavily discussed in pypi/warehouse#8254, #9215, and many other places. pip has to download them, at least for now, due to how Python defines package metadata, to provide you a workable dependency environment. As for the second topic, “throws away downloads” is not an accurate description. The downloads are not thrown away (unless you tell pip to with |
Thanks, I don't think at this point this issue will help. The original issue was the output changing to show downloads that were not then kept in the wheel directory. It has become clear that these extra downloads are not a unique bug and are related to the issues you mention. I agree these intermediate downloads should not be in the final output. Unfortunately, in a CI situation these downloads are effectively thrown away, as we start with fresh environments. |
This comes from OpenStack environments where we are using an upper-constraints.txt file.
The following command with upper-constraints.txt seems to download 8 and discard 7 different versions of pytest for some reason. 5.4.3 is the one it sticks with and the .whl file left behind. Watching with strace, the others appear to be downloaded but ultimately unlinked.
If you try this with --use-deprecated=legacy-resolver it chooses pytest-6.2.0-py3-none-any.whl (wrong I guess, and the new resolver is getting it right), but only downloads it once. So I think the resolver is getting this right, but I don't think it's quite right that it downloads and discards the same package multiple times to get to that point.
Unfortunately I haven't yet been able to reduce the upper-constraints.txt to something smaller that replicates this. I'm sure it involves transitive dependencies I can't see on the large number of packages specified there.
For reference, we've found this because we build our own wheel caches with pip wheel. We are parsing the logs to see which wheels pip downloaded, and which we built locally (i.e. not available from PyPI and thus worth keeping in our cache). Our script was assuming that anything pip reports as "Downloading " was on-disk and could be deleted (see here). There are several other occurrences of similar behaviour with other packages visible in the logs linked below; I just pulled this one as an example.
Files: