Fast-deps: bad interaction with Artifactory leading to zipfile.BadZipFile #8723
Comments
Seems similar to #8701. That one is already fixed in master; it'd be nice if you could check whether the fix resolves this as well. |
I ran across that bug while I was testing for this one, and I agree that it's fixed in master. But even though the exception is the same, the cause seems to be different. Pip's caching is disabled on my work computer where this failure happens. I did reproduce this bug using master. |
It seems like a different bug, then. Unfortunately I'm not familiar enough with that part of the code to offer a fix, so I'll leave it to those who are. |
Hi @tgs, thank you very much for filing this. It will give us a better understanding of the current state of support for range requests on package indices in the wild.
At first glance, it seems that fast-deps mistakenly assumes range request support when it is actually unavailable. Could you please give us the output of the following command?
I suspect that the index is internal, so I'm not likely to be able to get my hands on it, but my guess is that the bug has something to do with this part of the code: pip/src/pip/_internal/network/lazy_wheel.py Lines 63 to 73 in e8f5219
|
The command line didn't quite work, so I changed it to omit the GET:
So it does claim to accept byte ranges... weird! I'll see if I can figure out the curl command line for ranges and see if it works. |
Oops, I'm sorry for that.
According to MDN, something like this may give some insight:
|
I piped it through |
Maybe I should also open a ticket on the Artifactory public JIRA. |
Thanks for your input @tgs! FYI range responses have the status code of 206 but I don't know enough to tell if the behavior above is acceptable by the web standards. @dholth, is it legal for servers to advertise for |
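The distinction raised in this comment is that a server can advertise `Accept-Ranges: bytes` and still answer a range request with a plain 200 and the full body. The following is a minimal sketch of a check that looks at what actually came back rather than at the advertisement. The helper names `parse_content_range` and `range_was_honoured` are made up for illustration and are not pip's code, and the sketch assumes the headers arrive as a plain dict keyed by `Content-Range` in exactly that capitalisation.

```python
import re
from typing import Mapping, Optional, Tuple

# Matches e.g. "bytes 0-99/1000" or "bytes 0-99/*" (total length unknown).
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (start, end, total) from a Content-Range value, or None if malformed."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        return None
    total = None if match.group(3) == "*" else int(match.group(3))
    return start, end, total


def range_was_honoured(
    status: int, headers: Mapping[str, str], start: int, end: int
) -> bool:
    """True only for a 206 response whose Content-Range matches the request."""
    if status != 206:
        # A 200 here means the whole file was sent, whatever Accept-Ranges said.
        return False
    parsed = parse_content_range(headers.get("Content-Range", ""))
    return parsed is not None and parsed[:2] == (start, end)
```

With this kind of check, a response like the one described above (status 200 with `Accept-Ranges: bytes`) is treated as "range not honoured" instead of being fed to the zip reader.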
I've reported the bug to Artifactory - https://www.jfrog.com/jira/browse/RTFACT-23108 Regardless of whether they fix it, there will probably be old versions of Artifactory still installed for some time, so maybe there will need to be a server version cutoff or something? If that's how the condition is detected - maybe that's too brittle. |
Looks like we'd best parse the range from the response and write to disk accordingly. |
@tgs, thank you for linking the Artifactory ticket. @dholth, the response from Artifactory is not a range response (@tgs included the request in the output above). Considering
I'll write a workaround that falls back to downloading the whole wheel if a server lies about range request support. I think you might mean writing the content of the non-range response to disk, but currently we have to disable caching for range responses, so that is probably not the best choice right now. |
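The fallback idea in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the workaround that ended up in pip: `fetch_range` and `fetch_all` are hypothetical callables standing in for the HTTP layer so that the sketch runs offline, and `RangeNotHonoured` is an invented exception name.

```python
from typing import Callable, Tuple

# Hypothetical transport hooks (assumptions, not pip APIs):
#   fetch_range(start, end) -> (status_code, body)
#   fetch_all() -> full file contents
FetchRange = Callable[[int, int], Tuple[int, bytes]]
FetchAll = Callable[[], bytes]


class RangeNotHonoured(Exception):
    """The server did not answer a range request with a matching 206."""


def read_range(fetch_range: FetchRange, start: int, end: int) -> bytes:
    """Fetch bytes start..end inclusive, refusing anything but a real partial response."""
    status, body = fetch_range(start, end)
    if status != 206 or len(body) != end - start + 1:
        raise RangeNotHonoured(f"status {status}, got {len(body)} bytes")
    return body


def read_with_fallback(
    fetch_range: FetchRange, fetch_all: FetchAll, start: int, end: int
) -> bytes:
    """Try a range request first; download the whole file if the server ignores it."""
    try:
        return read_range(fetch_range, start, end)
    except RangeNotHonoured:
        return fetch_all()[start : end + 1]
```

The point of the design is that the caller never sees a full-file body where it expected a slice, which is the situation that surfaced as `zipfile.BadZipFile` in this issue.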
@cosmicexplorer, I saw in GH-7819 that you're using Artifactory as well. Does this problem affect you? |
Will report back! |
JFTR I have the exact same problem with devpi. Disabling either fast-deps or 2020-resolver fixes it. |
This would likely be an interaction of fast-deps (which uses the Range header for partial downloads to handle zip files), with Artifactory/devpi/something!? Does this occur with only the 2020-resolver enabled? |
Just |
|
|
Thanks, and that looks like a valid partial response. In other news, files.pythonhosted.org doesn't seem to support partial downloads anymore, and I'm feeling really bad about the effort everyone spent during the summer getting fast-deps implemented. If I had to guess, there might be some issue with caching, so perhaps the reverse of GH-8701 happened. I'll try to investigate a bit deeper later this weekend. |
I would say this is a blocker for #9187. Fast tracking will only ever be fast once this feature is no longer experimental and is enabled by default (backends must also start supporting partial responses again). |
There are other ways to reduce download size. And you may be surprised to learn that download size is not actually the most significant time waste for really slow resolutions. |
@nanonyme, in my experiments, fast-deps is not even faster in practice in the majority of cases. Optimizing download time with a similar approach would more likely require support from the server, e.g. pypi/warehouse#8254. |
I will try to find some time to experiment with parallel downloads to see if that will make fast-deps faster. If we could cache the result of a But: if we can't rely on a PyPI warehouse to reliably provide partial file ranges, then I might suggest instead storing all of this information locally in a JSON file or SQLite database (as I did in #7819 https://github.com/pypa/pip/compare/master...cosmicexplorer:requirement-dependencies-cache?expand=1) and using some caching methodology to know when to fetch each entry. This could avoid any dependency on warehouse development. We can definitely wait for pypi/warehouse#8254, but that won't help pip users resolving against |
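To illustrate what a local store of dependency information might look like, here is a minimal SQLite sketch. It is not the code from #7819: the table name, column layout, and function names are all assumptions chosen for the example, and it leaves out the harder question raised in the comment of when an entry should be refetched.

```python
import json
import sqlite3
from typing import List, Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical per-release dependency cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requires ("
        " name TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " deps TEXT NOT NULL,"  # JSON-encoded list of requirement strings
        " PRIMARY KEY (name, version))"
    )
    return conn


def store(conn: sqlite3.Connection, name: str, version: str, deps: List[str]) -> None:
    """Record the requirement strings declared by one release."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO requires VALUES (?, ?, ?)",
            (name, version, json.dumps(deps)),
        )


def lookup(conn: sqlite3.Connection, name: str, version: str) -> Optional[List[str]]:
    """Return the cached requirement strings, or None on a cache miss."""
    row = conn.execute(
        "SELECT deps FROM requires WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

A resolver could consult `lookup` before touching the network and only fall back to downloading metadata on a miss.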
I would certainly hope that removing the old resolver is pushed out until one or more of these speedups is in place in both Warehouse and downstream mirrors. For server features, that may take years. |
But this local database can only be updated by downloading whole distribution files, can't it? Distribution files are already cached, and it's already pretty fast to access them (compared to other parts of the resolver), so this probably won't bring much improvement. Or do you propose making the information remotely available (outside of PyPI) so people can update the local database without downloading the actual distributions? |
Howdy, looks like a bug was filed against warehouse (PyPI) regarding this. I wanted to drop in to note that the CDN for PyPI files (files.pythonhosted.org) has continuously supported Range requests since its inception. The confusion seems to be around some copy pasta Our CDN does not respond perfectly to Range requests when they come in as a HEAD request, but does for valid GET requests with a Range header. Note that many of the examples in this thread include the Because our CDN fetches the full file to cache at edge before responding to Range GET requests, the HEAD is the response for the full object, always. |
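Following the point above that the CDN answers Range requests properly on GET but not on HEAD, a probe for range support is better expressed as a tiny GET than as a HEAD. The sketch below only builds such a request with the standard library and interprets a status code; it performs no network access. The function names are invented for illustration and any URL passed in is a placeholder.

```python
import urllib.request


def build_range_probe(url: str) -> urllib.request.Request:
    """Build a GET request for just the first byte of the resource."""
    return urllib.request.Request(
        url,
        headers={"Range": "bytes=0-0"},
        method="GET",  # explicitly not HEAD: the HEAD response describes the full object
    )


def probe_says_ranges_work(status: int) -> bool:
    """A 206 to the probe means the range was honoured; a 200 means it was ignored."""
    return status == 206
```

Sending the request (for example with `urllib.request.urlopen`) and passing the resulting status to `probe_says_ranges_work` costs one extra byte of body but reflects how the server treats the requests that fast-deps would actually make.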
What did you want to do?
Trying to install Cython using the experimental features.
Command line:
This works correctly on my home linux computer, but fails at work where we have the following /etc/pip.conf:
It also works in both places when I keep 2020-resolver but omit fast-deps.
The same problem is present on master.
Output
This is the relevant part of the output from pip -v, with pip f17c1d6 installed:
The same part of the output for my home machine where it works, also with f17c1d6
Additional information
Happy to try versions from pull requests if that's helpful. I might also be smart enough to set up a mitmproxy to record the HTTP interaction, but I'm not completely sure.