Fast-deps: bad interaction with Artifactory leading to zipfile.BadZipFile #8723

Open
tgs opened this issue Aug 6, 2020 · 27 comments
Labels
type: bug A confirmed bug or unintended behavior

Comments

@tgs
Contributor

tgs commented Aug 6, 2020

What did you want to do?
Trying to install Cython using the experimental features.

Command line:

virtualenv -p python3.7 env
source env/bin/activate
pip install -U pip
pip install --use-feature=2020-resolver --use-feature=fast-deps -v cython

This works correctly on my home linux computer, but fails at work where we have the following /etc/pip.conf:

[global]
cert = /etc/pki/tls/cert.pem
no-cache-dir = false
index-url = https://artifactory.our-organization.com/artifactory/api/pypi/python-virtual-repo/simple
index = https://artifactory.our-organization.com/artifactory/api/pypi/python-virtual-repo/simple

It also works in both places when I keep 2020-resolver but omit fast-deps.

The same problem is present on master

Output
This is the relevant part of the output from pip -v, with pip f17c1d6 installed:

Given no hashes to check 141 links for project 'cython': discarding no candidates
Collecting cython
  Obtaining dependency information from cython 0.29.21
  https://artifactory.our-organization.com:443 "HEAD /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 0
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113

[SNIP: that line repeats a total of 193 times]

ERROR: Exception:
Traceback (most recent call last):
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 216, in _main
    status = self.run(options, args)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 324, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 111, in resolve
    requirements, max_rounds=try_to_avoid_resolution_too_deep,
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 427, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 324, in resolve
    failure_causes = self._attempt_to_pin_criterion(name, criterion)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 224, in _attempt_to_pin_criterion
    criteria = self._get_criteria_to_update(candidate)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 215, in _get_criteria_to_update
    for r in self._p.get_dependencies(candidate):
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/provider.py", line 151, in get_dependencies
    for r in candidate.iter_dependencies(with_requires)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/provider.py", line 150, in <listcomp>
    r
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 251, in iter_dependencies
    for r in self.dist.requires():
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 230, in dist
    self._prepare()
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 217, in _prepare
    dist = self._prepare_distribution()
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 315, in _prepare_distribution
    self._ireq, parallel_builds=True,
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 500, in prepare_linked_requirement
    wheel_dist = self._fetch_metadata_using_lazy_wheel(link)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 488, in _fetch_metadata_using_lazy_wheel
    return dist_from_wheel_url(name, url, self._session)
  File "/tmp/tmp.O9hMH57Ecq/env2/lib/python3.7/site-packages/pip/_internal/network/lazy_wheel.py", line 46, in dist_from_wheel_url
    zip_file = ZipFile(wheel)  # type: ignore
  File "/opt/python-3.7/lib/python3.7/zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "/opt/python-3.7/lib/python3.7/zipfile.py", line 1317, in _RealGetContents
    raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory
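
For context on the error message itself, here is a minimal sketch (an illustration under assumptions, not pip's actual code path) of one way `zipfile` ends up raising it: the end-of-central-directory record at the tail of the file is intact, but the bytes at the offset it points to are not a central directory. The archive contents and the `try_open` helper are made up for the example.

```python
import io
import zipfile

# Build a small, valid zip archive in memory (contents are hypothetical).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\n")
good = buf.getvalue()

# Keep only the end-of-central-directory record at the tail and zero-fill
# everything before it, as if the earlier byte ranges had never been
# fetched correctly.
EOCD_SIZE = 22  # size of that record when the archive has no comment
damaged = bytes(len(good) - EOCD_SIZE) + good[-EOCD_SIZE:]


def try_open(data):
    """Return None if ZipFile accepts the data, else the error message."""
    try:
        zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile as exc:
        return str(exc)
    return None


good_result = try_open(good)
damaged_result = try_open(damaged)
print(good_result)
print(damaged_result)
```

The point is only that a file of the right length with the wrong bytes in the middle fails at this stage, which is consistent with a lazily filled file whose ranges were not what the client asked for.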

The same part of the output for my home machine where it works, also with f17c1d6

Given no hashes to check 141 links for project 'cython': discarding no candidates
Collecting cython
  Obtaining dependency information from cython 0.29.21
  Starting new HTTPS connection (1): files.pythonhosted.org:443
  https://files.pythonhosted.org:443 "HEAD /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 0
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 3033
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 7207
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 13941
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 166
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 10240
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 30
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 41
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 99
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 30
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 33
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 1035
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 30
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 38
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 21
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 30
  Status code 206 not in (200, 203, 300, 301)
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  Request header has "no-cache", cache bypassed
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 206 3404
  Status code 206 not in (200, 203, 300, 301)
  Created temporary directory: /tmp/pip-unpack-xywl9x_9
  Looking up "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl" in the cache
  No cache entry available
  https://files.pythonhosted.org:443 "GET /packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
  Downloading Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 2.2 MB/s eta 0:00:01  Ignoring unknown cache-control directive: immutable
  Updating cache with response from "https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl"
  Caching due to etag
     |████████████████████████████████| 2.0 MB 2.2 MB/s 
  Added cython from https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl#sha256=5e545a48f919e40079b0efe7b0e081c74b96f9ef25b9c1ff4cdbd95764426b58 to build tracker '/tmp/pip-req-tracker-a7imwe9c'
  Removed cython from https://files.pythonhosted.org/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl#sha256=5e545a48f919e40079b0efe7b0e081c74b96f9ef25b9c1ff4cdbd95764426b58 from build tracker '/tmp/pip-req-tracker-a7imwe9c'
Installing collected packages: cython

  changing mode of /tmp/ohoashd-env/bin/cygdb to 755
  changing mode of /tmp/ohoashd-env/bin/cython to 755
  changing mode of /tmp/ohoashd-env/bin/cythonize to 755
Successfully installed cython-0.29.21
Removed build tracker: '/tmp/pip-req-tracker-a7imwe9c'

Additional information

Happy to try versions from pull requests if that's helpful. I might also be smart enough to set up a mitmproxy to record the HTTP interaction, but I'm not completely sure.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Aug 6, 2020
@uranusjr
Member

uranusjr commented Aug 6, 2020

Seems similar to #8701. That one is already fixed in master; it’d be nice if you could check whether that fixes this as well.

@tgs
Contributor Author

tgs commented Aug 6, 2020

I ran across that bug while I was testing for this one, and I agree that it's fixed in master. But even though the exception is the same, the cause seems to be different. Pip's caching is disabled on my work computer where this failure happens. I did reproduce this bug using master.

@uranusjr
Member

uranusjr commented Aug 6, 2020

It seems like a different bug, then. I’m unfortunately not familiar enough with that part of the code to offer a fix, so I’ll leave it to those who work on it.

@uranusjr uranusjr added the type: bug A confirmed bug or unintended behavior label Aug 6, 2020
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Aug 6, 2020
@McSinyx
Contributor

McSinyx commented Aug 7, 2020

Hi @tgs, thank you very much for filing this, which will give us a better understanding of the current state of support for range requests on package indices in the wild.

https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 1969113
[SNIP: that line repeats a total of 193 times]

At first glance, it seems that fast-deps is mistakenly assuming range request support when it's actually unavailable. Could you please give us the output of the following command?

curl -I https://artifactory.our-organization.com:443 "GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl

I suspect that the index is internal, so I'm not likely to be able to get my hands on it, but my guess is that the bug has something to do with this part of the code

head = session.head(url, headers=HEADERS)
raise_for_status(head)
assert head.status_code == 200
self._session, self._url, self._chunk_size = session, url, chunk_size
self._length = int(head.headers['Content-Length'])
self._file = NamedTemporaryFile()
self.truncate(self._length)
self._left = [] # type: List[int]
self._right = [] # type: List[int]
if 'bytes' not in head.headers.get('Accept-Ranges', 'none'):
    raise HTTPRangeRequestUnsupported('range request is not supported')
which somehow did not raise HTTPRangeRequestUnsupported.

@tgs
Contributor Author

tgs commented Aug 7, 2020

The command line didn't quite work, I changed it to omit the GET:

$ curl -I "https://artifactory.our-organization.com:443/artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl"
HTTP/1.1 200 OK
Date: Fri, 07 Aug 2020 17:41:20 GMT
Server: Artifactory/6.18.1
X-Artifactory-Id: 58b6d1fd8d039ec1:-18e585a9:17378973c9c:-8000
Cache-Control: public, max-age=31536000
Last-Modified: Wed, 08 Jul 2020 21:56:33 GMT
ETag: 5c7486cdb788792dfb0ea3928e075a2df9960578
X-Checksum-Sha1: 5c7486cdb788792dfb0ea3928e075a2df9960578
X-Checksum-Sha256: 5e545a48f919e40079b0efe7b0e081c74b96f9ef25b9c1ff4cdbd95764426b58
X-Checksum-Md5: e244967a4c60b39072d30485df45fd90
Accept-Ranges: bytes
X-Artifactory-Filename: Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl
Content-Disposition: attachment; filename="Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl"; filename*=UTF-8''Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl
Content-Type: application/octet-stream
Content-Length: 1969113
Via: 1.1 artifactory.our-organization.com

So it does claim to accept byte ranges... weird! I'll see if I can figure out the curl command line for ranges and see if it works.

@McSinyx
Contributor

McSinyx commented Aug 8, 2020

The command line didn't quite work, I changed it to omit the GET

Oops, I'm sorry for that.

I'll see if I can figure out the curl command line for ranges and see if it works.

According to MDN, something like this may give some insight:

curl -I -H "Range: bytes=0-1023" "https://artifactory.our-organization.com:443/artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl"

@tgs
Contributor Author

tgs commented Aug 13, 2020

> GET /artifactory/api/pypi/python-virtual-repo/packages/packages/6b/36/d6c18632a339dafa54fd128b0dd2c36c6dc4bc86b8e0d366ccd9f22b480a/Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1
> User-Agent: curl/7.29.0
> Host: artifactory.our-organization.com
> Accept: */*
> Range: bytes=0-1023
> 
< HTTP/1.1 200 OK
< Date: Thu, 13 Aug 2020 19:57:32 GMT
< Server: Artifactory/6.18.1
< X-Artifactory-Id: 58b6d1fd8d039ec1:-18e585a9:17378973c9c:-8000
< Cache-Control: public, max-age=31536000
< Last-Modified: Wed, 08 Jul 2020 21:56:33 GMT
< ETag: 5c7486cdb788792dfb0ea3928e075a2df9960578
< X-Checksum-Sha1: 5c7486cdb788792dfb0ea3928e075a2df9960578
< X-Checksum-Sha256: 5e545a48f919e40079b0efe7b0e081c74b96f9ef25b9c1ff4cdbd95764426b58
< X-Checksum-Md5: e244967a4c60b39072d30485df45fd90
< Accept-Ranges: bytes
< X-Artifactory-Filename: Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl
< Content-Disposition: attachment; filename="Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl"; filename*=UTF-8''Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl
< Content-Type: application/octet-stream
< Content-Length: 1969113
< Via: 1.1 artifactory.our-organization.com
< 

I piped it through wc -c, it did indeed return the whole file. Huh!
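
The same check can be scripted. Below is a rough sketch, assuming nothing beyond the standard library: `probe_range` (a hypothetical helper) sends a GET with a `Range` header and reports what actually came back. To keep it runnable without access to the index, it talks to a small local stand-in server, `IgnoresRange`, that advertises `Accept-Ranges: bytes` but always answers 200 with the full body, mimicking the behavior shown above.

```python
import http.client
import http.server
import threading
from urllib.parse import urlsplit

PAYLOAD = bytes(range(256)) * 16  # 4096 bytes standing in for a wheel


class IgnoresRange(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: advertises Accept-Ranges but ignores Range."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Accept-Ranges", "bytes")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):  # keep the example quiet
        pass


def probe_range(url, start=0, end=1023):
    """GET bytes start..end and report what the server actually sent."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc)
    else:
        conn = http.client.HTTPConnection(parts.netloc)
    try:
        headers = {"Range": f"bytes={start}-{end}"}
        conn.request("GET", parts.path or "/", headers=headers)
        resp = conn.getresponse()
        body = resp.read()
        return {
            "status": resp.status,
            "content_range": resp.getheader("Content-Range"),
            "body_length": len(body),
            "honored": resp.status == 206 and len(body) == end - start + 1,
        }
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), IgnoresRange)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = probe_range(f"http://127.0.0.1:{server.server_port}/some.whl")
finally:
    server.shutdown()
    server.server_close()
print(result)
```

Pointing `probe_range` at a real wheel URL on the index in question should show the same pattern if the server behaves as in the curl output: status 200, no `Content-Range`, and a body as long as the whole file.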

@tgs
Contributor Author

tgs commented Aug 13, 2020

Maybe I should also open a ticket on the Artifactory public JIRA.

@McSinyx
Contributor

McSinyx commented Aug 14, 2020

Thanks for your input @tgs! FYI, range responses have status code 206, but I don't know enough to tell whether the behavior above is acceptable under the web standards.

@dholth, is it legal for servers to advertise Accept-Ranges but in fact respond to range requests with status 200? Should pip work around this by checking every response, or expect the servers to keep their promises?

@tgs
Contributor Author

tgs commented Aug 14, 2020

I've reported the bug to Artifactory - https://www.jfrog.com/jira/browse/RTFACT-23108

Regardless of whether they fix it, there will probably be old versions of Artifactory still installed for some time, so maybe there will need to be a server version cutoff or something? If that's how the condition is detected - maybe that's too brittle.

@dholth
Member

dholth commented Aug 14, 2020

Looks like we'd best parse the range from the response and write to disk accordingly.
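
One reading of this suggestion, sketched under assumptions: trust the `Content-Range` header of each response rather than the range that was requested, and write the body at the offset the server reports. Both `parse_content_range` and `write_at_reported_offset` are hypothetical helpers, not pip functions, and a response without a usable header is treated as an error here.

```python
import io
import re

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Parse 'bytes START-END/TOTAL' into (start, end, total).

    total is None when the server sends '*' (length unknown).
    Raises ValueError for anything else, including a missing header.
    """
    match = _CONTENT_RANGE.match(value.strip()) if value else None
    if match is None:
        raise ValueError(f"unusable Content-Range: {value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if start > end or (total is not None and end >= total):
        raise ValueError(f"inconsistent Content-Range: {value!r}")
    return start, end, total


def write_at_reported_offset(fileobj, content_range, body):
    """Write body where the server says it belongs, not where we asked."""
    start, end, _ = parse_content_range(content_range)
    if len(body) != end - start + 1:
        raise ValueError("body length does not match Content-Range")
    fileobj.seek(start)
    fileobj.write(body)


# Usage with an 8-byte placeholder file and a 3-byte partial response.
target = io.BytesIO(bytes(8))
write_at_reported_offset(target, "bytes 2-4/8", b"abc")
print(target.getvalue())
```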

@McSinyx
Contributor

McSinyx commented Aug 15, 2020

@tgs, thank you for linking the Artifactory ticket.

@dholth, the response from Artifactory is not a range one (@tgs included the request in the output above). Considering

Regardless of whether they fix it, there will probably be old versions of Artifactory still installed for some time

I'll make a work around to fallback to whole wheel downloading if a server lies about range request support. I think you might mean writing the content of the non-range response to disk, but we currently have to disable caching for range responses, so that is probably not the best choice right now.
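
A minimal sketch of what such a fallback could look like, under stated assumptions: the exception class below is a local stand-in that borrows the name from the snippet earlier in the thread, `check_partial_response` and `get_metadata` are hypothetical, and `headers` is a plain dict with exact-case keys. It is not the fix that landed in pip.

```python
class HTTPRangeRequestUnsupported(Exception):
    """Local stand-in for the exception named earlier in the thread."""


def check_partial_response(status, headers, start, end):
    """Raise unless the response is a 206 covering exactly start..end."""
    if status != 206:
        raise HTTPRangeRequestUnsupported(f"expected 206, got {status}")
    expected = f"bytes {start}-{end}/"
    if not headers.get("Content-Range", "").startswith(expected):
        raise HTTPRangeRequestUnsupported("unexpected Content-Range")


def get_metadata(fetch_lazily, fetch_whole_wheel):
    """Try the range-based path first; fall back to a full download."""
    try:
        return fetch_lazily()
    except HTTPRangeRequestUnsupported:
        return fetch_whole_wheel()


# Usage: a server that advertises ranges but answers 200 triggers the fallback.
def lazy_against_lying_server():
    check_partial_response(200, {"Accept-Ranges": "bytes"}, 0, 1023)
    return "metadata via range requests"


outcome = get_metadata(lazy_against_lying_server, lambda: "metadata via full download")
print(outcome)
```

Checking every response, rather than only the initial HEAD, means a server that changes its behavior partway through is also caught.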

@McSinyx
Contributor

McSinyx commented Sep 2, 2020

@cosmicexplorer, I saw in GH-7819 that you're using Artifactory as well. Does this problem affect you?

@cosmicexplorer
Contributor

Will report back!

@hynek
Contributor

hynek commented Oct 21, 2020

JFTR I have the exact same problem with devpi. Disabling either fast-deps or 2020-resolver fixes it.

@pradyunsg
Member

pradyunsg commented Oct 21, 2020

This would likely be an interaction of fast-deps (which uses the Range header for partial downloads to handle zip files), with Artifactory/devpi/something!? Does this occur with only the 2020-resolver enabled?

@hynek
Contributor

hynek commented Oct 21, 2020

Just fast-deps: ✅
Just 2020-resolver: ✅
Both together: ❌

@McSinyx
Contributor

McSinyx commented Oct 21, 2020

fast-deps is a no-op w/o 2020-resolver enabled, so that makes sense. @hynek, what would you get if you run curl -I -H "Range: bytes=0-1023" <link to a wheel on the devpi instance>? Also, does anyone have an opinion about

I'll make a work around to fallback to whole wheel downloading if a server lies about range request support.

@hynek
Contributor

hynek commented Oct 21, 2020

curl -I -H "Range: bytes=0-1023" "https://pypi.vm.ag/root/pypi/+f/fce/7fc47dfc97615/attrs-20.2.0-py2.py3-none-any.whl#sha256=fce7fc47dfc976152e82d53ff92fa0407700c21acd20886a13777a0d20e655dc"
HTTP/2 206
server: nginx
date: Wed, 21 Oct 2020 11:53:45 GMT
content-type: application/octet-stream
content-length: 1024
last-modified: Sat, 05 Sep 2020 10:27:32 GMT
etag: "5f536814-bc0c"
expires: Thu, 31 Dec 2037 23:55:55 GMT
cache-control: max-age=315360000
strict-transport-security: max-age=63072000
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
x-frame-options: SAMEORIGIN
content-range: bytes 0-1023/48140


@McSinyx
Contributor

McSinyx commented Oct 21, 2020

Thanks, and that looks like a valid partial response. In other news, files.pythonhosted.org doesn't seem to support partial downloads anymore, and I'm feeling really bad about the effort everyone spent during the summer getting fast-deps implemented. If I have to guess, there might be some issue with caching, so the reverse of GH-8701 happened, perhaps. I'll try to investigate a bit deeper later this weekend.

@nanonyme

nanonyme commented Jan 4, 2021

I would say this is a blocker for #9187. Fast tracking will only ever be fast when this feature becomes no longer experimental and enabled by default (backends also must start supporting partial responses again)

@uranusjr
Member

uranusjr commented Jan 4, 2021

There are other ways to reduce download size. And you may be surprised to learn that download size is not actually the most significant time waste for really slow resolutions.

@McSinyx
Contributor

McSinyx commented Jan 4, 2021

@nanonyme, in my experiments, fast-deps is not even faster in practice in the majority of cases. Optimizing download time with a similar approach would more likely require support from the server, e.g. pypi/warehouse#8254.

@cosmicexplorer
Contributor

cosmicexplorer commented Jan 4, 2021

I will try to find some time to experiment with parallel downloads to see if that will make fast-deps faster. If we could cache the result of a setup.py sdist, we might be able to avoid doing it again (working around #8929), and if we could implement parallel pipelined downloads, I have seen that complete in < 10 seconds regardless of the input in my testing.

But: if we can't rely on a pypi warehouse to reliably provide partial file ranges, then I might suggest instead storing all of this information locally in a json file or sqlite db (as I did in #7819 https://github.com/pypa/pip/compare/master...cosmicexplorer:requirement-dependencies-cache?expand=1) and using some caching methodology to know when to fetch for each entry. This could avoid any dependency on warehouse development.

We can definitely wait for pypi/warehouse#8254, but that won't help pip users resolving against find-links repos (I think), and in general expecting an artifact repository to expose all metadata all up to date all the time and exposing file ranges performantly seems very difficult. For people maintaining any such artifact repository inside some internal corporate network, I think a solution which works unconditionally and doesn't require special configuration on the server would save a lot of work.
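
To make the local-store idea a bit more concrete, here is a rough sketch using `sqlite3` from the standard library. Everything in it is an assumption for illustration: the `DependencyCache` class, the table layout, and the choice to key entries by the wheel's sha256 (so an entry never goes stale as long as the file behind the hash does not change). It is not the code from the linked branch.

```python
import json
import sqlite3


class DependencyCache:
    """Hypothetical local store: wheel sha256 -> list of requirement strings."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS deps ("
            "sha256 TEXT PRIMARY KEY, requires TEXT NOT NULL)"
        )

    def get(self, sha256):
        """Return the cached requirement list, or None if never recorded."""
        row = self._db.execute(
            "SELECT requires FROM deps WHERE sha256 = ?", (sha256,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def put(self, sha256, requires):
        """Record the requirement strings extracted from a wheel."""
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO deps VALUES (?, ?)",
                (sha256, json.dumps(list(requires))),
            )


# Usage with made-up values; a real key would be the wheel's sha256.
cache = DependencyCache()
key = "0" * 64
before = cache.get(key)
cache.put(key, ["example-dep>=1.0"])
after = cache.get(key)
print(before, after)
```

Returning `None` for a miss and `[]` for "known to have no dependencies" keeps the two cases distinguishable, so only a miss needs a download.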

@nanonyme

nanonyme commented Jan 4, 2021

I would certainly hope removing old resolver would be pushed out into horizon until one or more of these speedups is in place in both Warehouse and downstream mirrors. That may take years for server features.

@uranusjr
Member

uranusjr commented Jan 5, 2021

But: if we can't rely on a pypi warehouse to reliably provide partial file ranges, then I might suggest instead storing all of this information locally in a json file or sqlite db

But this local database can only be updated by downloading the whole distribution files, right? Distribution files are already cached, and it’s already pretty fast to access them (compared to other parts of the resolver), so this probably won’t bring much improvement. Or do you propose making the information remotely available (outside of PyPI) so people can update the local database without downloading the actual distributions?

@ewdurbin
Member

ewdurbin commented Jan 6, 2021

Howdy, looks like a bug was filed against warehouse (PyPI) regarding this.

I wanted to drop in to note that the CDN for PyPI files (files.pythonhosted.org) has continuously supported Range requests since its inception.

The confusion seems to be around some copy pasta curl commands being used to test the functionality.

Our CDN does not respond perfectly to Range requests when they come in as a HEAD request, but does for valid GET requests with a Range header.

Note that many of the examples in this thread include the -I parameter to curl leading to a HEAD request.

Because our CDN fetches the full file to cache at edge before responding to Range GET requests, the HEAD is the response for the full object, always.
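
The HEAD-versus-GET distinction described above can be reproduced locally. The sketch below is an assumption-laden stand-in, not the CDN's implementation: `RangeOnGetOnly` is a toy server that ignores `Range` on HEAD and honors it on GET, and `status_for` is a hypothetical helper that sends the same `Range` header with either method.

```python
import http.client
import http.server
import re
import threading

PAYLOAD = bytes(range(256)) * 8  # 2048 bytes standing in for a wheel


class RangeOnGetOnly(http.server.BaseHTTPRequestHandler):
    """Toy server: honors Range on GET, describes the full object on HEAD."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Accept-Ranges", "bytes")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()

    def do_GET(self):
        requested = self.headers.get("Range", "")
        match = re.fullmatch(r"bytes=(\d+)-(\d+)", requested)
        if match is None:
            # No usable Range header: send the whole object.
            self.do_HEAD()
            self.wfile.write(PAYLOAD)
            return
        start, end = int(match.group(1)), int(match.group(2))
        chunk = PAYLOAD[start:end + 1]
        self.send_response(206)
        self.send_header("Content-Range", f"bytes {start}-{end}/{len(PAYLOAD)}")
        self.send_header("Content-Length", str(len(chunk)))
        self.end_headers()
        self.wfile.write(chunk)

    def log_message(self, *args):  # keep the example quiet
        pass


def status_for(method, port):
    """Send the same Range header with the given method; return the status."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, "/some.whl", headers={"Range": "bytes=0-1023"})
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), RangeOnGetOnly)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    head_status = status_for("HEAD", server.server_port)
    get_status = status_for("GET", server.server_port)
finally:
    server.shutdown()
    server.server_close()
print(head_status, get_status)
```

With curl, the equivalent of the GET case is to drop `-I` and keep `-H "Range: bytes=0-1023"`, for example together with `-v -o /dev/null` to see the response headers without printing the body.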


9 participants