
pip uses backtracking when dependency installation fails #9764

Closed
grst opened this issue Apr 2, 2021 · 20 comments
Labels
C: dependency resolution About choosing which dependencies to install state: needs discussion This needs some more discussion

Comments

@grst

grst commented Apr 2, 2021

Description

When installation of a dependency fails, pip uses the backtracking feature to try other versions of the package, even if the failure is not due to a version conflict.

Expected behavior

I understand that backtracking is useful to solve version conflicts. Trying different versions when the installation fails for a reason other than a version conflict is, IMO, not useful most of the time, as it often indicates a missing system package.

I find this particularly annoying during CI tests, as it takes forever before the test actually fails. If this is intended behavior, it would be great to have a flag to disable it.

pip version

21.0.1

Python version

3.7.10

OS

arch linux

How to Reproduce

As an example, I install scikit-bio into a clean environment (which fails, because the package doesn't properly declare the numpy dependency)

conda create -n test_skbio python=3.7 pip
conda activate test_skbio
pip install scikit-bio

Output

Collecting scikit-bio
  Using cached scikit-bio-0.5.6.tar.gz (8.4 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/sturm/anaconda3/envs/test_skbio/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/setup.py'"'"'; __file__='"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /home/sturm/tmp/pip-pip-egg-info-95p85io3
         cwd: /home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_e1c29669eff64467acdb675f656b2ef2/setup.py", line 20, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/66/b0/054ef21e024d24422882958072973cd192b492e004a3ce4e9614ef173d9b/scikit-bio-0.5.6.tar.gz#sha256=48b73ec53ce0ff2c2e3e05f3cfcf93527c1525a8d3e9dd4ae317b4219c37f0ea (from https://pypi.org/simple/scikit-bio/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Using cached scikit-bio-0.5.5.tar.gz (8.3 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/sturm/anaconda3/envs/test_skbio/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/setup.py'"'"'; __file__='"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /home/sturm/tmp/pip-pip-egg-info-p8h2qvwu
         cwd: /home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_ef40087a894243eea6e9ba7506c90c26/setup.py", line 20, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/2d/ff/3a909ae8c212305846f7e87f86f3902408b55b958eccedf5d4349e76c671/scikit-bio-0.5.5.tar.gz#sha256=9fa813be66e88a994f7b7a68b8ba2216e205c525caa8585386ebdeebed6428df (from https://pypi.org/simple/scikit-bio/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  Using cached scikit-bio-0.5.4.tar.gz (8.3 MB)
    ERROR: Command errored out with exit status 1:
     command: /home/sturm/anaconda3/envs/test_skbio/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/setup.py'"'"'; __file__='"'"'/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /home/sturm/tmp/pip-pip-egg-info-d6wu69n6
         cwd: /home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/sturm/tmp/pip-install-0wj1ulih/scikit-bio_aa90daa04e0549fbbd36b29262ef299e/setup.py", line 20, in <module>
        import numpy as np
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------

Code of Conduct

I agree to follow the PSF Code of Conduct.

@grst grst added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Apr 2, 2021
@uranusjr
Member

uranusjr commented Apr 2, 2021

This is basically an impossible problem. The first design was actually to fail the entire installation on build failures, and we got a flood of requests for the current behaviour, so we implemented it. Judging from the initial backlash (and the quietness after we released the change until this issue), I am assuming more people find the current behaviour more useful.

@uranusjr uranusjr added state: needs discussion This needs some more discussion and removed S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Apr 2, 2021
@grst
Author

grst commented Apr 2, 2021

I see! I still believe something like --fail-fast would be useful for CI builds.

@uranusjr
Member

uranusjr commented Apr 2, 2021

Yeah, a flag like that makes a lot of sense. Once the legacy resolver is removed entirely, we can start implementing various “strategy flags” for the resolver, and this is one of the first I’m looking to have as well.

@stonebig
Contributor

stonebig commented Apr 4, 2021

I have also reported this problem, and I don't understand: there is no reason not to fail fast in any circumstance, when the outcome is to fail anyway:

  • it's better for the quick understanding and fix of the issue,
  • it's better for your cloud budget,
  • it's better for the planet.

When other people complained, it was probably because there were still several issues intermixing in code and minds around the new resolver.
When you can detect that it will fail for one reason, there is no need to look for other reasons.

Like in chess: if I make that move I will lose my King, ... but maybe I can still take his Queen?

@uranusjr
Member

uranusjr commented Apr 4, 2021

there is no reason to fail fast in any circumstances, when the outcome is anyway to fail:

Is there a typo here? The sentence only makes sense if I read it as there is no reason to not fail fast.


The problem is exactly not every build failure is created the same, and the final outcome is not always to fail.

Indeed it makes no sense to continue if a package version fails to build when it is expected to build, but not all projects manage their distributions like this. Some would release versions that only target certain platforms and don’t expect to be built on others, for example. You may argue they are using the project version incorrectly (full disclosure, I’m personally quite annoyed by them), but pip needs to take a very lax position on those scenarios, since Python packaging has historically been permissive in this area (far too much, IMO), especially regarding sdists.

Again, I do feel failing immediately on a build failure is a reasonable approach, but that’s not something we can just do. (Another consideration is that you can always add constraints to limit backtracking, but there’s no way to tell pip to continue if a build failure exits the process immediately.)

@stonebig
Contributor

stonebig commented Apr 4, 2021

Typo corrected

@benjaoming

benjaoming commented Apr 6, 2021

A quick failure would be great!

An example requirements.txt that creates endless backtracking (using pip 21.0.1):

docker==3.7
molecule-docker

...but! If we install the 2 dependencies in the following way, pip will break nicely:

pip install docker==3.7
pip install molecule-docker  # This now fails

The nice way of breaking occurs as the second step fails:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
molecule-docker 0.3.3 requires docker>=4.3.1, but you have docker 3.7.2 which is incompatible.

When we install the two dependencies in combination with -r requirements.txt, the result is endless backtracking, because pip wants to find a version of molecule-docker that's compatible with docker 3.7. IMHO, it should just fail if the first one doesn't work; then the developer can sort out a version spec that works.
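One real pip mechanism for taming this today is a constraints file (pip's -c / --constraint option): constraints narrow which versions the resolver may even consider, so the search collapses and the run either resolves or fails quickly. The version bound below is purely illustrative, not a verified fix for molecule-docker:

```
# constraints.txt -- version bound is hypothetical
molecule-docker<0.3
```

Installing with "pip install -r requirements.txt -c constraints.txt" then applies the bound without adding molecule-docker to the requirements themselves.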

@pfmoore
Member

pfmoore commented Apr 6, 2021

then the developer can sort out a version spec that works

But that's essentially the reason why we have a resolver at all - "figuring out a set of versions that works" is in general a really hard problem, and people aren't able to do it by hand (without significant effort). In some cases - possibly including this one - working out a usable set of restrictions is easy enough, but how do we know if that's the case up front (without doing the resolve that caused the problem in the first place)?

In this particular case, is there an earlier version of molecule-docker which supports docker 3.7? How would you find that version to install it? The only way I can imagine is checking each older version of molecule-docker to find one that works - which is all that pip is doing. So how would it be any better if you did that yourself?

As @uranusjr said, "stop on first failure" is a reasonable option to offer, and once we've managed to remove all of the "old resolver" code so we can make changes like that, we'll probably look at offering it as an option. But I don't know how useful it will be in practice - I suspect we'll just have people requesting that when failing, pip provide more information to help them work out how to fix the problem, which is precisely what we won't be able to do, because we stopped when we hit the error!

@grst
Author

grst commented Apr 6, 2021

I would also like to point out again the distinction between

  • a version conflict, in which case I think it is fine to search for compatible versions
  • other installation errors like missing system libraries (in which case another version won't help), or incorrectly declared dependencies like the skbio example above (in which case the problem should be fixed upstream).

@pfmoore
Member

pfmoore commented Apr 6, 2021

Again, I think the problem is how does pip know it's a missing system library (for example) that caused the problem? That's the key issue.

@grst
Author

grst commented Apr 6, 2021

It can't, but it should know when the problem is due to a version conflict, shouldn't it?
And by version conflict I mean that a package declares it requires version e.g >= x.y.z or != a.b.c of a dependency.
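This distinction can be made concrete with the packaging library (pip ships a vendored copy; the standalone one is installable as "packaging"). A declared specifier conflict like the one in the molecule-docker example above is mechanically checkable before any build happens:

```python
# Checking a declared version conflict with the `packaging` library.
# The requirement string comes from molecule-docker 0.3.3's metadata
# (docker>=4.3.1); the pinned version comes from the user's requirements.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

requirement = SpecifierSet(">=4.3.1")   # what molecule-docker declares
pinned = Version("3.7.2")               # what the user pinned

print(pinned in requirement)            # False -> a declared, detectable conflict
print(Version("4.3.1") in requirement)  # True
```

This is only the easy half of the problem, though: for an sdist, the declared requirements themselves may not be known until a build has already run.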

@pfmoore
Member

pfmoore commented Apr 6, 2021

No, because it can't get the dependencies without doing a build, which is what fails (at least sometimes)... But if you're saying that any build failure should cause an immediate stop, isn't that what both @uranusjr and I have confirmed is reasonable to have as an option?
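To make this concrete, here is an invented sketch (not taken from any real package) of why an sdist's dependencies can be unknowable without running a build: a setup.py may compute install_requires at runtime, leaving nothing static for pip to read.

```python
# Hypothetical sketch: dependencies computed at build time, as a setup.py may do.
# Until this code actually runs, no tool can know the dependency list.
import sys

def compute_install_requires():
    deps = ["numpy>=1.15"]                 # illustrative base dependency
    if sys.platform == "win32":
        deps.append("pywin32")             # platform-specific, decided at runtime
    if sys.version_info < (3, 8):
        deps.append("importlib-metadata")  # backport only on old Pythons
    return deps

# A real setup.py would pass this to setuptools.setup(install_requires=...).
print(compute_install_requires())
```

Wheels, by contrast, carry static Requires-Dist metadata, which is why pre-built wheels sidestep this whole class of problem.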

@grst
Author

grst commented Apr 6, 2021

Got it, I thought that could be inferred from the metadata without doing a build. I'm happy with such an option then!

@benjaoming

benjaoming commented Apr 7, 2021

@pfmoore

Sorry if this sent the explanations back to Square One. I think we are in agreement on everything except what to do about actual hours of backtracking. As I hear you, it's a "wontfix" because that's how it's designed? Bug or design?

This post became a bit long, but please don't read it as a demanding or negative one; the dependency resolver is great and I hope it's okay to suggest improvements.

I'll keep working with the same isolated example: One package is pinned to an old version and another more recent package dependency has a reverse dependency to the same package, but with a new version.

docker==3
molecule-docker  # Latest version depends on docker>=4.3.1

I can't think of a simpler and smaller example than this - it spawned hours of backtracking; I don't even know where it ends :) The resources used are (a lot of) network, 100% CPU, the local pip .cache filling up, and possibly also network proxies filling up. You have the helicopter perspective of this from the PyPI usage data.

The default behavior is very bad and causes CIs to lock up for hours. With pre-commit, environments are created in a hidden way without the user's knowledge. Even when the user monitors terminal output, the thousands of lines may be difficult to understand, and identifying which version specification causes the issue seems unlikely. I would expect things to Just Break®, rather than the current behavior, but I understand that the solution here is perhaps to find a compromise.

It's possible to say that pip<20 (the old dependency resolver) was already a compromise - it proceeded despite the version conflict but created a warning for the user, IIRC. But better solutions surely exist.

The example isn't something that is desirable to have automatically backtracked nor silently ignored -- it should ideally break so the developer can fix it.

Consider the example again:

docker==3
molecule-docker  # Latest version depends on docker>=4.3.1

Scenarios:

  1. current behavior: pip automatically backtracks down the dependency tree, taking hours to install (I stopped mine after over 2 hours)
  2. suggested behavior: pip fails after x steps (x>=0) if there is a version conflict and the developer either resolves it by adding constraints that guide pip to compatible dependencies or bumps the threshold
  3. installation initially works but almost definitely breaks later with 1 or 2 being the outcomes
  4. everything was working before because of a persistent virtual environment with satisfied dependencies, perhaps with some warnings - but since pip 20's dependency resolver, we choose either 1 or 2 as outcomes

Question: Maybe I haven't noticed cases where backtracking was active and helpful because it Just Worked®? When you find that people have benefited from backtracking, how many steps have typically run?

Suggestion

Solution:

  • Have a global counter of the number of steps performed in backtracking
  • Record the initial dependency that started the backtracking
  • Halt with an error when a default threshold is reached (what is a good default threshold?)
  • Introduce a flag to specify a non-default backtracking threshold -- or a threshold of 0 (zero) for immediate failure
  • Keep the survey links and all the good current error information, but add something about the backtracking threshold and the dependency that started the backtracking.

Example sketch (final ERROR text is a new mockup)

pip install -r requirements.txt
INFO: pip is looking at multiple versions of urllib3 to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
INFO: pip is looking at multiple versions of six to determine which version is compatible with other requirements. This could take a while.

ERROR: Could not satisfy version. Started backtracking because of (docker==3 conflicts: molecule-docker needs docker >= 4.3.1) pip has backtracked > 20 steps. If you wish to automatically resolve versions, increase the number of allowed backtracking steps with --automatic-backtracking-threshold=n, where n>=0,<1000
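As a toy sketch of the threshold idea (all names invented; it also glosses over the genuinely hard question of what counts as one "step" in a real resolver):

```python
# Toy sketch of a "backtracking budget" (hypothetical; not pip's behaviour).
# Walk candidates newest-first, count every discarded one, and abort with an
# actionable error once the budget is spent, instead of grinding on for hours.
class BacktrackLimitExceeded(Exception):
    pass

def pick_compatible(candidates, is_compatible, max_backtracks=20):
    discarded = 0
    for version in candidates:                  # assumed newest-first
        if is_compatible(version):
            return version
        discarded += 1
        if discarded >= max_backtracks:
            raise BacktrackLimitExceeded(
                f"discarded {discarded} candidates; add constraints or "
                f"raise the threshold")
    raise BacktrackLimitExceeded(f"no compatible candidate ({discarded} tries)")

# Simulating the example: every fake molecule-docker release needs docker>=4.3.1,
# but docker is pinned to 3.7.2, so nothing is ever compatible -- fail fast.
molecule_versions = [(0, 3, n) for n in range(100, 0, -1)]
pinned_docker, needs_docker = (3, 7, 2), (4, 3, 1)
try:
    pick_compatible(molecule_versions, lambda v: pinned_docker >= needs_docker)
except BacktrackLimitExceeded as exc:
    print("ERROR:", exc)
```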

@uranusjr
Member

uranusjr commented Apr 7, 2021

I love how every discussion on resolution eventually tries to suggest some kind of counter + exit-after-a-certain-number-of-rounds proposal. That works in the abstract, until you actually try to define what a “step” is in the resolver. I’m not saying it won’t work, but I have been through this multiple times and don’t really want to go into the same abstract discussions again. Feel free to try your idea out and come up with a proof of concept; we can continue the discussion from there.

@pfmoore
Member

pfmoore commented Apr 7, 2021

I'll keep working with the same isolated example: One package is pinned to an old version and another more recent package dependency has a reverse dependency to the same package, but with a new version.

I agree that the behaviour here is bad, but I don't have enough information to understand why it's happening yet. Let's come back to that, though.

Regarding your proposed solution:

Have a global counter of the number of steps performed in backtracking

We have that. It's called max_rounds and is set here, to 2000000. That may seem a lot, but it was originally set much lower, and we got users complaining that pip gave up too soon. We found that in terms of time spent, the number of rounds could be increased a lot without the time being affected too badly, so we increased to the current value.

The problem you have appears to be that in your case, the time spent is not because of too many rounds. But we don't know what it is.

So we need more information. If you were to profile your case, and identify:

  1. Where the time is actually being spent.
  2. How many times pip does a step that gets thrown away by backtracking (please be careful here, we need details - trying to build 100 dependencies, finding a conflict and throwing them away is one backtrack, even if it takes many hours and 100 package builds were thrown away).
  3. In particular, what proportion of time did pip spend building stuff just to extract metadata (dependency information). Our best theory at the moment for all of these "pip takes ages" cases is that pip is building heaps of stuff because the only way to get dependency information for a sdist is to build it.
  4. What information pip has available when a backtrack occurs, and how much help that is in "pruning" the list of options remaining (hint: we've done this, and it's really hard - see previous comment about "pip doesn't know that a build failed because of missing system headers")

Then, we might be able to determine where the problem lies in your case. Without trying to pre-judge, I'm fairly certain that the answer won't be something pip can address easily (typically, it's builds that take a long time to complete).
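The in-process shape of that profiling might look like the sketch below, using the standard library's cProfile and pstats (slow_resolution is a stand-in function, not pip code). To profile pip itself, something like "python -m cProfile -o pip.prof -m pip install -r requirements.txt" (cProfile's -m option, available on newer Pythons) produces a stats file you can load with pstats the same way.

```python
# Minimal profiling sketch: wrap the slow call and report where time went.
# slow_resolution() is a stand-in for the real long-running pip invocation.
import cProfile
import io
import pstats

def slow_resolution():
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_resolution()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())   # top entries by cumulative time
```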

Some workarounds which I'm sure aren't acceptable, but may give you some food for thought:

  1. Hit CTRL-C after the install has been going for 30 minutes. At that point, as a first step, you can assume that pip has gone into some sort of backtracking spiral, so add constraints to fix that. If you can't work out how to do that, even with pip's verbose log information, consider why you believe pip can. Equally, if you don't know whether 30 minutes is the right length of time to wait, consider how pip could know any better than you.
  2. Pre-build any packages you might need for the install. Then pip can just install wheels, which is extremely unlikely to be slow. That might be a pain, because you have to track dependencies to work out what's needed - but that's what pip has to do, so maybe that's where the cost lies?
  3. A combination - kill the process, look at what pip needed to do, prebuild stuff, repeat.

None of these will fix the issue, but they may give you insights, and possibly even suggest a way forward. If you produce a proof of concept fix from that which helps your issue, we'd love to know.

Maybe I haven't noticed cases where backtracking was active and helpful because it Just Worked®?

Quite probably. We have many millions of people using pip daily. And we've had people comment that the new resolver was a significant benefit for them. Honestly, do you really think we would have released the new resolver if we'd had feedback that it was a net loss? This is probably the most extensively publicised feature pip has ever released, and we did more user research on it than we ever had before (thanks to the funding we received). So yes, I'm afraid you are in a small minority here. I know that's no help to you personally, but as pip maintainers we have to look at the wider picture.

When you find that people have benefited from backtracking, how many steps have typically run?

We have no idea. Nobody tells us anything when things work well. Maybe you can imagine how demoralising that can be? Particularly when people who raise issues assume we have all that information to hand 🙁

I think we're just going round in circles now (ironic, really 😉). I suggest that if you want to make progress with this, you profile where pip is spending its time, as I suggested above, and give us some feedback on precisely what pip (or the build tool) is doing in all that time.

@benjaoming

Once again, thanks for taking the time to elaborate, @pfmoore

I hope it seems both encouraging and motivating that everyone here is happy with the new resolver as well, and is just trying to find solutions to perceived problems.

@stonebig
Contributor

stonebig commented Apr 10, 2021

I tried reducing the number 2000000 down to 2, or even 0. My issue is that when it fails (now quickly), it still doesn't tell me what its problem is, or the first problem it had, so I don't know which package is causing the problem:

Looking in links: c:\WinP\packages.srcreq
Processing c:\winp\packages.srcreq\fastai-2.1.10-py3-none-any.whl
Processing c:\winp\packages.srcreq\fastcore-1.3.19-py3-none-any.whl
Requirement already satisfied: pillow>6.0.0 in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (8.2.0)
Requirement already satisfied: pip in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (21.1.dev0)
Processing c:\winp\packages.srcreq\fastprogress-1.0.0-py3-none-any.whl
Processing c:\winp\packages.srcreq\spacy-3.0.5-cp39-cp39-win_amd64.whl
Requirement already satisfied: requests in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (2.25.1)
Requirement already satisfied: matplotlib in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (3.4.1)
Requirement already satisfied: scikit-learn in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (0.24.1)
Processing c:\winp\packages.srcreq\torchvision-0.9.1+cpu-cp39-cp39-win_amd64.whl
Requirement already satisfied: scipy in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (1.6.2)
Requirement already satisfied: pandas in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (1.2.3)
Requirement already satisfied: packaging in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (20.9)
Processing c:\winp\packages.srcreq\torch-1.8.1+cpu-cp39-cp39-win_amd64.whl
Requirement already satisfied: pyyaml in c:\winp\bd39\bucod\wpy64-3940\python-3.9.4.amd64\lib\site-packages (from fastai) (5.4.1)
ERROR: Exception:
Traceback (most recent call last):
  File "C:\WinP\bd39\bucod\WPy64-3940\python-3.9.4.amd64\lib\site-packages\pip\_internal\cli\base_command.py", line 180, in _main
    status = self.run(options, args)
  File "C:\WinP\bd39\bucod\WPy64-3940\python-3.9.4.amd64\lib\site-packages\pip\_internal\cli\req_command.py", line 204, in wrapper
    return func(self, options, args)
  File "C:\WinP\bd39\bucod\WPy64-3940\python-3.9.4.amd64\lib\site-packages\pip\_internal\commands\install.py", line 318, in run
    requirement_set = resolver.resolve(
  File "C:\WinP\bd39\bucod\WPy64-3940\python-3.9.4.amd64\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 127, in resolve
    result = self._result = resolver.resolve(
  File "C:\WinP\bd39\bucod\WPy64-3940\python-3.9.4.amd64\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 454, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "C:\WinP\bd39\bucod\WPy64-3940\python-3.9.4.amd64\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 365, in resolve
    raise ResolutionTooDeep(max_rounds)
pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 2

@pradyunsg
Member

It's called max_rounds and is set here, to 2000000.

Well, a single round can include backtracking various versions of a single package.

Setting it to 2, however, would mean you can't install more than 2 packages. :)

@uranusjr
Member

uranusjr commented Dec 9, 2021

Merging into #10655 since it covers this problem, and we don’t really need two issues on this.

@uranusjr uranusjr closed this as completed Dec 9, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 9, 2022