Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow prioritizing package lookup according to their source #9524

Closed
wants to merge 2 commits into from
Closed

Allow prioritizing package lookup according to their source #9524

wants to merge 2 commits into from

Conversation

Lordshinjo
Copy link

This is a first try for #6045.

Previously, installation candidates would (roughly) be ordered by version only.
With this change, a source priority is introduced, making sources behave as fallbacks to each other: if a candidate is found in a higher priority source, it will be downloaded from that source even if a higher version of the package is found in another source.
Sources are ordered from highest to lowest priority as: find_links archives, find_links links, index urls in order.
At least that's what I believe is happening based on how --find-links and --extra-index-url are documented, but LinkCollector.collect_links seems to be slightly different.

I'm aware that this changes the behavior of package resolution as decribed in Satisfying Requirements, so I don't know if this should be hidden behind a feature flag or something like that.

Previously, installation candidates would (roughly) be ordered by
version only.
With this change, a source priority is introduced, making sources behave
as fallbacks to each other: if a candidate is found in a higher priority
source, it will be downloaded from that source even if a higher version
of the package is found in another source.
Sources are ordered from highest to lowest priority as: find_links
archives, find_links links, index urls in order.
@uranusjr
Copy link
Member

This is too big a behaviour change to be introduced as is. It needs to at least go through the --use-feature/--use-deprecated cycle to have a chance to be accepted.

@Lordshinjo Lordshinjo changed the title Prioritize package lookup according to their source Allow prioritizing package lookup according to their source Jan 28, 2021
@Lordshinjo
Copy link
Author

I put this behavior behind --use-feature=source-priority.
That's not a great name, but I'll be happy if that's the only issue.

@pfmoore
Copy link
Member

pfmoore commented Jan 28, 2021

Also note that there is no consensus that this is desirable behaviour, so it should really be discussed as a feature request first. Having an implementation is useful, so thanks for that, but people shouldn't be expected to read the PR to understand and discuss the intended behaviour.

@Lordshinjo
Copy link
Author

Yes, this PR is more meant as an improvement to the discussion, just to have a first implementation so that we can see how this looks like and test it.

Where does it make more sense to continue the discussion, here or in #6045 ?

@pfmoore
Copy link
Member

pfmoore commented Jan 28, 2021

I'm happy with either, but I'll start with some points here.

On #6045, @uranusjr noted that simpleindex handles this use case. We should start with an explanation of why that's insufficient, and this needs to be built into pip. Some things to consider:

  • Using simpleindex will support all tools, not just pip
  • Having the code be external reduces the maintenance burden on pip
  • Simpleindex allows more customisation than just a simple priority, so is more flexible.
  • It would be easy to add extra policies to simpleindex, and you don't risk breaking the workflows of millions of users...

Personally, I'm strongly in favour of delegating this functionality to an external tool. I think that loading every piece of functionality that people require into pip is ultimately unsustainable, and we should move away from that approach.

Assuming you want to proceed with changing pip, as I mentioned above, I'd like to see a text description of how the functionality you're adding here will look to the user. (Which reminds me, you also need to add a documentation change to the PR). As far as I can see, you're simply changing the behaviour, so how will we describe the new behaviour? Has your choice of priorities been discussed/agreed?

And of course, there's the question of backward compatibility. The changelog in the PR says "if a candidate is found in a higher priority source, it will be downloaded from that source even if a higher version of the package is found in another source" - that is a breaking change that will cause some users to install the wrong version of a package. How would those users get the current behaviour back (they might be relying on it)? And why are we assuming it's OK to change the default and make people who are happy with the current situation explicitly request it? Surely we should be asking users to opt into the new behaviour? Your approach is (in effect) assuming that the existing behaviour is a bug, and we're "fixing" it. I don't think there's any justification for making that assumption.

@Lordshinjo
Copy link
Author

So yeah, I probably got a bit ahead of myself, though I have to thank you a lot for taking the time to explain all this.

There are 2 separate issues that are conflated in #6045:

  1. pip install installing from extra-index-url first #3454 is about prioritizing the base index over the extra index no matter what, even if the version in the extra index is higher than in the base index.
    This is what is being done in this PR, but is indeed a huge change of behavior with unknown consequences.
    I don't actually care all that much about this issue, and though there is some validity to it, simpleindex is probably a better solution.

  2. New resolver changes the order of index urls used with --extra-index-url #9484 is about prioritizing the base index over the extra index if otherwise equivalent packages are found (though I now realize that I should have opened it as a feature request rather than a bug report.
    There is currently no documented behavior, as the reference only mentions the following:

    Where more than one source of the chosen version is available, it is assumed that any source is acceptable (as otherwise the versions would differ).

    Note that this is also the behavior of the legacy resolver as far as I can tell (possibly by chance, but I highly doubt it).
    I do care about this, because I find it very counter intuitive that the current priority list for equivalent files is "extra indexes (in reverse order) > base index > local files".

If you think this makes sense, I'll open a feature request for the 2nd issue.

@Lordshinjo Lordshinjo closed this Jan 28, 2021
@pfmoore
Copy link
Member

pfmoore commented Jan 28, 2021

I do care about this, because I find it very counter intuitive that the current priority list for equivalent files is "extra indexes (in reverse order) > base index > local files".

Um, no it isn't, technically. It's undefined, as you quoted, so if you noticed that (or worse still, care) you're doing something wrong (or you're reading the source and don't actually have an issue in practice).

I do get your point, but I'm reluctant to define an order, because there will be endless bikeshedding over what the "right" order should be. Some people will want local first (avoid network requests), others will want base index first (treat PyPI as definitive) still others will want something else, because they have local builds that they prefer for some reason (but the local builds might be on their main index or an extra one, who knows...).

"Manage your own priorities if you need to, using devpi or simpleindex" is a much more straightforward message.

@pfmoore
Copy link
Member

pfmoore commented Jan 28, 2021

I should also say thanks for summarising the background so well - that helped me a lot.

@Lordshinjo
Copy link
Author

Um, no it isn't, technically. It's undefined, as you quoted, so if you noticed that (or worse still, care) you're doing something wrong (or you're reading the source and don't actually have an issue in practice).

For reference my actual issue is that we have a private index containing private packages + a mirror of PyPI, but it's on the other side of the world and has a misconfigured cache, which means I get a 20kB/s download speed and have to redownload every package. Because I'm not responsible for my company's infrastructure, I used to simply specify it as an extra index to download public packages from PyPI.
The state of our private index probably counts as doing something wrong though.

@pfmoore
Copy link
Member

pfmoore commented Jan 28, 2021

:-) Having dealt with dodgy corporate infrastructure, I sympathise.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants