setuptools and state of PEP 376 #371

Open
ghost opened this issue Apr 14, 2015 · 15 comments

ghost commented Apr 14, 2015

Originally reported by: cournape (Bitbucket: cournape, GitHub: cournape)


Hi there,

I was wondering whether the setuptools team would be willing to support PEP 376 (adding .dist-info directories when running python setup.py install)?

The context: doing a python setup.py install for package foo and then later on doing a pip install foo does not work because the setup.py install step does not create a .dist-info directory, and in particular there is no RECORD file for pip to uninstall files. I would like to fix this as this has been a recurrent pain point in the scientific python community.
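To see which metadata an existing installation actually carries, here is a minimal sketch. It assumes Python 3.8+, where the standard library ships importlib.metadata (which postdates this thread). The helper names uninstall_metadata and report are made up for illustration; they only check whether each distribution has a PEP 376 RECORD file or the older installed-files.txt.

```python
from importlib import metadata


def uninstall_metadata(dist):
    """Report which file-list metadata an installed distribution carries.

    `dist` is an importlib.metadata.Distribution; read_text() returns None
    when the named file is absent from the metadata directory.
    """
    return {
        "RECORD": dist.read_text("RECORD") is not None,
        "installed-files.txt": dist.read_text("installed-files.txt") is not None,
    }


def report(path=None):
    """Map distribution name -> file-list status for everything found on `path`.

    With path=None the current sys.path is searched.
    """
    kwargs = {} if path is None else {"path": path}
    return {
        dist.metadata["Name"]: uninstall_metadata(dist)
        for dist in metadata.distributions(**kwargs)
    }
```

Calling report() with no arguments inspects the current environment; a distribution with neither file is one that was most likely installed without any record of its files.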

Also, what is the feeling about deprecating "egg dir install", so that by default python setup.py install would install foo in site-packages/foo instead of site-packages/foo-1.0.egg? Or is that considered too radical a change?

@ogrisel


ghost commented Apr 15, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I'm unsure of the state of PEP 376 and .dist-info.

One problem with PEP 376 is that it assumes that only one version of a package is ever installed at any given time. It doesn't account for --multi-version installs, which allow multiple versions of a package to be installed (and then "required" at run time). Setuptools allows different versions to be installed but not active (present by default), which has advantages such as rapidly switching an environment between different versions without the need to install or uninstall anything. PEP 376 doesn't address this use case.

Another issue is that PEP 376 assumes the introduction of Distutils2, which is a defunct project, in part because producing a new format that could readily supersede and obviate the existing approach proved to be a difficult task with many pitfalls.

Perhaps the most daunting problem is that of namespace packages. The PEP 376 model (and the distutils model prior to PEP 420) specifies that a package exists in exactly one place on sys.path. Setuptools instead manages namespace packages such that they need not be merged into the same directory, but can exist as separate distributions and be merged at run time. As yet, I don't believe the PEP 376 model or the pip implementation has support for "editable" projects for a namespace package when other packages of the same namespace are present (see pip 3).

Setuptools set out to build a tooling infrastructure to tackle all of these problems and, in largely achieving that goal, has created pretty high expectations.

Can setuptools generate a .dist-info in addition to .egg-info? Probably.
Can .dist-info supersede .egg-info? Probably.

Can the distutils/pip model obviate the setuptools model of .egg directories? Not in general, and specifically not for namespace packages on systems prior to the implementation of PEP 420 (Python 3.3). Even with PEP 420, support for multi-version installs would have to be dropped for the "one directory per package" model to be viable.
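To make the PEP 420 behaviour concrete, here is a minimal sketch, assuming Python 3.3 or later. It creates two separate sys.path entries that each hold one portion of the same namespace and imports from both. The namespace and module names (demo_ns_pep420, alpha, beta) are placeholders invented for the example, and this shows only the interpreter's native mechanism, not what setuptools does with .pth files.

```python
import importlib
import os
import sys
import tempfile


def make_portion(root, namespace, module, body):
    """Write <root>/<namespace>/<module>.py with no __init__.py (a PEP 420 portion)."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as handle:
        handle.write(body)


def demo(namespace="demo_ns_pep420"):
    """Import two portions of one namespace from two separate sys.path entries."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_portion(first, namespace, "alpha", "VALUE = 'alpha'\n")
        make_portion(second, namespace, "beta", "VALUE = 'beta'\n")
        sys.path[:0] = [first, second]
        importlib.invalidate_caches()
        try:
            alpha = importlib.import_module(namespace + ".alpha")
            beta = importlib.import_module(namespace + ".beta")
            package = importlib.import_module(namespace)
            # The namespace package's __path__ lists one directory per portion.
            return alpha.VALUE, beta.VALUE, len(list(package.__path__))
        finally:
            sys.path.remove(first)
            sys.path.remove(second)
```

Both submodules import even though they live in different directories, which is the run-time merging described above. Multi-version installs are a separate matter that this mechanism does not cover.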

I invite @ncoghlan to comment. Should PEP 376 be considered defunct along with Distutils2, or should the aforementioned features of Setuptools be considered in violation of PEP 376 and start to wean users off of these (useful) models? Is there another option?

ghost commented Apr 15, 2015

Original comment by cournape (Bitbucket: cournape, GitHub: cournape):


Hi Jason,

I understand the backward compatibility worries. So let me split the discussion in the various parts:

  1. generating .dist-info in addition to .egg-info is uncontroversial
  2. moving .egg-info into "legacy mode" and using .dist-info is also fairly uncontroversial
  3. making python setup.py install work the same under distutils and setuptools is more controversial. Part of it is the loss of features for multi versioned packages and namespace packages.

Would it be more workable if 3. were "opt-in"? The consensus in the scientific Python community is to avoid setuptools because we do not want the "install in a versioned directory" model and its associated costs (additional I/O, complexity). If there were an option we could pass to setup so that packages could select the behaviour of --single-version-externally-managed + .dist-info, allowing pip to uninstall them later, I think we could move that community to setuptools.

If opt-in for 3. is workable, I would be happy to make a proof of concept for 1. and 2., and do some hustling in the scientific community to move to setuptools. I know @ogrisel is interested in this for scikit-learn, and if numpy, scipy (the projects I am involved with) and scikit-learn move to this, I would expect most of the community to move fairly quickly.

ghost commented Apr 15, 2015

Original comment by ogrisel (Bitbucket: ogrisel, GitHub: ogrisel):


In the case of numpy, scipy and scikit-learn, I think the majority of the developers would like to have a default python setup.py install that does not mess around with .pth files and does not introduce .egg folder structures. For that reason, none of those projects uses setuptools for the install command by default.

In my opinion, the easy switch between versions case is better addressed with isolated runtime environments (such as virtualenv or conda environments for instance).

I would be +1 to have a setuptools option to disable the .egg folder features and use flat packages in site-packages + standard .dist-info metadata.

ghost commented Apr 16, 2015

Original comment by dholth (Bitbucket: dholth, GitHub: dholth):


The .dist-info directory by itself shouldn't prevent egg-style "one dist per sys.path entry" installs, even though PEP 376 definitely assumes that all installed packages should be copied on top of each other to a single site-packages directory.

bdist_wheel preserves egg-info-writers metadata in .dist-info/ including namespace_packages.txt

The egg-style install is a very important use case.

Would anything have to change with .dist-info to allow editable installations? How are editable installations special apart from the obvious aspect of pointing to a source directory? (We might also have to sanction or change a regexp to support unversioned .dist-info directories.)

ghost commented Apr 16, 2015

Original comment by ogrisel (Bitbucket: ogrisel, GitHub: ogrisel):


I think the way the python setup.py develop / pip install -e . commands work would not be impacted by @cournape's original request. They can keep using the easy-install.pth / *.egg-link files.

I think the initial request was more to provide a way to have setuptools install a source distribution that can replicate the behavior of installing a wheel distribution as documented in PEP 427:

  • flat Python package installation directly in site-packages (without support for multiple versions)
  • .dist-info metadata following PEP 376 (so that pip uninstall / upgrade works properly)

It does not have to be the default behavior of setuptools. Maybe it could be provided as an alternative setup.py command, for instance python setup.py flat_install, or maybe as a new kwarg for the setup() function itself.

ghost commented Apr 22, 2015

Original comment by ncoghlan (Bitbucket: ncoghlan, GitHub: ncoghlan):


PEP 376 is still the current installation database specification for interoperability purposes. For setuptools, it only applies to the "--single-version-externally-managed" model, as PEP 376 is designed to have environment switching handled by a separate utility like virtualenv/venv, conda, nix or environment modules, rather than doing it through runtime sys.path manipulation as pkg_resources does.

ghost commented May 20, 2015

Original comment by rbtcollins (Bitbucket: rbtcollins, GitHub: rbtcollins):


So, it sounds like doing this for just --single-version-externally-managed would be strictly better than not doing it. We should probably make distutils itself do it too...

OR

we could pull the wrapping glue out of pip and put it somewhere (e.g. packaging) where any package manager that wants this can use it, and have setuptools use it from there.

ghost commented Jun 9, 2015

Original comment by embray (Bitbucket: embray, GitHub: embray):


The problem with all this is that many users of our software know that they should use pip install ... to install (or upgrade) packages from PyPI. However, many users (particularly more superficially savvy users) also know that python setup.py install is a thing, and will often, for various reasons, download a package, unpack it themselves and run that command. In some cases there are also Python distributions that install Python packages this way by default and use egg installs (although they probably shouldn't).

In either case you end up with a Python system built out of packages installed in two very different ways with different results, and it often ends up in a mess (especially if packages belonging to a namespace package are installed by both methods on the same system--it will break the namespace packages).

In some ways this is more a social problem than a technical problem. But it doesn't help that for years we've been telling users that python setup.py install is the way to install Python packages. Except that now, in most cases, that's undesirable, at least for packages using setuptools. The only case where it is used is the multi-version install using egg dirs. I think that should continue to be supported, but I think it's also a much more niche use case.

Update: I should add, I think multi-version installs are great in ecosystems that make consistent use of them, and I wish the Python language had explicit support for multiple package versions to make this better. But certainly in the scientific community this has not been the case--either virtualenv or now conda envs are used to support this case, and egg installs are not especially useful and only create confusion.

ghost commented Jun 9, 2015

Original comment by dholth (Bitbucket: dholth, GitHub: dholth):


Just have setuptools add installed-files.txt to the egg-info directory. pip had been doing this long before .dist-info was on the radar.

In the future we should have sdists where "setup.py install" is not implemented and the installer is required.

ghost commented Jun 9, 2015

Original comment by embray (Bitbucket: embray, GitHub: embray):


@dholth has a good point--to get more or less consistent installs we could add to all projects' setup.cfg something like:

[install]
single-version-externally-managed = true
record = <distname>.egg-info/installed-files.txt

and get mostly consistent behavior across install methods. However, installing this way will not install dependencies from install_requires either, so it's still not a direct win.
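As an illustration of how a project might check that its setup.cfg carries these settings, here is a minimal sketch using the standard-library configparser. The helper names install_options and uses_flat_install are hypothetical, and the set of accepted truthy spellings is an assumption rather than a copy of what distutils accepts. The sketch treats hyphens and underscores in option names as equivalent, as distutils does when reading config files.

```python
import configparser

# Assumed spellings for a "true" value; not taken from distutils itself.
TRUTHY = {"1", "true", "yes", "on", "y", "t"}


def install_options(setup_cfg_text):
    """Return the [install] options from setup.cfg text, keys normalised to underscores."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    if not parser.has_section("install"):
        return {}
    return {key.replace("-", "_"): value for key, value in parser.items("install")}


def uses_flat_install(setup_cfg_text):
    """True when the config asks for a single-version install and names a record file."""
    options = install_options(setup_cfg_text)
    flag = options.get("single_version_externally_managed", "").strip().lower()
    return flag in TRUTHY and bool(options.get("record", "").strip())
```

This only inspects the configuration; it says nothing about whether dependencies from install_requires get installed, which is the caveat raised above.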

ghost commented Jun 9, 2015

Original comment by rbtcollins (Bitbucket: rbtcollins, GitHub: rbtcollins):


I'm not sure how install_requires ties in here; can you expand on that? That's already addressed by using either pip or easy-install [which setuptools has internally, right?] or buildout etc etc etc.

ghost commented Jun 10, 2015

Original comment by ogrisel (Bitbucket: ogrisel, GitHub: ogrisel):


Thanks Erik for the suggestion. The execution order of egg_info does not work though:

running egg_info
creating scikit_learn.egg-info
writing top-level names to scikit_learn.egg-info/top_level.txt
writing scikit_learn.egg-info/PKG-INFO
writing dependency_links to scikit_learn.egg-info/dependency_links.txt
writing manifest file 'scikit_learn.egg-info/SOURCES.txt'
reading manifest file 'scikit_learn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'scikit_learn.egg-info/SOURCES.txt'
removing '/volatile/ogrisel/envs/py35/lib/python3.5/site-packages/scikit_learn-0.17.dev0-py3.5.egg-info' (and everything under it)
Copying scikit_learn.egg-info to /volatile/ogrisel/envs/py35/lib/python3.5/site-packages/scikit_learn-0.17.dev0-py3.5.egg-info
running install_scripts
writing list of installed files to 'scikit_learn.egg-info/installed-files.txt'

The generated installed-files.txt is written in the local scikit_learn.egg-info/ folder (in the source folder) but not in the previously generated site-packages/scikit_learn-0.17.dev0-py3.5.egg-info.
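One conceivable workaround is a post-install step that copies the record into the installed .egg-info directory. The sketch below is only that: relocate_record is a hypothetical helper, not something setuptools or pip provides, and writing the paths relative to the .egg-info directory is my assumption about the layout pip expects.

```python
import os


def relocate_record(record_path, egg_info_dir):
    """Copy a distutils --record file into egg_info_dir as installed-files.txt.

    Paths in the record are rewritten relative to egg_info_dir (an assumed
    convention). Returns the path of the file written.
    """
    with open(record_path) as handle:
        installed = [line.strip() for line in handle if line.strip()]
    target = os.path.join(egg_info_dir, "installed-files.txt")
    with open(target, "w") as handle:
        for path in installed:
            handle.write(os.path.relpath(path, egg_info_dir) + "\n")
    return target
```

It would be called with the local scikit_learn.egg-info/installed-files.txt as record_path and the site-packages .egg-info directory as egg_info_dir.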

Note that this does not prevent pip from uninstalling scikit-learn properly, as site-packages/scikit_learn-0.17.dev0-py3.5.egg-info also has a top_level.txt file mentioning the sklearn folder, and this is apparently enough for pip to cleanly uninstall or upgrade scikit-learn. I don't know if this is the case for all Python projects.
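To illustrate the idea of finding installed files from top_level.txt (the file name shown in the log above), here is a rough sketch. It is not pip's actual code, and top_level_targets is a hypothetical helper: it reads the names listed in top_level.txt and returns the matching packages or single-file modules that exist next to the .egg-info directory.

```python
import os


def top_level_targets(egg_info_dir):
    """List existing site-packages entries named in <egg_info_dir>/top_level.txt."""
    site_packages = os.path.dirname(os.path.abspath(egg_info_dir))
    top_level = os.path.join(egg_info_dir, "top_level.txt")
    if not os.path.exists(top_level):
        return []
    with open(top_level) as handle:
        names = [line.strip() for line in handle if line.strip()]
    targets = []
    for name in names:
        # A top-level name may be a package directory or a single-file module.
        for candidate in (name, name + ".py"):
            path = os.path.join(site_packages, candidate)
            if os.path.exists(path):
                targets.append(path)
    return targets
```

A lookup like this cannot see scripts, data files or anything else installed outside the top-level packages, which is one reason it may not be enough for every project.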

ghost commented Jun 10, 2015

Original comment by embray (Bitbucket: embray, GitHub: embray):


Right, I think top_level.txt should be good enough for pip to uninstall most projects. But you're right that installed-files.txt does not get installed into site-packages/<distname>.egg-info, so you're still not getting the exact pip result. You still need to specify a record file in order to use single-version-externally-managed, though, so putting it in the local .egg-info folder is as good a place as any, since it's already likely ignored by source control, etc.

@rbtcollins My point regarding install_requires is that what I (and I think many) would like to see is consistency between the end results of pip install . and python setup.py install, since the latter is still better known to many users. If a package is configured to use single_version_externally_managed (i.e. so as to not do an egg install by default), then setuptools does not use easy_install at all, and does not process install_requires. I don't personally care all that much but I certainly have users who like that, for example, when they ./setup.py install Astropy in a clean virtualenv it will also install Numpy without them having to think about it.

ghost commented Jun 10, 2015

Original comment by dholth (Bitbucket: dholth, GitHub: dholth):


What I'm suggesting is that python setup.py install should say NotImplementedError: use pip. Then there would be no confusion.

ghost commented Jun 10, 2015

Original comment by embray (Bitbucket: embray, GitHub: embray):


That just wouldn't be acceptable or practical any time in the near future.
