"files" is wrong when installed without "wheel" #115
Comments
In GitLab by @blueyed on Mar 17, 2020, 12:34 mentioned in commit blueyed/importlib_metadata@5d73789b0619d5ac53dfb2966902c13b24d31e2b
In GitLab by @blueyed on Mar 17, 2020, 12:34 mentioned in merge request !114
In GitLab by @jaraco on Mar 25, 2020, 15:55 Before adding support for I'm leaning toward saying that importlib_metadata shouldn't support this use-case, that if someone wants to have metadata support, they should use the non-legacy features of the packaging ecosystem. What is the impact of this issue? Are there real-world use-cases that experience issues as a result of this non-support?
In GitLab by @jaraco on Jun 5, 2020, 15:37 Bump.
In GitLab by @jaraco on Sep 23, 2020, 13:41 Happy to revisit when there's additional information available, but for now, I'm declaring "won't fix" without prejudice.
I've recently run into this issue with the Qiskit (meta)-package:
IMHO, there are two issues with how
Some possible ways to fix this:
I believe the issue may be rooted in how qiskit is installed. The preferred way to install Python packages is as a wheel using dist-info metadata. As eggs are deprecated and superseded by wheels, support for egg-info is incomplete and best-effort. Consider this example where I've installed qiskit from the source archive as a wheel and am able to avoid the undesirable behavior:
The key here was to pass This approach utilizes the preferred code path for metadata and avoids the undesirable behavior. Honestly, if it weren't for the fact that editable installs still rely heavily on egg-info metadata, I'd deprecate that format outright. When I wrote I'm not confident that Does the |
Yes, installing it with dist-info metadata (either using I 100% agree on using modern packaging practices. However, in this case, I'm not in the position to control the package installation process: I'm working on a tool to find undeclared and unused dependencies in the user's project, and as such I'm using
Do you foresee problems with any of the three fixes I outlined above? In any case, I'm happy to whip up a PR.
All good points. Yes, I think I agree with the recommendation, and I'm delighted to hear you'd like to work on an implementation. Crucial will be to have tests that capture the relevant factors and expectations (including which scenarios apply to which logic branches). I'm slightly tempted to deprecate the use of SOURCES.txt altogether, but that probably should happen separately. Feel free to take a stab at it and let me know if you have any questions. I can be reached in Gitter (except I haven't enabled this project) and Discord.
I recently touched this code and also think looking at |
This corresponds to the qiskit[1] meta-package which:

- does not contain any (runtime) Python code itself, but serves as a mechanism to install its transitive dependencies (which populate the qiskit package namespace).
- is distributed as a source archive.
- includes a top_level.txt which is empty (contains a single newline), arguably correct given that it does not directly install any importable packages/modules.
- when installed as an egg, provides a SOURCES.txt which is incorrect from a runtime POV: it references 3 .py files, a setup.py and two files under test/, none of which are actually installed.
- when installed (as an egg) by pip, provides an installed-files.txt file which is _more_ accurate than SOURCES.txt, since it reflects the files that are actually available after installation.

importlib_metadata reports incorrect .files for this package, because we end up using SOURCES.txt. It is better to use installed-files.txt when it is available. Furthermore, as a result of this, packages_distributions() also incorrectly reports that this package provides import names that do not actually exist ("setup" and "test", in qiskit's case).

This commit adds EggInfoPkgPipInstalledNoModules, a test project that mimics the egg installation of qiskit, and adds it to existing test cases, as well as adding a new test case specifically for verifying packages_distributions() with egg-info packages.

The following tests fail in this commit, but will be fixed in the next commit:

- PackagesDistributionsTest.test_packages_distributions_on_eggs
- APITests.test_files_egg_info

See the python#115 issue for more details.

[1]: qiskit is found at https://pypi.org/project/qiskit/0.41.1/#files
When listing the files in a *.egg-info distribution, prefer using *.egg-info/installed-files.txt instead of *.egg-info/SOURCES.txt.

installed-files.txt is written by pip[1] when installing a package, whereas SOURCES.txt is written by setuptools when creating a source archive[2].

installed-files.txt is only present when the package has been installed by pip, so we cannot depend on it always being available. However, when it _is_ available, it is an accurate record of what files are installed. SOURCES.txt, on the other hand, is always available, but is not always accurate: since it is generated from the source archive, it will often include files (like 'setup.py') that are no longer available after the package has been installed.

Fixes python#115 for the cases where an installed-files.txt file is available.

[1]: https://pip.pypa.io/en/stable/news/#v0-3
[2]: https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest
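The lookup order described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `read_egg_info_file_list` is a hypothetical helper name, not the function importlib_metadata actually uses, and the sketch returns paths exactly as recorded without rebasing them (the two files may record paths relative to different base directories).

```python
from pathlib import Path
from typing import List, Optional


def read_egg_info_file_list(egg_info: Path) -> Optional[List[str]]:
    """Return the file list recorded in an egg-info directory, or None.

    Hypothetical helper for illustration only. installed-files.txt is
    tried first because, when present, it reflects what was installed;
    SOURCES.txt is the less reliable fallback.
    """
    for name in ("installed-files.txt", "SOURCES.txt"):
        candidate = egg_info / name
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8")
            # One path per line; ignore blank lines.
            return [line.strip() for line in text.splitlines() if line.strip()]
    # Neither file exists: nothing is known about the installed files.
    return None
```

Returning None rather than an empty list keeps "no record at all" distinguishable from "a record that lists no files", which matters for a meta-package like qiskit that installs no modules of its own.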
As established in previous commits, the SOURCES.txt file is not always an accurate source of files that are present after a package has been installed.

One situation where this inaccuracy is problematic is when top_level.txt is also missing, and packages_distributions() is forced to infer the provided import names based on Distribution.files. In this situation we end up with incorrect mappings between import packages and distribution packages, including import packages that clearly do not exist at all.

For example, a SOURCES.txt that lists setup.py (which is used _when_ installing, but is not available after installation), will see that setup.py returned from .files, which then will cause packages_distributions() to claim a mapping from the non-existent 'setup' import name to this distribution.

This commit adds EggInfoPkgSourcesFallback which demonstrates such a scenario, and adds this new class to a couple of relevant tests. A couple of these tests are currently failing, to demonstrate the issue at hand. These test failures will be fixed in the next commit.

See the python#115 issue for more details.
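To make the failure mode concrete, here is a deliberately simplified sketch of inferring import names from a list of recorded files. `infer_top_level_names` is a hypothetical function written for this illustration; it is not the inference logic used by packages_distributions(), which handles more cases.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def infer_top_level_names(files: Iterable[str]) -> Set[str]:
    """Guess top-level import names from recorded file paths.

    Simplified, hypothetical logic: only .py files are considered, and
    the first path component (minus any .py suffix) is the import name.
    """
    names = set()
    for recorded in files:
        path = PurePosixPath(recorded)
        if path.suffix != ".py":
            continue
        if len(path.parts) == 1:
            # A top-level module such as "setup.py" yields "setup".
            names.add(path.stem)
        else:
            # A file inside a directory yields that directory's name.
            names.add(path.parts[0])
    return names


# Paths as they might appear in a SOURCES.txt for a meta-package.
sources = ["setup.py", "test/__init__.py", "test/test_version.py"]
inferred = infer_top_level_names(sources)
```

With this input the inferred set contains "setup" and "test", neither of which is importable after installation, which mirrors the incorrect mapping described above.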
Add an extra filter on the paths returned from Distribution.files, to prevent paths that don't exist on the filesystem from being returned. This attempts to solve the issue of .files returning incorrect information based on the inaccuracies of SOURCES.txt.

As the code currently is organized, it is more complicated to write this such that it only applies to the information read from SOURCES.txt specifically, hence we apply it to _all_ of .files instead.

This fixes python#115, also in the case where there is no installed-files.txt file available.

[1]: https://pip.pypa.io/en/stable/news/#v0-3
[2]: https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest
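The existence filter described here could look something like the sketch below. It assumes the recorded paths are relative to a known base directory (for example site-packages); `existing_files` is a hypothetical name and the real change inside Distribution.files may be structured differently.

```python
from pathlib import Path
from typing import Iterable, List


def existing_files(base: Path, recorded: Iterable[str]) -> List[str]:
    """Keep only recorded paths that exist under the base directory.

    Hypothetical helper: paths are assumed to be relative to `base`.
    Entries such as "setup.py" that were never installed are dropped.
    """
    return [rel for rel in recorded if (base / rel).exists()]
```

Applying the filter to every entry, regardless of which metadata file it came from, trades an extra filesystem check per path for not having to track the origin of each entry.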
In GitLab by @blueyed on Mar 17, 2020, 12:32
When "wheel" is not installed/used, installing a package will not contain "RECORD", and it falls back to reading "SOURCES.txt" (https://gitlab.com/python-devs/importlib_metadata/blob/3150ed4da9e1267d0787c6f4c1f8258a26a1dd93/importlib_metadata/__init__.py#L270), which then might result in paths not being locate()-able, when a src-based setup is used:
This is similar to https://gitlab.com/python-devs/importlib_metadata/-/issues/112, but in this case here it could use installed-files.txt from the egg-info.
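One way to check whether an installed distribution is affected is to compare each recorded path against the filesystem. The sketch below uses the standard-library importlib.metadata API (`files()` and `PackagePath.locate()`); `missing_files` is a hypothetical helper name. On versions that already filter out non-existent paths, the returned list would be expected to be empty.

```python
from importlib.metadata import PackageNotFoundError, files
from typing import List, Optional


def missing_files(dist_name: str) -> Optional[List[str]]:
    """List recorded files of a distribution that cannot be located.

    Hypothetical diagnostic helper. Returns None when the distribution
    is not installed or carries no file record at all.
    """
    try:
        recorded = files(dist_name)
    except PackageNotFoundError:
        return None
    if recorded is None:
        return None
    # locate() resolves each recorded path to a filesystem location.
    return [str(path) for path in recorded if not path.locate().exists()]
```

For example, `missing_files("qiskit")` could be run in an environment where the package was installed without "wheel" to see which SOURCES.txt entries do not resolve.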