Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix .files and inferred packages_distributions for .egg-info packages #437

Merged
merged 23 commits into from
Apr 10, 2023

Conversation

jherland
Copy link
Contributor

The first three commits fix #115 for the case where a .egg-info package includes installed-files.txt:
We prefer installed-files.txt over SOURCES.txt as a source for Distribution.files, as the former is deemed more accurate when it is available.

The last two commits attempts to fix #115 for the case where we have no choice but to parse SOURCES.txt:
This assumes that Distribution.files should never contain a path to a non-existent file.
We simply remove non-existent files with a filter, thus removing any entries from SOURCES.txt that refer to files that were part of the source package, but not part of the final installation.

Commits:

  • tests/fixtures: Fix FilesDef type to include bytes values
  • Add tests for egg-info package with no installed modules
  • Distribution.files: Prefer *.egg-info/installed-files.txt to SOURCES.txt
  • Add tests for egg-info package with .files from inaccurate SOURCES.txt
  • Distribution.files: Only return files that actually exist

The build_files() helper which handles these FilesDef nested dicts
already has code to handle values of type 'bytes' (they get written
verbatim instead of first being processed by DALS()).

Changing the FilesDef type to reflect this prevents Mypy from failing
when encountering legitimate bytes values in these dicts.
This corresponds to the qiskit[1] meta-package which:

  - does not contain any (runtime) Python code itself, but serves as a
    mechanism to install its transitive dependencies (which populate the
    qiskit package namespace).

  - is distributed as a source archive.

  - includes a top_level.txt which is empty (contains a single newline),
    arguably correct given that it does not directly install any
    importable packages/modules.

  - when installed as an egg, provides a SOURCES.txt which is incorrect
    from a runtime POV: it references 3 .py files, a setup.py and two
    files under test/, none of which are actually installed.

  - when installed (as an egg) by pip, provides an installed-files.txt
    file which is _more_ accurate than SOURCES.txt, since it reflects
    the files that are actually available after installation.

importlib_metadata reports incorrect .files for this package, because
we end up using SOURCES.txt. It is better to use installed-files.txt
when it is available.

Furthermore, as a result of this, packages_distributions() also
incorrectly reports that this packages provides imports names that do
not actually exist ("setup" and "test", in qiskit's case).

This commit adds EggInfoPkgPipInstalledNoModules, a test project that
mimics the egg installation of qiskit, and adds it to existing test
cases, as well as adding a new test cases specifically for verifying
packages_distributions() with egg-info packages. The following tests
fail in this commit, but will be fixed in the next commit:

  - PackagesDistributionsTest.test_packages_distributions_on_eggs
  - APITests.test_files_egg_info

See the python#115 issue for more details.

[1]: qiskit is found at https://pypi.org/project/qiskit/0.41.1/#files
When listing the files in a *.egg-info distribution, prefer using
*.egg-info/installed-files.txt instead of *.egg-info/SOURCES.txt.

installed-files.txt is written by pip[1] when installing a package,
whereas the SOURCES.txt is written by setuptools when creating a
source archive[2].

installed-files.txt is only present when the package has been installed
by pip, so we cannot depend on it always being available. However, when
it _is_ available, it is an accurate record of what files are installed.
SOURCES.txt, on the other hand, is always avaiable, but is not always
accurate: Since it is generated from the source archive, it will often
include files (like 'setup.py') that are no longer available after the
package has been installed.

Fixes python#115 for the cases where a installed-files.txt file is available.

[1]: https://pip.pypa.io/en/stable/news/#v0-3
[2]: https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest
As established in previous commits, the SOURCES.txt file is not always
an accurate source of files that are present after a package has been
installed.

One situation where this inaccuracy is problematic is when top_level.txt
is also missing, and packages_distributions() is forced to infer the
provided import names based on Distribution.files. In this situation we
end up with incorrect mappings between import packages and distribution
packages, including import packages that clearly do not exist at all.

For example, a SOURCES.txt that lists setup.py (which is used _when_
installing, but is not available after installation), will see that
setup.py returned from .files, which then will cause
packages_distributions() to claim a mapping from the non-existent
'setup' import name to this distribution.

This commit adds EggInfoPkgSourcesFallback which demostrates such a
scenario, and adds this new class to a couple of relevant tests.
A couple of these tests are currently failing, to demonstrate the
issue at hand. These test failures will be fixed in the next commit.

See the python#115 issue for more details.
Add an extra filter on the paths returned from Distribution.files, to
prevent paths that don't exist on the filesystem from being returned.

This attempts to solve the issue of .files returning incorrect
information based on the inaccuracies of SOURCES.txt.

As the code currently is organized, it is more complicated to write
this such that it only applies to the information read from SOURCES.txt
specifically, hence we apply it to _all_ of .files instead.

This fixes python#115, also in the case where there is no installed-files.txt
file available.

[1]: https://pip.pypa.io/en/stable/news/#v0-3
[2]: https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest
@jherland jherland force-pushed the egg-metadata-work branch from f3fecf8 to 22d9ea5 Compare March 11, 2023 11:50
Copy link
Member

@FFY00 FFY00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jherland!

This looks good to me, but I am unsure about the last commit (only returning files that exist on the file system), so I'd like @jaraco to also have a look if he's available.

@jherland
Copy link
Contributor Author

Thanks for the review @FFY00. I took a look at the failing tests. The 3.12 failures seem related to #436, and the other three are cases of #434/#435.

@FFY00
Copy link
Member

FFY00 commented Mar 13, 2023

Yep. I am just gonna wait a couple days to give Jason a bit more time to review, otherwise I'll just merge the fixes, unblocking this PR.

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jherland for the contrib. I can see you took great care to be minimally invasive while addressing the core concerns. Well done, and kudos for your approach, especially considering the wonky structure of the fixtures and test parameterization.

Overall, this approach looks acceptable. My main concern is with the non-metadata files as presented by installed-files. I have some style/quality concerns, but nothing serious. Feel free to reject or ignore any of my comments and I can address them to my satisfaction.

I'll want to spend a bit more time with this change before merging it in, but feel free to address some of my comments as your time permits.

importlib_metadata/__init__.py Show resolved Hide resolved
tests/fixtures.py Show resolved Hide resolved
tests/test_main.py Outdated Show resolved Hide resolved
tests/fixtures.py Show resolved Hide resolved
importlib_metadata/__init__.py Outdated Show resolved Hide resolved
importlib_metadata/__init__.py Outdated Show resolved Hide resolved
importlib_metadata/__init__.py Outdated Show resolved Hide resolved
tests/test_main.py Outdated Show resolved Hide resolved
tests/test_main.py Show resolved Hide resolved
jherland added 10 commits March 19, 2023 03:44
Move test_packages_distributions_on_eggs() method into a new class,
PackagesDistributionsEggTest, to prevent applying unnecessary fixtures
to existing tests.
…URCES.txt

Rename starved_egg to sources_fallback.
The import names that were created by these tests were not valid Python
identifiers.

Fix that, and furthermore: add another check to verify that _all_ import
names returned from packages_distributions() are always valid Python
identifiers. Ideally we should check that all keys returned from
packages_distributions() are valid import names (i.e. can be imported),
but this is at least a step in the right direction.
…port names"

This reverts commit 70ff991.

This behavior was adopted in 5e8260c and subsequently adapted as part of python#443.
tests/test_main.py Outdated Show resolved Hide resolved
@jaraco jaraco merged commit 3112653 into python:main Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"files" is wrong when installed without "wheel"
3 participants