Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Dependency to Import mapping on the qiskit package does not include the qiskit import name #176

Open
Nour-Mws opened this issue Feb 22, 2023 · 7 comments
Labels

Comments

@Nour-Mws
Copy link
Collaborator

FawltyDeps v0.3.1

When running FawltyDeps (in the customary FawltyDeps env) on the Algorithms project, the lists of unused deps and undeclared imports do not contain the qiskit package as it is both used and declared:
Excerpt:

{
          "name": "qiskit",
          "source": {
            "path": "quantum/q_full_adder.py",
            "lineno": 13
          }
}
{
      "name": "qiskit",
      "source": {
        "path": "requirements.txt"
      },
      "import_names": [
        "qiskit"
      ],
      "mapping": "IDENTITY"
    },

As FawltyDeps defaults to identity mapping in this instance, the import and dependency name get matched.

When I tried installing all declared dependencies of Algorithms (+FawltyDeps) in a new environment, I got qiskit in the list of undeclared and unused:

These imports appear to be undeclared dependencies:
- 'django'
- 'mpmath'
- 'pytest'
- 'qiskit'
- 'scipy'
- 'seaborn'
These dependencies appear to be unused (i.e. not imported):
- 'keras'
- 'projectq'
- 'qiskit'
- 'texttable'
- 'yulewalker'

Looking at the json output, I got:

   {
      "name": "qiskit",
      "references": [
        {
          "name": "qiskit",
          "source": {
            "path": "requirements.txt"
          },
          "import_names": [
            "test",
            "setup"
          ],
          "mapping": "DEPENDENCY_TO_IMPORT"
        }
      ]
    },

No qiskit to be seen in the import_names!

When I tried to import qiskit in Python in that venv, however, it worked without problem!

I thought for a second that qiskit might be exposed by other packages in the environment, but in that case it would have been matched and it wouldn't have appeared in the undeclared imports.

I tried importing qiskit in the FawltyDeps-only environment and Python couldn't find it (as expected).

I haven't dug deeper than that. My next step would be to see how Python's import finder managed to find qiskit in the 2nd environment in the first place.

@jherland
Copy link
Member

I'm guessing there's something weird about qiskit's metadata causing packages_distributions() to return test and setup, but not qiskit. Maybe this issue is related to #173 in that it's yet another instance of the metadata of Python packages not being 100% precise/accurate, but still "close enough" that it works in practice?

@jherland
Copy link
Member

jherland commented Mar 6, 2023

I started digging more into this...

The qiskit package itself:

  • At time of writing there is only one file available at https://pypi.org/project/qiskit/#files: qiskit-0.41.1.tar.gz
  • This is a very small tarball. In addition to various packaging stuff and other metadata, there are only three python files:
    • setup.py
    • test/test_metapackage.py
    • test/test_simulation.py
  • After pip installing the package, none of these python files are anywhere to be found, the only files that appear to "survive" the installation are the ones in the qiskit.egg-info subdir of the package tarball. These end up in the lib/pythonX.Y/site-packages/qiskit-0.41.1-py3.11.egg-info subdirectory of the virtualenv. This directory ends up with the following files (almost all directly from the tarball):
    • dependency_links.txt
    • installed-files.txt (this is not in the tarball, but auto-generated by pip while installing the package)
    • PKG-INFO
    • requires.txt (this lists some mandatory dependencies (qiskit-terra==0.23.2, qiskit-aer==0.11.2, qiskit-ibmq-provider==0.20.1) and a bunch of extras)
    • SOURCES.txt (this seems to be the file that packages_distributions() ends up querying, see below)
    • top_level.txt (this is empty)
  • So there seems to be no actual code inside the package itself. At least nothing to be executed at runtime, only install-time stuff.
  • Looking at the setup.py (which is being run by pip when installing this package) confirms this impression: Both py_modules and packages are empty lists, and the main part of the file is devoted to the dependencies (which matches what is inside the requires.txt above)
  • As such, my impression of this package is that it is a pure meta-package: it provides nothing on its own, but provides a mechanism for conveniently installing its dependencies (which do end up populating the qiskit.* package namespace with various sub-packages).

Why does packages_distributions() return import names setup and test for this package?

Looking at https://github.com/python/importlib_metadata/blob/main/importlib_metadata/__init__.py#L878 packages_distributions() follows a three-level strategies to find the import names that belong to a package:

  1. _top_level_declared(): If the package contains a top_level.txt file, that determines the import names.
  2. _top_level_inferred(): Otherwise, try to infer from the files property of the package, more specifically, look for .py files in the files property, and use those to deduce package/module names that must be importable. So how do we find files?
    2.1. _read_files_distinfo(): Files are found by looking at the RECORD file inside the package metadata directory, or:
    2.2. _read_files_egginfo(): Files are found by looking at the SOURCES.txt file inside the package metadata directory.

I've searched around various virtualenvs on my machine, looking for top_level.txt, RECORD, and SOURCES.txt files, and AFAICS:

  • Most packages have a top_level.txt file
  • ~All packages have a RECORD file
  • qiskit is the only package I have found that only has a SOURCES.txt file. I'm sure there are others out there, but not in my venvs.

For the record here is the full contents of qiskit's SOURCES.txt:

AUTHORS
LICENSE.txt
README.md
setup.py
qiskit.egg-info/PKG-INFO
qiskit.egg-info/SOURCES.txt
qiskit.egg-info/dependency_links.txt
qiskit.egg-info/requires.txt
qiskit.egg-info/top_level.txt
test/test_metapackage.py
test/test_simulation.py

From here, the code in step 2 above deduces that there are two import names available: setup (from setup.py), and test (from the two .py files under test/).

This list of files is accurate with respect to what's in the qiskit tarball, but not accurate when compared to what ends up installed in site-packages (see the first half of this comment).

At this point in time I'm not sure if the problem is that qiskit's SOURCES.txt is incorrect in terms of what should have been in there, or if the problem is that importlib_metadata has the wrong expectations about what SOURCES.txt files are supposed to contain.

@jherland
Copy link
Member

jherland commented Mar 8, 2023

For the record, qiskit's own README.md file supports my impression of it being a meta-package that only exists to automatically install other packages:

Qiskit is made up of elements that work together to enable quantum computing. This is a simple meta-package to install the elements of Qiskit altogether.

@jherland
Copy link
Member

jherland commented Mar 8, 2023

continuing my deep dive into the obscure packaging practices of Python...

What is SOURCES.txt and is it a relevant source of information?

I found some relevant documentation in setuptools about the SOURCES.txt file:

Although this metadata is included with distributed eggs, it is not actually used at runtime for any purpose. Its function is to ensure that setuptools-built source distributions can correctly discover what files are part of the project’s source, even if the list had been generated using revision control metadata on the original author’s system.

In other words, SOURCES.txt has little or no runtime value for being included in distributed eggs, and it is possible that future versions of the bdist_egg and install_egg_info commands will strip it before installation or distribution. Therefore, do not rely on its being available outside of an original source directory or source distribution.

In addition, this documentation is buried within a section called Guides on backward compatibility & deprecated practice in the setuptools docs.

From that documentation, it would seem like importlib_metadata is wrong to read SOURCES.txt and use it as a source for runtime-available modules/packages. I'm considering opening an issue against importlib_metadata about its use of SOURCES.txt, but I'm not sure that I have a good suggestion for what importlib_metadata should do instead in this corner case...

Furthermore, the corresponding stdlib module importlib.metadata which has been added in recent Python version has exactly the same problem (in fact, it looks like the stdlib module just copied in the importlib_metadata in commit python/cpython@1bbf7b6).

However, as part of this commit that created importlib.metadata there's this test vector: https://github.com/python/cpython/blob/main/Lib/test/test_importlib/fixtures.py#L175
This test example helps give an impression of what importlib_metadata expects a SOURCES.txt to look like:

mod.py
egginfo_pkg.egg-info/top_level.txt

This does indeed follow a similar layout from the SOURCES.txt that ships with qiskit, except that qiskit's only has .py files that are not actually installed, whereas this one mentions only files that are installed.

How easy is it to install a more modern "wheel-style" version of qiskit?

Not very hard it turns out!

I did some experiments with Python v3.10 and the venvs it create by default (containing pip-22.3.1 and setuptools-65.5.0)

qiskit is distritbuted as a source tarball. When we pip install qiskit, the default (and old-style) behavior in a clean venv is to unpack that source tarball in a temporary dir and run python3 setup.by install which delegates via python3 setup.by bdist to python3 setup.by bdist_egg. This creates the "egg" distribution with the problematic SOURCES.txt

However, if the wheel module is available when we run pip install qiskit, we will end up running a slightly different python3 setup.by bdist_wheel action, which builds a more modern qiskit-0.41.1-py3-none-any.whl from the tarball, and then installs that. This wheel has a RECORD file which does not have the same problem as SOURCES.txt:

qiskit-0.41.1.dist-info/AUTHORS,sha256=JOJ8ywsZHCVyC17lmbjgyE7E6qkiXxSz4-4mSLApmK0,8122
qiskit-0.41.1.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
qiskit-0.41.1.dist-info/LICENSE.txt,sha256=YhEPPmD1lMhhZ0900AcoOo7Y3-5EmFkTLhHSjlwrDZY,11416
qiskit-0.41.1.dist-info/METADATA,sha256=XpOLcR95UVUc-9MMb28ryRy9fFqzjR29MrfJH5-y7xA,10211
qiskit-0.41.1.dist-info/RECORD,,
qiskit-0.41.1.dist-info/REQUESTED,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
qiskit-0.41.1.dist-info/WHEEL,sha256=2wepM1nk4DS4eFpYrW1TTqPcoGNfHhhO_i5m4cOimbo,92
qiskit-0.41.1.dist-info/direct_url.json,sha256=U0lPU0NQXuxG0047XS2Tyjf9cEsq2ljM3Vpd4-FIZds,60
qiskit-0.41.1.dist-info/top_level.txt,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1

From this metadata, packages_distributions() will (correctly) deduce that the qiskit package does not contribute any importable modules.

This new-style (and preferable) behavior can also be forced by passing --use-pep517 to pip install. Pip will then auto-install wheel before it proceeds with installing via python3 setup.by bdist_wheel.

Of course, these are all just workarounds: We cannot force our users to install qiskit with --use-pep517, and we cannot force qiskit to build and ship a wheel instead of their current source tarball.

(However, in the future, if FawltyDeps is ever in a position to e.g. install dependencies into its own venv for the purposes of dep-to-import resolution, then we should keep --use-pep517 in mind, as it will likely make our lives considerably easier...)

TL;DR: So, does any of this actually matter in the end?

I will admit to doing this deep-dive partially because I wanted to learn how deep the rabbit hole goes when it comes to packaging in Python. But we also have a real problem to solve here: From FawltyDeps' perspective we have a project that:

  • imports modules under the qiskit.* namespace (for which FD currently assumes there should be a dependency that provides the qiskit import name)
  • declares a dependency called qiskit.
  • has a current virtualenv with all dependencies installed, including:
    • package qiskit which provides imports {'setup'. 'test'} (in the egg version) or no imports (in the wheel version)
    • packages qiskit-terra and qiskit-ibmq-provider (which are transitive dependencies of qiskit) and both provide the qiskit package (really, probably, sub-packages under the qiskit namespace).

FD's analysis currently does not pick up the qiskit-terra and qiskit-ibmq-provider packages at all, as we only look for the declared qiskit package entry. And since this does not actually provide qiskit as an import name, we end up reporting it as an unused dependency.

Furthermore, since no other declared dependency provides qiskit, we also end up with qiskit reported as an undeclared dependency. Fun all around!

In some eyes, the identity mapping actually does a better job here, although that is actually somewhat misleading once you start digging into the details.

However, handling this Correctly™️ involves:

  • Keeping track of transitive dependencies (package qiskit depends on packages qiskit-terra and qiskit-ibmq-provider, which do provide the qiskit import name).
  • Recognizing that qiskit is an exception to our rule that all imports should be declared as direct dependencies. In fact the whole point of the qiskit package is to provide the qiskit import name via its transitive dependencies.

This will take some more thinking. In the meantime we might be better off introducing the identity mapping as not only a fallback mapping, but also a supplementary mapping that is used even when the dependency claims to provide other import names...

@jherland
Copy link
Member

jherland commented Mar 9, 2023

FWIW, I've added this comment to a similar issue in the importlib_metadata repo. Thinking about writing a PR for importlib_metadata as well.

@jherland
Copy link
Member

Created python/importlib_metadata#437 to have importlib_metadata return better information for qiskit and similar packages.

@Nour-Mws Nour-Mws added P3 minor: not priorized research-needed labels Mar 16, 2023
jherland added a commit that referenced this issue Jan 19, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
jherland added a commit that referenced this issue Jan 19, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
jherland added a commit that referenced this issue Jan 19, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
jherland added a commit that referenced this issue Jan 25, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
@jherland
Copy link
Member

FWIW, there seems to be a recent (2024-02-15) change in qiskit upstream: Starting from v1.0.0 qiskit now does provide the qiskit import name itself (instead of relying on dependencies to populate the qiskit.* namespace). They mention this in their own release notes, as well as a separate page to document the change and how to upgrade across the v1.0 boundary.

It would seem that the entire qiskit issue simply goes away when qiskit >= v1.0 is used...

jherland added a commit that referenced this issue Feb 28, 2024
qiskit v1.0 starts to declare the qiskit import name (see issue #176 for
details about the weirdness that precedes v1.0). This changes the
behavior of our real_projects test of the Algortihms project.

qiskit has also dropped support for Python 3.7 (since v0.44), hence it
is easier for us to keep our tests (which need to work with Python 3.7)
running on the older version, for now.
jherland added a commit that referenced this issue Feb 29, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
jherland added a commit that referenced this issue Mar 4, 2024
qiskit v1.0 starts to declare the qiskit import name (see issue #176 for
details about the weirdness that precedes v1.0). This changes the
behavior of our real_projects test of the Algortihms project.

qiskit has also dropped support for Python 3.7 (since v0.44), hence it
is easier for us to keep our tests (which need to work with Python 3.7)
running on the older version, for now.
jherland added a commit that referenced this issue Mar 7, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
jherland added a commit that referenced this issue Mar 11, 2024
The python-algortihms project highlights an outstanding issue in
FawltyDeps: Placeholder packages. As discussed in issue #176, these are
packages that don't provide the expected import name (or any import name
at all), but instead depend on other packages (transitive dependencies)
which will then supply the relevant import name.

The first example we came across of such a package (in #176) was qiskit.
Another example is tensorflow, but only on Windows: On Windows the
tensorflow package is a 1.9kB placeholder that depends on
tensorflow-intel (266MB) to bring in the actual meat. (On POSIX, the
tensorflow package itself contains everything.)

We make two changes to our real_projects test for python-algorithms:

1. We add the posix_only flag to the all_reqs_installed experiment, to
   prevent it from running on Windows. Its expected results would be
   different there (because of tensorflow-intel).

2. We add a new experiment, some_reqs_customized, which uses a
   custom_mapping to resolve qiskit and tensorflow. The remaining
   requirements are installed (as in the previous experiment).
   This yields the "best" FawltyDeps result for this project.
   The remaining undeclared/unused dependencies are presumed to be
   true positives for this project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants