Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pluggable wheel verifier #6248

Open
zooba opened this issue Feb 9, 2019 · 2 comments
Open

Pluggable wheel verifier #6248

zooba opened this issue Feb 9, 2019 · 2 comments
Labels
state: needs discussion This needs some more discussion type: feature request Request for a new feature

Comments

@zooba
Copy link
Contributor

zooba commented Feb 9, 2019

What's the problem this feature will solve?
Verify the contents of any wheel before it is extracted or installed, according to arbitrary validation rules (such as containing a particular file or being signed with a particular certificate).

Currently it's fairly easy to modify a built wheel before publishing it to a (public or private) index, but it's much more complicated to download a wheel to inspect it before installing. This feature would allow the person doing the installation to determine what validation is necessary, define it as Python code, and have pip call into it between the download and install stages.

A secondary problem that it solves is having to choose one true verification method and bake it into pip.

Describe the solution you'd like
Assume a reliable baseline install exists (Python, pip, arbitrary packages). Hence, pip does not need to validate itself or these packages, but is able to validate any upgrades to itself or these packages.

In a pip.ini, an environment variable, or the command line, the --verify-function option is set to a value module:function (for example mypackage.module:verify) that identifies the entry point to a file verifier.

Once a wheel has been downloaded but before it is extracted, the verifier is imported and called, passing the . In simplified code, it would look like this:

downloaded_file = urlretrieve(uri_to_wheel)
with ZipFile(downloaded_file) as f:
    if options.verifier:
        mod_name, _, func_name = options.verifier.rpartition(':')
        verifier = getattr(importlib.import_module(mod_name), func_name)
        verifier(f)
    f.extract_all(…)

The error handling code is omitted, but the implication that all errors within the options.verifier block are fatal is correct. If the option has been specified, failing to locate or use the verifier should abort the install, as should any errors raised by the verifier itself. If the file is deemed valid, it returns without raising.

This is entirely sufficient to handle the common internal corporate scenario where:

  • there is a baseline OS install that is already trusted (including pip, a verifier and configuration)
  • there is a certificate where the public key is on production machines and private key on build server
  • packages only come from an internal package server

It could further extend to do more complicated tasks, such as static analysis of source code

It is not intended to handle packages from mixed sources, or establishing trust between parties where there currently is none. It probably has no place at all on public PyPI right now, though it could allow for easier experimentation.

Alternative Solutions
There are lots of other proposals that involve all sorts of solutions for implied or transitive trust. I am not interested in solving those problems

Additional context
Previously discussed at #6198, #4705, 1035, (under-)specified in PEP 427,

@pradyunsg pradyunsg added state: needs discussion This needs some more discussion type: feature request Request for a new feature labels Feb 9, 2019
@dstufft
Copy link
Member

dstufft commented Feb 10, 2019

I would further expand this to verify any file downloaded by pip. If someone wants to restrict a particular verifier to wheels only, they can just no-op on sdists. Particularly since it should be pretty easy to cover both file types.

To be clear, I think that #6198/#4705 are independent tasks that pip itself should just handle.

One thing I'd possibly consider is if it makes sense to do this as a CLI hook or a Python hook. In either case I think it's trivial to translate between them so I don't think it allows anything new or interesting depending on the choice made. The main thing that makes me wonder if a command based hook would be better is that it provides cleaner isolation between the two systems (pip and the hypothetical verifier) with less restrictions (e.g. you could write the verifier as Py3 only and still have it work on Py2). It also provides maybe an easier way to handle communication between them, since pip could just capture stdout/stderr and print them on error (or always, I dunno).

Maybe that feels like a horrible experience though for people writing these verifiers and people would rather just expose a Python API.

@zooba
Copy link
Contributor Author

zooba commented Feb 11, 2019

I prefer a Python hook as we can pass an open file handle most easily (and specifically, the open handle that will be used to extract the contents). This is the best way to avoid race conditions that involve waiting for the file to be verified and then replacing it before it is extracted.

It also avoids adding a simple environment variable to cause pip to launch arbitrary processes, though that's more about surface area reduction than anything else (and removing pip once you're done with it is by far the best way to achieve this).

For my scenario, I don't care about Py2 at all :) I don't honestly expect these to become an off-the-shelf type plugin, so the experience matters more than usual. Which is why I prefer the Python hook.

Perhaps the hook can be two (three?) functions - verify_wheel, verify_sdist, verify_directory (for a git clone or directory path). Verifiers must provide all three (or else we can't detect a verifier that's been trivially compromised by deleting it), but can certainly be noops. That still leaves pip with the task of figuring out what kind of source it, has, which I think is correct.

And I agree on normal RECORD verification - we should just do that. But unless we can prove that nobody needs to disable it (or are willing to just break any installs relying on it being ignored), it'll need additional functionality to turn off. If we add the verifier API and configuration, then RECORD verification could just be the default verifier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: needs discussion This needs some more discussion type: feature request Request for a new feature
Projects
None yet
Development

No branches or pull requests

3 participants