pip freeze with a hash #4732

Open
lofidevops opened this issue Sep 20, 2017 · 49 comments
Labels
C: freeze 'pip freeze' related resolution: deferred till PR Further discussion will happen when a PR is made type: feature request Request for a new feature

Comments

@lofidevops

lofidevops commented Sep 20, 2017

  • Pip version: 9.0.1
  • Python version: 3.5.4
  • Operating system: Debian / PureOS

Description:

User story: I am a Python developer with an existing requirements.txt file. I want to add hashes to the file, so that future installations are more secure.

What I've run:

At the moment I need to:

  • Locate the package.tar.gz or package.whl
  • Run pip hash /path/to/package
  • Copy the result into requirements.txt
  • Repeat for every package
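The manual steps above can be approximated in a few lines of Python. This is only a minimal sketch, not what pip does internally: the helper names `sha256_of_file` and `requirement_line` are made up for illustration, and it assumes you already know the package name, the version, and the path to the downloaded archive.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def requirement_line(name: str, version: str, archive: Path) -> str:
    """Format a pinned requirement with a --hash option (hypothetical helper)."""
    return f"{name}=={version} --hash=sha256:{sha256_of_file(archive)}"
```

The hard part the issue describes is not the hashing itself but locating the right archive for every installed package, which this sketch leaves to the caller.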

It would be great if instead I could:

  • Run pip freeze --hash
  • Get pip-formatted output with all package names and their hashes
  • Copy the result into requirements.txt

Today's solution:

Pipfile is a replacement for requirements.txt that includes hashes in a file called Pipfile.lock.

pipenv is a tool for managing your virtualenv based on Pipfile, including checks against the hashes defined in Pipfile.lock. (It can also convert a requirements.txt file.)

Suggested solution:

Supporting Pipfile at the pip layer (rather than a higher-level tool) is on the PyPA roadmap, see https://github.com/pypa/pipfile#pip-integration-eventual :

pip will grow a new command line option, -p / --pipfile to install the versions as specified in a Pipfile, similar to its existing -r / --requirement argument for installing requirements.txt files.
...
To manually update the Pipfile.lock:

$ pip freeze -p different_pipfile
different_pipfile.lock (73d81f) written to disk.

The implication is that this is the preferred solution to supporting hashes (rather than adding them to requirements.txt or pip freeze). The current status is "Deferred till PR" (see this ticket). See also #6925.

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Sep 21, 2017
@pradyunsg pradyunsg changed the title [enhancement] pip freeze --hash pip freeze --hash Sep 21, 2017
@pradyunsg pradyunsg added the C: freeze 'pip freeze' related label Sep 21, 2017
@alecbz

alecbz commented Oct 2, 2017

Is there at least some way to easily script this? E.g., can I loop over a pip freeze and somehow programmatically find the file I need to pass to pip hash?

@rbanffy

rbanffy commented Oct 5, 2017

pip would need to calculate the hash and store it somewhere as it installs the package. When doing a freeze, it would retrieve that information.

@max-wittig

This would be an awesome feature, indeed.

@pradyunsg pradyunsg added state: awaiting PR Feature discussed, PR is needed type: feature request Request for a new feature and removed type: enhancement Improvements to functionality state: awaiting PR Feature discussed, PR is needed labels Jan 17, 2018
@pradyunsg
Member

This sounds like a good idea, although I am not sure how it'll work. As @max-wittig pointed out, the hash needs to be computed when the installation occurs, when the installation source is downloaded.

@kiowa

kiowa commented Feb 10, 2018

You can get the hash from the cached wheel in ~/.cache/pip/wheels/

@gifflen

gifflen commented Mar 16, 2018

It looks like pipenv is getting the hashes directly from the warehouse api

https://github.com/pypa/pipenv/blob/master/pipenv/utils.py#L468-L508
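For readers who want to try the index-based approach, here is a rough sketch. It does not reproduce pipenv's code; it assumes PyPI's JSON API at `https://pypi.org/pypi/<name>/<version>/json`, whose `urls` entries carry a `digests.sha256` field for each released file. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_release_json(name: str, version: str) -> dict:
    """Download release metadata from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urlopen(url) as response:
        return json.load(response)


def hashed_requirement(name: str, version: str, release: dict) -> str:
    """Build one requirement entry listing the sha256 of every released file."""
    digests = sorted(f["digests"]["sha256"] for f in release["urls"])
    hash_options = " \\\n    ".join(f"--hash=sha256:{d}" for d in digests)
    return f"{name}=={version} \\\n    {hash_options}"
```

Listing every file's digest means the same entry works whether pip later picks a wheel or the sdist, at the cost of trusting whatever the index reports at lookup time.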

@andrewchambers

This comment has been minimized.

@Julian
Contributor

Julian commented Aug 4, 2018

@andrewchambers perhaps instead of the slight barbs consider sending a PR?

@lofidevops
Author

It appears that this user story (Python developer wanting to hash their dependencies) is addressed by pipenv, a distinct PyPA project. See https://docs.pipenv.org/basics/#pipfile-lock-security-features for details. So I'm closing this issue assuming that this user story is out-of-scope for pip itself, and best handled by a "higher-level" tool.

Other readers also might be interested in:

  • pipsi, which addresses a semi-related user story (end-user management of pip-packaged Python applications)
  • pipfile, a project aiming to replace requirements.txt (used by pipenv)

@Julian
Contributor

Julian commented Aug 6, 2018 via email

@pfmoore
Member

pfmoore commented Aug 6, 2018

@Julian note that it was the OP who closed the issue, not the pip developers. The option for someone to create a PR for this remains available to anyone interested in the feature.

@Julian
Contributor

Julian commented Aug 6, 2018

Ah, indeed, thanks!

Great, glad to hear it's not being designated as out of scope.

@lofidevops lofidevops changed the title pip freeze --hash pip freeze with a hash Aug 9, 2018
@lofidevops
Author

See the proposal for a pip freeze -p pipfile command at https://github.com/pypa/pipfile#pip-integration-eventual , which directly solves this user story for pip. I've reopened this ticket because it is clearly on the (long-term) roadmap for pip.

@lofidevops lofidevops reopened this Aug 9, 2018
@lofidevops
Author

I've updated the ticket description with the proposed solution (as I understand it). Note that Pipfile-based dependencies are usable today if you use pipenv.

@terrisgit

See today's convoluted workaround at your handy peterbe/hashin#100

@max-wittig

I just switched to Pipenv, which supports this workflow. Sadly, it's still not included in the default Python package.

@bittner

bittner commented Dec 7, 2018

Is there any roadmap or concrete discussion about implementing the proposed -p / --pipfile option that may replace the -r option in the long run? I'm having a hard time finding this.

@chamoda

chamoda commented Jun 10, 2019

This will generate a requirements.txt with hashes

pip-compile requirements.txt --generate-hashes

Note that this will directly modify the existing requirements.txt file.

You can install pip-compile with pip install pip-tools

@BenjamenMeyer

It would still be good to have this directly via pip freeze instead of having to use other tooling; pipenv and Pipfile come with their own set of headaches.

@Jongy

Jongy commented Dec 11, 2019

pip freeze --hash will be very useful.

@chrahunt
Member

What about generating the lock file at install-time, like npm, yarn, pipenv, poetry, Cargo, and Conan do? (sorry if I missed any)

  1. pip install -r requirements.txt --lock requirements.txt.lock
  2. Commit requirements.txt.lock
  3. Afterwards, invoke pip install -r requirements.txt.lock

On updates to requirements.txt, do the same steps.

This directly supports the stated use case:

I am a Python developer with an existing requirements.txt file. I want to add hashes to the file, so that future installations are more secure.

but it avoids a lot of the extra work that is being described in #8519.

@BenjamenMeyer

@NoahGorny WRT hashes:

  • get the hash from the remote
  • generate the hash locally
  • verify the hashes match

Don't trust remote, but don't necessarily trust local either. Verify they match to give the user assurance that the right thing is installed. If they don't match, generate an error; provide a way to force the install if they don't match, but by default uninstall/rollback if they don't match (or depending where/when you're generating the hashes...don't install to start with, which would be even better).
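The three steps above could look roughly like this in Python. It is a sketch under assumptions, not pip's hash-checking code: `verify_archive` and `HashMismatch` are hypothetical names, and the "remote" digest is simply passed in as a string obtained however the caller sees fit.

```python
import hashlib
import hmac


class HashMismatch(Exception):
    """Raised when the local digest differs from the advertised one."""


def verify_archive(data: bytes, advertised_sha256: str) -> str:
    """Hash the downloaded bytes locally and compare with the remote digest.

    Returns the verified hex digest; raises HashMismatch so the caller can
    refuse to install anything at all.
    """
    local = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(local, advertised_sha256.lower()):
        raise HashMismatch(f"expected {advertised_sha256}, got {local}")
    return local
```

Checking before extraction corresponds to the "don't install to start with" variant, so no rollback is needed on failure.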

@NoahGorny
Contributor

@NoahGorny WRT hashes:

  • get the hash from the remote
  • generate the hash locally
  • verify the hashes match

Don't trust remote, but don't necessarily trust local either. Verify they match to give the user assurance that the right thing is installed. If they don't match, generate an error; provide a way to force the install if they don't match, but by default uninstall/rollback if they don't match (or depending where/when you're generating the hashes...don't install to start with, which would be even better).

We cannot always generate the hash locally after installation; that's why we create the new HASH file. However, I am not sure we should fetch hashes from the remote each time we freeze the environment...

What about generating the lock file at install-time, like npm, yarn, pipenv, poetry, Cargo, and Conan do? (sorry if I missed any)

  1. pip install -r requirements.txt --lock requirements.txt.lock
  2. Commit requirements.txt.lock
  3. Afterwards, invoke pip install -r requirements.txt.lock

On updates to requirements.txt, do the same steps.

This directly supports the stated use case:

I am a Python developer with an existing requirements.txt file. I want to add hashes to the file, so that future installations are more secure.

but it avoids a lot of the extra work that is being described in #8519.

This requires users to actively generate lockfiles during installation, and it only works if the user is installing from a requirements file in the first place. It is a good option for such users, but I don't think it works as well for other use cases.

@sbidoul
Member

sbidoul commented Jul 19, 2020

The approach suggested by @chrahunt in #4732 (comment) is also valuable in a lot of situations. It has complexities to think through too, for instance when the install command is used to update an existing environment, and when pip decides it does not need to reinstall some already installed dependencies. In such cases we'd still need a way to obtain information about the hashes of installed distributions.

@chrahunt
Member

This requires users to actively generate lockfiles during installation, and it only works if the user is installing from a requirements file in the first place. It is a good option for such users, but I don't think it works as well for other use cases.

This use case from the original issue assumes we have a requirements file, and several comments refer to Pipfile support, which would work in the same way. I think there may be some people who would want to get their environment set up and then generate a lock file for it, but IMO we risk not actually satisfying this issue adequately if we try to solve that one at the same time.

It has complexities to think through too, for instance when the install command is used to update an existing environment

Good point. It would be worthwhile to see how other dependency managers behave in that situation. If it turns out it's common (and generally agreed to be necessary) to store hashes with the installed packages, then that could be turned right around and included in the PEP itself. :)

@BenjamenMeyer

@chrahunt to give confidence that the right thing is being installed; I would think you'd want something generated before it's installed that could easily be verified.

Question: what all is getting hashed? (or being proposed to being hashed)

@pradyunsg
Member

pradyunsg commented Aug 30, 2020

Question: what all is getting hashed? (or being proposed to being hashed)

See https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode -- it's the entire files.

pip can check downloaded package archives against local hashes to protect against remote tampering

@BenjamenMeyer

Question: what all is getting hashed? (or being proposed to being hashed)

See https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode -- it's the entire files.

That doesn't really say what gets hashed, just the requirements around hashing.

If it's the generated file, there wouldn't be an issue with hashing wheels. A hash would also be verifiable against what is downloaded vs what is installed.

Something is off.

@uranusjr
Member

uranusjr commented Aug 31, 2020

When you pip install, pip downloads an archive from PyPI (or another index you specify) to extract. If you specify a hash in the requirements file, that archive’s hash is checked against the hash list you provide. But you can’t re-generate the exact same wheel from the installation, since not every file in the wheel is installed (and even more is lost if you install from source). I am honestly not getting what you think is off.

@BenjamenMeyer

@uranusjr if you're checking the hash of the archive prior to extracting it, then it doesn't matter what happens after. If you're hashing what is actually put into the system, then of course it's going to change all the time but that's also an extremely bad design since you cannot have deterministic hashing behavior.

Honestly, Python/Pip should follow the package hashing done by RPM, Deb, and others. You hash the package itself, not its installed data. This provides deterministic behavior and can be verified before an install is ever done.

IOW - there should be no need to regenerate a wheel from the installation; you're not hashing the installation but the package itself.

@uranusjr
Member

uranusjr commented Aug 31, 2020

What you describe is exactly what pip is currently doing. The problem in this thread is the other way around: people are looking for a way to generate hashes from installed data, and the pip developers are trying to explain we don’t know how this can be done.

@NoahGorny
Contributor

I had an attempt at #8519 which got stale...
It did not have much traction and I drifted away from that, but I am ready to work on this once more.
In my PR, I tried to add a new file into the installation folder, which specifies the hashes of the source package (not the installed data!). Using this file we can easily output the hashes when needed. I think this is the easiest solution to this problem, although it has some setbacks (new file requires a new PEP, need to be dynamic, etc...)
take a look and lemme know what you think @BenjamenMeyer
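To make the idea concrete, here is a purely illustrative sketch of recording the source archive's hash inside the installed `.dist-info` directory and reading it back at freeze time. It is not the format used in #8519: the file name `HASH`, the `algorithm:hexdigest` layout, and both function names are assumptions made up for this example.

```python
from pathlib import Path

HASH_FILENAME = "HASH"  # hypothetical file name, borrowed from the discussion


def record_source_hash(dist_info: Path, algorithm: str, hexdigest: str) -> None:
    """Store the digest of the downloaded archive next to the metadata."""
    (dist_info / HASH_FILENAME).write_text(
        f"{algorithm}:{hexdigest}\n", encoding="utf-8"
    )


def frozen_line(dist_info: Path) -> str:
    """Rebuild a 'name==version --hash=...' line from a .dist-info directory."""
    # Directory names look like '<name>-<version>.dist-info'.
    name, version = dist_info.name[: -len(".dist-info")].rsplit("-", 1)
    recorded = (dist_info / HASH_FILENAME).read_text(encoding="utf-8").strip()
    return f"{name}=={version} --hash={recorded}"
```

The digest recorded here is that of the archive that was downloaded, never of the installed files, which is the distinction argued over in this thread.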

@BenjamenMeyer

@uranusjr if that's the case then there is certainly an answer: an emphatic no, it cannot be done from installed data, nor should that be desired. Just hash the package files that get uploaded to PyPI and be done.

@NoahGorny that doesn't really clarify anything. I did leave a comment about one aspect.

@sbidoul
Member

sbidoul commented Sep 1, 2020

I personally think pip freeze with hash is desirable, and would facilitate common workflows.

It is feasible if we record the hash of the distribution that was downloaded for installation (not the wheel we possibly built locally). It is not trivial because we have the (wheel) cache in between. And adding information in .dist-info requires a PEP (although I think we would benefit from a PEP that allows tools to record their own stuff in .dist-info, so we could iterate more rapidly and worry about interop later).

@altendky

altendky commented Sep 1, 2020

Is it that painful to put the locking up front and then use the lock to control the environment? Rather than controlling the environment then reaching back up the data path to get the hashes later?

@BenjamenMeyer

@sbidoul if you record the hash of the file that was downloaded (the package), whether wheel or otherwise, it's easy to verify. If it's a VCS download, then a driver for the VCS should take the VCS location (git URL, svn URL, etc.) and some repo data (git hash, svn revision, etc.) to create a hash which could then be standardized and easily used.

Trying to generate from the installed data is very problematic from numerous aspects:

  • user modifies it
  • installation modifies it
  • different systems get different kinds of output

A single package (wheel, bdist, sdist) should have exactly 1 hash that would match it.

@sbidoul
Member

sbidoul commented Sep 4, 2020

Is it that painful to put the locking up front and then use the lock to control the environment

@altendky I'd say it is cumbersome today. And it seems the tools that automate it have to hack pip internals or reimplement a sizeable portion of it to achieve that goal.

So my feeling is that pip would help broader adoption of hash checking if it exposed mechanisms to facilitate hashes discovery.

pip freeze with hashes is one such mechanism. Another is to let pip report more information about what it does when installing (or dry-run installing), such as the (hashes of) distributions it downloaded to perform the install.

@sbidoul
Member

sbidoul commented Sep 4, 2020

Trying to generate from the installed data is very problematic from numerous aspects:

@BenjamenMeyer I don't think anyone is attempting to do that indeed.

Regarding VCS, I'd say we don't really need anything special. I would simply relax pip's hash-checking mode a little, so that for VCS systems with immutable commit refs (git shas, ...) the commit reference itself is considered sufficient as a hash mechanism.
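As a rough illustration of that relaxation, a check for "is this git requirement pinned to an immutable commit?" might look like the sketch below. This is an assumption about how such a rule could be written, not pip behavior; `is_pinned_to_commit` is a hypothetical helper and it only looks at `git+` URLs.

```python
import re

# A full SHA-1 (40 hex chars) or SHA-256 (64 hex chars) git object name.
_FULL_GIT_SHA = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$")


def is_pinned_to_commit(requirement_url: str) -> bool:
    """Return True if a 'git+...@<ref>' URL ends in a full commit hash."""
    if not requirement_url.startswith("git+"):
        return False
    url = requirement_url.split("#", 1)[0]  # drop '#egg=...' style fragments
    _, sep, ref = url.rpartition("@")
    return bool(sep) and _FULL_GIT_SHA.match(ref) is not None
```

Branch and tag names are rejected because they can be moved to a different commit; abbreviated hashes are rejected too, since only the full object name is unambiguous.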

@altendky

altendky commented Sep 4, 2020

@sbidoul, I didn't say pip shouldn't support it, just that perhaps the order of operations should be slightly different than requested here.

@yawaramin

How about an alternative UX:

$ pip install --require-hashes
...
ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    foo==1.0.0 --hash=sha256:...
    bar==2.0.0 --hash=sha256:...
   ...

As it turns out, right now it does almost this, except it prints only one requirement hash on each run of pip install, forcing the user to run it repeatedly. Instead, could it print all the requirement hashes? If so, it sidesteps the issue of where to store the hashes between install and freeze and makes the user follow a different workflow altogether.

@uranusjr
Member

Instead, could it print all the requirement hashes?

Theoretically yes (well, it can print out all the hashes it knows; theoretically there are infinite possible hashes), but pip does not currently have the mechanism to do so. A PR exploring this would be much welcomed.

@DoWhileGeek

This comment was marked as off-topic.

@The-Compiler

Would this perhaps be something which would be a good fit with pip install --dry-run --report --ignore-installed from #10771? If that had a --report-format=requirements or somesuch, that could be used to generate a requirements.txt with hashes and such, no?

(I suppose it would also be possible to write a small tool which takes the JSON output and converts it to a requirements.txt)

@pfmoore
Member

pfmoore commented Aug 12, 2022

(I suppose it would also be possible to write a small tool which takes the JSON output and converts it to a requirements.txt)

The idea of using JSON format is precisely so that such tools are easy to write without needing changes to pip, so yes, that would be the recommended approach.
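A small converter of that kind could be sketched as follows. The field names used here (`install`, `metadata.name`, `metadata.version`, `download_info.archive_info.hashes`, and the older `archive_info.hash` string of the form `sha256=<hex>`) reflect my understanding of pip's installation report and should be checked against the pip documentation; the function names are hypothetical.

```python
import json
from typing import List


def load_report(path: str) -> dict:
    """Read a report written by 'pip install --report <path>'."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def report_to_requirements(report: dict) -> List[str]:
    """Turn each reported install item into 'name==version --hash=algo:hex'."""
    lines = []
    for item in report.get("install", []):
        meta = item["metadata"]
        archive = item.get("download_info", {}).get("archive_info", {})
        hashes = dict(archive.get("hashes", {}))
        legacy = archive.get("hash")  # assumed older form: 'sha256=<hex>'
        if not hashes and legacy and "=" in legacy:
            algo, _, value = legacy.partition("=")
            hashes[algo] = value
        options = "".join(f" --hash={a}:{v}" for a, v in sorted(hashes.items()))
        lines.append(f"{meta['name']}=={meta['version']}{options}")
    return lines
```

Items without archive hashes (for example VCS installs) come out as plain `name==version` lines, so the result would still need review before being used with --require-hashes.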

@AkechiShiro

AkechiShiro commented Nov 29, 2023

Is there any way forward on this issue, or any way someone could help? What is left to be done or discussed?
