Should 'pip cache purge' remove more than wheel files? #7372

Open
hugovk opened this issue Nov 17, 2019 · 7 comments
Labels
C: cache Dealing with cache and files in it state: needs discussion This needs some more discussion

Comments

@hugovk
Contributor

hugovk commented Nov 17, 2019

What's the problem this feature will solve?

#4685 says:

pip's cache is currently a black box that the users can't really inspect. This is not the nicest of experiences. Adding a pip cache to allow interacting with the cache (much like the new pip config) would be a good way to fix that.

Currently pip cache purge in #6391 only removes the *.whl files from the pip/wheels/ directory.

This leaves:

  • a lot of empty directories under pip/wheels/ (eg. 1,608 dirs, 57 KB)
  • a lot of files under pip/http/ (eg. 2,408 files, 6,850 dirs, 2.1 GB)
  • a selfcheck.json and pip/selfcheck/ directory (eg. 45 files, 27 KB)

Should they also be cleaned up?
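For anyone who wants to reproduce counts like the ones above on their own machine, here is a minimal sketch that tallies files, directories, and bytes for each top-level entry of a cache directory. The function names `summarize_tree` and `summarize_cache` are made up for illustration and are not part of pip; the cache location has to be supplied by the caller.

```python
import os
from pathlib import Path


def summarize_tree(root):
    """Return (file_count, dir_count, total_bytes) for everything under root."""
    files = dirs = size = 0
    for current, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(current, name)
            files += 1
            # Skip symlinks so a dangling link cannot raise in getsize().
            if not os.path.islink(path):
                size += os.path.getsize(path)
    return files, dirs, size


def summarize_cache(cache_dir):
    """Summarize each top-level entry of a cache directory (hypothetical helper)."""
    summary = {}
    for entry in sorted(Path(cache_dir).iterdir()):
        if entry.is_dir():
            summary[entry.name] = summarize_tree(entry)
        else:
            # A plain file such as selfcheck.json counts as one file.
            summary[entry.name] = (1, 0, entry.stat().st_size)
    return summary
```

Running `summarize_cache` before and after a purge would show which entries the purge left behind.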

pip cache --help says purge will "Remove all items from the cache", but it doesn't.

Split out from #6391 (comment) for follow-up after #6391 is merged.

Describe the solution you'd like

  1. Should pip cache purge remove the pip/wheels/ directory itself so all the subdirs are also cleaned up? In my case, it's only 57 KB afterwards, but I guess I don't need those at all, and there's potential for them to slow down the computer.

  2. And I also still have a 2.1 GB pip/http/ directory after purge. Should that also be removed?

  3. And selfcheck.json and pip/selfcheck/ are only 27 KB, but should purge remove the whole pip/ cache dir?

I think my pip/wheels/ directory was quite small, especially when compared with pip/wheels/. [typo, cannot remember original]

Alternative Solutions

  • Leave the files and directories as is, to accumulate.

  • Feature request Cross-platform command to return pip's cache directory #7350 would allow users to find out where the cache directory is and write their own commands to delete files and directories. It'd be nicer for this to be built into pip cache purge.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Nov 17, 2019
@chrahunt chrahunt added C: cache Dealing with cache and files in it state: needs discussion This needs some more discussion labels Nov 27, 2019
@triage-new-issues triage-new-issues bot removed S: needs triage Issues/PRs that need to be triaged labels Nov 27, 2019
@deveshks
Contributor

deveshks commented Apr 30, 2020

So selfcheck.json in the selfcheck folder is used in pip._internal.self_outdated_check.pip_self_version_check to check whether pip needs to be updated, so cleaning it up as part of pip cache purge would need some thought.

The same holds for the http folder, which is used to cache PipSession objects for use in pip._internal.cli.req_command._build_session.

Also, we cannot remove the wheels directory itself, because [commands.cache._find_wheels](https://github.com/pypa/pip/blob/master/src/pip/_internal/commands/cache.py#L159) uses it as the directory in which to find all the wheel files, so we would have to change that logic in order to remove the wheels directory.

A potential solution I could think of is to introduce options for pip cache purge such as:

  1. Clean all the subfolders under the wheels folder in case of pip cache purge --wheel-only.

  2. Clean the subfolders under wheels, http, and selfcheck in case of pip cache purge --all, but that has to inform the user about cleaning the cached pip version, and pip sessions (this might confuse a new user).

@duckinator
Contributor

With the wheels, http, and selfcheck directories, is there a reason we couldn't just delete all files and subdirectories, but leave the directories themselves?

For selfcheck.json specifically, I can find no evidence that the codebase still uses it. It appears to have been replaced entirely by other files in the selfcheck folder, so I see no reason to avoid removing it if it exists.


To be clear, what I'm suggesting is basically doing the Python equivalent of these bash commands:

[ -f "$PIP_CACHE_DIR/selfcheck.json" ] && rm "$PIP_CACHE_DIR/selfcheck.json"
rm -r "$PIP_CACHE_DIR"/{http,selfcheck,wheels}/*
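A rough Python rendering of the two commands above might look like the sketch below. It is not pip's implementation: `clear_cache_contents` is a hypothetical name, and the cache directory is passed in by the caller. One difference from the shell version is that `iterdir()` also returns dotfiles, which the `*` glob would skip.

```python
import shutil
from pathlib import Path


def clear_cache_contents(cache_dir, subdirs=("http", "selfcheck", "wheels")):
    """Empty the named subdirectories but keep the directories themselves.

    Returns the number of top-level entries removed from those subdirectories.
    """
    cache_dir = Path(cache_dir)

    # Equivalent of: [ -f selfcheck.json ] && rm selfcheck.json
    legacy = cache_dir / "selfcheck.json"
    if legacy.is_file():
        legacy.unlink()

    removed = 0
    for name in subdirs:
        top = cache_dir / name
        if not top.is_dir():
            continue  # nothing to clear for this subdirectory
        for child in top.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)  # real directory: remove recursively
            else:
                child.unlink()  # regular file or symlink
            removed += 1
    return removed
```

After a call, `http/`, `selfcheck/`, and `wheels/` still exist but are empty, so code that expects those directories to be present keeps working.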

@hugovk
Contributor Author

hugovk commented Oct 18, 2020

+1, assuming there's no technical reason for needing the empty subdirs and selfcheck.json.

And my local selfcheck.json was last updated on 2 August, so looks unused.


Since #8910, pip 20.3 will also include http files in pip cache purge.

It still leaves the directories:

$ pip --version
pip 20.3.dev0 from /Users/hugo/github/pip/src/pip (python 3.9)
$ pip cache info
Package index page cache location: /Users/hugo/Library/Caches/pip/http
Package index page cache size: 318.8 MB
Number of HTTP files: 725
Wheels location: /Users/hugo/Library/Caches/pip/wheels
Wheels size: 9.1 MB
Number of wheels: 20
$ pip cache purge
Files removed: 745
$ pip cache info
Package index page cache location: /Users/hugo/Library/Caches/pip/http
Package index page cache size: 0 bytes
Number of HTTP files: 0
Wheels location: /Users/hugo/Library/Caches/pip/wheels
Wheels size: 0 bytes
Number of wheels: 0

In this case, 6,600+ empty dirs remain:

$ find /Users/hugo/Library/Caches/pip/ | wc -l  # count all files and directories
    7089
$ find /Users/hugo/Library/Caches/pip/ -type d | wc -l  # find only directories
    6683
$ find /Users/hugo/Library/Caches/pip/http/ | wc -l
    5663
$ find /Users/hugo/Library/Caches/pip/http/ -type d | wc -l
    5663
$ find /Users/hugo/Library/Caches/pip/selfcheck/ | wc -l
     405
$ find /Users/hugo/Library/Caches/pip/selfcheck/ -type d | wc -l
       1
$ find /Users/hugo/Library/Caches/pip/wheels/ | wc -l
    1018
$ find /Users/hugo/Library/Caches/pip/wheels/ -type d | wc -l
    1018

@pradyunsg
Member

pradyunsg commented Oct 18, 2020

And my local selfcheck.json was last updated on 2 August, so looks unused.

It's used as part of the "hey, upgrade your pip" messaging -- we shouldn't remove this.

All OK to delete the empty directories. :)
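Deleting the leftover empty directories could be done with a bottom-up walk, roughly as sketched here. `prune_empty_dirs` is a hypothetical helper, not an existing pip function, and it deliberately keeps the root directory it is given.

```python
import os


def prune_empty_dirs(root):
    """Remove empty directories below root; root itself is kept.

    Returns the number of directories removed.
    """
    root = os.fspath(root)
    removed = 0
    # Walk bottom-up so children are handled before their parents.
    for current, _dirnames, _filenames in os.walk(root, topdown=False):
        if current == root:
            continue
        try:
            os.rmdir(current)  # only succeeds when the directory is empty
            removed += 1
        except OSError:
            pass  # not empty, or not removable; leave it in place
    return removed
```

Because `os.rmdir` refuses to remove a non-empty directory, any directory that still holds a file is left untouched, along with its ancestors.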

@hugovk
Contributor Author

hugovk commented Oct 18, 2020

Looks like the single-file selfcheck.json was replaced in #6855 with hashed JSON files in selfcheck/:

Previously, we kept selfcheck info for all pip instances in the same file.

Now, we use a separate file per pip instance, simplifying the process of recording updated state.

@pfmoore
Member

pfmoore commented Oct 18, 2020

Wow, that sucks. I wish I'd noticed and thought through the implications of #6855 at the time. I apparently now have a bunch of files in that directory, over half of which are for temporary virtualenvs that no longer exist, and most of the others are for throwaway environments that I haven't deleted yet but which haven't been used in months.

Update: Looks like I did spot the implication here. Sadly the housekeeping issue never got dealt with, and I just closed #6956 because I only remembered it as being about the wheel cache. I'll reopen it to track clearing up these files.

@duckinator
Contributor

duckinator commented Oct 26, 2020

I'm working on a PR for this. To start, I'm focusing on three things:

  1. Having pip cache purge just remove everything under the http and wheels directories.
  2. Having pip cache remove prune empty directories.
  3. Removing selfcheck.json if it exists.

The pip/selfcheck/ directory requires more logic, and I may wind up leaving it for a follow-up PR.
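As one possible shape for that extra logic, the sketch below looks for per-environment state files whose environment has gone away. It rests on an assumption that should be checked against the pip version in use: that each file in selfcheck/ is a JSON object recording the owning environment's prefix under a "key" field. `find_stale_selfcheck_files` is a hypothetical name, and the function only reports candidates; it deletes nothing.

```python
import json
import os
from pathlib import Path


def find_stale_selfcheck_files(selfcheck_dir):
    """Return state files whose recorded environment prefix no longer exists.

    Assumption: each file is JSON with the environment prefix under "key".
    Files that cannot be parsed, or that lack that field, are left alone.
    """
    stale = []
    for path in sorted(Path(selfcheck_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            state = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not JSON: do not guess
        prefix = state.get("key") if isinstance(state, dict) else None
        if isinstance(prefix, str) and not os.path.isdir(prefix):
            stale.append(path)
    return stale
```

Reporting instead of deleting keeps the decision with the caller, which seems safer given that files for live environments still drive the upgrade check.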
