Do something about cache housekeeping? #6956
Related to #3138.

Also related to #4685, since that could be how you perform this.
Is this still relevant, as we finally have a `pip cache` command?
Agreed, and as discussed in #8474, I'm ambivalent at best over the idea of an automatic cleanup. So I'm going to close this as complete, and leave automatic cleanup as something for someone else to raise if they feel like it. Thanks for your work in making this happen @duckinator!

Reopening, as I realised via this discussion that this issue was originally triggered by the question of tidying up outdated selfcheck files. IMO we still need an automated solution for clearing up obsolete selfcheck files. I'd expect that to be something along the lines of: whenever we do a selfcheck, we check all the other selfcheck files and delete any that refer to directories that no longer exist. I don't think it should be down to the user to run a purge, nor do I think we should leave files for non-existent environments indefinitely.
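For illustration, here is a rough sketch of that cleanup in Python. The file layout is an assumption, not pip's actual internals: one JSON state file per environment under a selfcheck directory, with a hypothetical `"key"` field recording the interpreter prefix it was created for.

```python
import json
import os
from pathlib import Path

def prune_stale_selfcheck_files(selfcheck_dir: Path) -> None:
    """Delete selfcheck state files whose environment no longer exists."""
    for state_file in selfcheck_dir.glob("*.json"):
        try:
            data = json.loads(state_file.read_text())
            env_prefix = data.get("key")  # hypothetical field holding sys.prefix
        except (OSError, json.JSONDecodeError):
            # Unreadable or corrupt state files are also safe to drop.
            state_file.unlink(missing_ok=True)
            continue
        if env_prefix and not os.path.isdir(env_prefix):
            # The environment this file describes is gone; remove the file.
            state_file.unlink(missing_ok=True)
```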
Bah. Long day. Thanks for letting me know!
#2984 is going to make this worse, by potentially having two copies of the same file in the cache. Here is a proposal for a solution: no more than once a day (to prevent performance impacts), after installing a package (so cache information is accurate), clear old entries from the cache. Two potential strategies for deciding which entries to delete and how many:

- cap the total cache size, removing the least recently used entries first
- remove entries older than a certain age
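A minimal sketch of the once-a-day throttle described above, assuming a hypothetical marker file in the cache directory and an age-based prune policy (the names and thresholds are illustrative, not pip behaviour):

```python
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60

def maybe_prune_cache(cache_dir: Path, max_age_days: int = 30) -> None:
    """Prune old cache entries, at most once per day."""
    marker = cache_dir / ".last-prune"  # hypothetical marker file
    now = time.time()
    if marker.exists() and now - marker.stat().st_mtime < ONE_DAY:
        return  # already pruned within the last day
    cutoff = now - max_age_days * ONE_DAY
    for entry in cache_dir.rglob("*"):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink(missing_ok=True)
    marker.touch()  # record when this prune ran
```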
Given that data science packages can be huge and disk space is cheap, I would suggest that date-based caching is probably the better option.
As sysadmin at a university, we regularly get mails from students who can't figure out why they are receiving over-quota warnings. Most of the time it's their pip cache filling up the initial 5 GB of storage they get. It would be very nice if pip respected a global setting that caps the cache size. It should be checked every time the pip command runs and enforced either automatically or via a clear suggestion to the user. The all-or-nothing option of purge seems like a poor approach to cache management.
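A sketch of what such a size cap could look like, assuming an LRU-style policy that deletes the least recently accessed files until the cache fits under the configured limit (the function and its parameters are illustrative, not an existing pip setting):

```python
from pathlib import Path

def enforce_cache_limit(cache_dir: Path, max_size_bytes: int) -> None:
    """Delete least-recently-used cache files until the total size fits the cap."""
    files = [p for p in cache_dir.rglob("*") if p.is_file()]
    total = sum(p.stat().st_size for p in files)
    if total <= max_size_bytes:
        return
    # Oldest access time first, so recently used wheels/archives survive.
    files.sort(key=lambda p: p.stat().st_atime)
    for p in files:
        if total <= max_size_bytes:
            break
        total -= p.stat().st_size
        p.unlink(missing_ok=True)
```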
What's the problem this feature will solve?
Pip stores data in its cache, but never clears out obsolete data.
Describe the solution you'd like
Some means (automated or manual) for no-longer-needed cache entries to be cleared out.
Alternative Solutions
It's possible to just delete the cache altogether, as there is nothing in there that won't be recreated as needed, but this is an "all or nothing" solution.
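For reference, the all-or-nothing route can be scripted by asking pip where its cache lives and removing that directory; `pip cache purge` (available in current pip releases) does the same thing natively. A minimal sketch:

```python
import shutil
import subprocess
import sys

# Ask pip where its cache directory is, then remove it entirely.
cache_dir = subprocess.run(
    [sys.executable, "-m", "pip", "cache", "dir"],
    capture_output=True, text=True, check=True,
).stdout.strip()
shutil.rmtree(cache_dir, ignore_errors=True)
```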
Additional context
Prompted by the discussion here