
Restrict deleting packages #112

Closed · taion opened this issue Nov 24, 2017 · 11 comments

Comments
taion commented Nov 24, 2017

(Was pypi/legacy#738, thanks to @ewdurbin for pointing me in the right direction)

Earlier, a number of users encountered broken builds when [email protected], originally published on 2017-11-13, was unpublished on 2017-11-23. This is because those following best practices around fully locking down dependencies (e.g. via Pipfile.lock) were pointed at the no-longer-existing v3.5.0.

Some time ago, there was a similar problem in the npm ecosystem around the left-pad package getting unpublished: https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/, http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm

As a consequence, npm adopted a policy that prohibited deleting versions more than 24 hours old without contacting support: http://blog.npmjs.org/post/141905368000/changes-to-npms-unpublish-policy

I believe PyPI should adopt a similar policy – perhaps exactly the same one.

jianghuaw commented

Yeah, I totally agree it's worth adding such a restriction. Good practice is to deprecate an existing release by publishing a new one, not to delete a recently published release.

taion (Author) commented Nov 27, 2017

Given pypa/pip#3634, it might also be worthwhile to limit adding files to packages that already exist (say to the same 24-hour span), as that Pip issue means that new files could potentially break anybody with locked-down hashes in their requirements.

hickford (Contributor) commented

It's a foreseeable problem. I hope that PyPI will adopt a cautious policy before Python suffers its own left-pad-style fiasco.

I remain concerned that a malicious (or hacked) package maintainer could upload malware to a published version of a popular package #75

PyPI is more complex than npm because you can upload multiple formats under one version. Right now you can delete a tarball package-1.2.1.tar.gz and upload a zip package-1.2.1.zip to the same version. This has the same potential to shaft users (accidentally or maliciously) who rely on a specific version.

Perhaps PyPI's policy should be strengthened to 'no uploads to old versions' (say, versions older than 1 day).

taion (Author) commented Nov 28, 2017

@hickford

FWIW, using hashes does prevent that attack, and that's probably a better approach anyway if you really want to make sure that the packages you download haven't been subject to tampering of any sort.
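For reference, pip's hash-checking mode pins each requirement to one or more digests directly in the requirements file (the package name, version, and digest below are hypothetical placeholders for illustration):

```
# requirements.txt -- hypothetical pin for illustration
somepackage==3.5.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```

Installing with `pip install --require-hashes -r requirements.txt` then fails closed if the artifact served no longer matches any pinned digest.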

It's just that, without pypa/pip#3634, a well-meaning maintainer that uploads a wheel after-the-fact that should be useful to me instead makes my build fail, which is not ideal.

ncoghlan (Member) commented

I've added a note to #75 pointing out that the potential for sdist replacement should be resolved now, given the policy changes in PEP 527. If that isn't the case, then it's a bug in Warehouse's enforcement of that policy, and is better handled as a direct issue report against Warehouse rather than as a question here.

When it comes to unpublishing, we've long had the policy that using PyPI directly for institutional purposes without a caching proxy in front of it is an irresponsible development practice: it fails to account for the complete lack of any contractual relationship between the software publisher and the software user (and also wastes the bandwidth donated by the PSF's CDN provider, Fastly). This means we come down firmly on the side of publishers here: while we consider silently replacing old releases to be a security concern, unpublishing them is entirely reasonable in the absence of any written guarantees regarding future availability.

PSF sponsors may be able to make a legitimate case for changing the PyPI terms of service to prohibit removal of previously published releases (rather than merely allowing indefinite caching by others, as they do now), but that would need to be in the context of those sponsors actually making ongoing contributions to PyPI's sustaining engineering. (This would be comparable to NPM changing their own policy to account for the needs of their commercial customers)

At a technical level, pypi/warehouse#720 covers the possible introduction of a different approach to release management that would permit all of the artifacts in a release to be staged and then published as a single coherent unit. Adopting such an approach would also allow end users to make informed software consumption decisions based on the release model that particular projects used.

taion (Author) commented Nov 29, 2017

I'm not sure that's sufficient. It's not just that a caching proxy is required – the caching proxy must also have an extremely long cache expiry. For example, for devpi, the default is only 30 minutes – perfectly reasonable for alleviating load from PyPI, but it's not going to prevent this sort of problem.

Also, this issue isn't specific to institutional purposes. Open-source packages aren't going to ignore best practices around locking down concrete dependencies, so they'll suffer the same pain.

To me, this isn't really a question of security – it's a question of usability in any case. I have a number of packages on PyPI myself, but ultimately there are many more users than there are package maintainers.

If, say, a maintainer of a commonly-used package such as six or Django loses control of his or her PyPI account or otherwise unpublishes a commonly-used package, it's going to cause a huge amount of pain to the entire Python community.

ncoghlan (Member) commented

Right, but the PSF has zero paid support staff for PyPI, and the volunteer admins already receive more support requests from for-profit companies than they have the ability to handle.

We're not going to institute any policies that would create more work for the existing volunteers in the absence of ongoing funding that allows management of those support queues to be handed off to paid PSF staff instead.

dstufft (Member) commented Nov 29, 2017

Deciding what to allow or disallow comes down to a balancing act between allowing package authors control over managing their software's lifecycle in the way they see fit and restrictions to allow end users to have a set of expectations they know cannot be invalidated.

For this, there are really two separate issues here. The first is that uploading to older releases (or even the most recent release) can cause a different artifact to be fetched than was fetched previously. This generally isn't an issue unless the new artifact is broken in some way, or you're using hash features in pip or similar tools. For broken artifacts, we're unlikely to do much; we expect authors to manage those edge cases in whatever way makes the most sense for their project (likely by removing the broken artifact). The hash issue is really a tooling-specific thing, and in pip specifically we should probably just exclude any artifact that doesn't match one of the expected hashes for the dependency.
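A minimal sketch of the filtering suggested above (not pip's actual implementation; the function and parameter names are hypothetical): given the candidate files for a release and the digests a user pinned, drop any file whose contents don't match an allowed digest.

```python
import hashlib

def filter_candidates(candidates, expected_hashes):
    """Keep only candidate files whose contents match a pinned digest.

    candidates: list of (filename, content_bytes) pairs.
    expected_hashes: mapping of algorithm name -> set of allowed hex digests,
                     e.g. {"sha256": {"abc123..."}}.
    """
    kept = []
    for filename, content in candidates:
        for algo, digests in expected_hashes.items():
            if hashlib.new(algo, content).hexdigest() in digests:
                kept.append((filename, content))
                break  # one matching digest is enough to keep the file
    return kept
```

With filtering like this, a wheel uploaded after the fact simply becomes invisible to hash-pinned installs rather than failing the whole resolution.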

The other issue is that of deletions. There are a lot of pros and cons to a variety of different options here. I don't think we can change this at this point though without a discussion and rough consensus on distutils-sig (and possibly a PEP if it doesn't seem like we're able to get rough consensus without one).

ncoghlan (Member) commented

In pypi/warehouse#720 (comment), I've suggested that we might be able to tie a policy change to the introduction of the staging mode, such that publishers can choose between two ways of working:

  1. Partially mutable releases (status quo): artifacts are published immediately, old releases can have artifacts added and removed, but not replaced.
  2. Immutable releases: releases start in a "staging" mode, where artifacts can be freely added, removed, and replaced (but don't appear in the main index yet), and then once maintainers are happy the release is ready to go, they move them to a "published" state, where the artifacts are linked from the main index, and the release itself becomes completely immutable (in the absence of intervention by the PyPI admins).

New projects would start out with the immutable release model by default (but could opt out if they really wanted to), while existing projects would have to opt in to switching over from the status quo.
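The staged/published lifecycle described above could be sketched as a tiny state machine (illustrative only; class and method names are hypothetical, not Warehouse's API):

```python
class Release:
    """Illustrative model of the proposed immutable-release workflow."""

    def __init__(self):
        self.state = "staging"  # artifacts mutable, hidden from the main index
        self.artifacts = {}

    def add_artifact(self, filename, content):
        # Adding, removing, and replacing are only allowed while staging.
        if self.state != "staging":
            raise PermissionError("published releases are immutable")
        self.artifacts[filename] = content

    def publish(self):
        # One-way transition: artifacts become visible in the index and frozen.
        self.state = "published"
```

The key property is that `publish()` is a one-way door: once taken, only admin intervention could alter the release's artifact set.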

taion (Author) commented Nov 30, 2017

I'll move this discussion to the mailing list as it seems like that's a better venue per #112 (comment). Sorry to bounce this around so much. Hopefully I don't screw up using the mailing list and end up causing more problems. 🤞

taion closed this as completed Nov 30, 2017

taion (Author) commented Nov 30, 2017

Sorry to introduce the hashes thing here – that's a bit of a distraction.
