Handling the learning & interoperability testing use cases for TestPyPI #2286
To be clear, TestPyPI basically never gets purged. I think it's happened once or twice ever and it is an entirely manual process. It's basically just a second deployment of the relevant codebases with a slightly different theme at this point.
@dstufft Does the list-of-lists deleter still get run against the main DB? I know @r1chardj0n3s used to have to do that regularly thanks to an old tutorial that suggested publishing practice packages to the live service.
The lists-of-lists purger was run against the production database, since the URL in the tutorial was the prod URL, not test. I ran it a few months back, after quite a break. The code is closely tied to the legacy codebase, but should be reasonably straightforward to migrate to the new one, I should think. It's in the admin.py script.
Responding to pypa/packaging-problems#114 and thinking about #720 (comment), I'm wondering if a possible option here might be to offer per-user personal indices that follow the same model as the proposed staging index, only even more permissive:
In essence, every user would get their own optional scratchpad to do whatever they want, and the inherent release mutability and barriers to use (i.e. needing to pass …)
Sorry to jump in here – I'm curious why TestPyPI is necessary at all. I'm not aware of a "test npm", a "test Crates", or a "test RubyGems". Is it possible that TestPyPI is solving a problem that generally doesn't exist any more?
@taion See #726 for the background discussion here. Test PyPI currently gets used for 3 different use cases:
And as that list shows, keeping an entire separate sandbox instance around probably isn't the right long-term answer to any of those situations - it was just an expedient answer given the limitations of the original PyPI code base.
Makes sense. You're probably already aware of this, but I will note that "testing" seems like a less interesting consequence of per-user indices than the people who will be offering you money to get private per-user indices.
@taion We're fairly wary of fee-for-service arrangements that may not only jeopardise the PSF's public interest charity status with the IRS, but also potentially cause concerns for the providers of the in-kind infrastructure donations that allow PyPI to operate at all. (That's very different from NPM's situation, since that's structured as a private for-profit company, albeit one that seems to be run by some genuinely lovely people). Since we advise folks to run caching proxies anyway, that means the "private Python package indices" market is one we're largely leaving to language independent artifact management vendors like JFrog and Sonatype.
Yeah, I understand. I just mean it's sort of funny – you'd have support at a technical level for a really useful feature, and one that people usually pay for, but not really a way to deploy it. Like, Gemfury's great, but I'm way too afraid of accidentally uploading my private packages to PyPI.

To get back to the point, though: even for the "learning" case, is there a lot of harm in users uploading "hello world" packages to the main repository? Again, to take npm as an example, it has about 5x the number of packages as PyPI, and while it does support scoped packages, their tutorial for publishing packages literally shows just dropping something into the root scope: https://docs.npmjs.com/getting-started/publishing-npm-packages. In practice I can't think of any trouble that's caused for me as either a consumer or a publisher of npm packages.
Yea, it's an issue. There is a book that has people upload a package to live PyPI, and we semi-regularly have to delete those packages to avoid spam in the main repository. You can normally find them by searching for "A simple printer of nested lists" (see https://pypi.org/search/?q=A+simple+printer+of+nested+lists).

I don't like the per-user indexes here, because I think that: (A) They're not really needed for PyPI's core use case
The official npm docs include a screencast that demonstrates uploading a dummy package to npm. Funnily enough, that package is actually still around: https://www.npmjs.com/package/npm-demo-pkg. Do the spammy demo packages cause maintenance issues for PyPI? It looks like those are about 1.5% of all the packages on PyPI, which seems like a lot – but I've never noticed them, which I guess is one data point suggesting that they may not be a problem for users.
The mention of DevPi reminded me of a suggestion that I think I made in IRC a while back, but never copied over to the issue tracker anywhere: if we ever did decide to go down the path of offering scoped personal indices, it may actually make more sense for the PSF to host a DevPi instance that shares its identity and access management with PyPI than it does to add that capability into Warehouse.

Or, as @taion suggests, we could just not worry about it - folks don't tend to use desirable names for their practice packages, and if they ever do, then we already have an established precedent of periodically clearing out projects that are clearly practice packages. We'd just need to make sure that "Removal of known practice packages" was a case covered in PEP 541.

Actually going ahead and doing the removals would then be a matter of keeping the service metrics clean, including for folks doing whole-of-ecosystem analysis (there's no reason to get services like libraries.io to waste their time analysing people's practice packages over and over again). We'd just want to make sure the detection-and-removal script was public, and that we gave freshly created projects a grace period any time the script ran (so we don't delete them while people are still practising with them). It would potentially make a kind of interesting metric in its own right, since it would provide data on the number of unique practice packages being created.

If we went down that path, then the resolution to this issue would be that we'd need to redirect …
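A detection-and-removal pass with a grace period, as described above, might look something like the following sketch. All names here (the record fields, the summary list, the 30-day window) are assumptions for illustration, not Warehouse's actual schema or policy:

```python
from datetime import datetime, timedelta

# Summaries known to come from tutorials that tell readers to upload to live PyPI.
# (The nested-lists printer is the example mentioned earlier in this thread.)
KNOWN_PRACTICE_SUMMARIES = {
    "A simple printer of nested lists",
}

def select_practice_packages(projects, now, grace=timedelta(days=30)):
    """Return names of projects that look like practice uploads AND are past
    the grace period, so people still working through a tutorial are spared."""
    to_remove = []
    for project in projects:
        is_practice = project["summary"] in KNOWN_PRACTICE_SUMMARIES
        past_grace = now - project["created"] > grace
        if is_practice and past_grace:
            to_remove.append(project["name"])
    return to_remove

# Hypothetical sample data:
projects = [
    {"name": "nester-abc", "summary": "A simple printer of nested lists",
     "created": datetime(2017, 1, 1)},
    {"name": "nester-new", "summary": "A simple printer of nested lists",
     "created": datetime(2017, 8, 20)},  # recent: inside the grace period
    {"name": "requests", "summary": "Python HTTP for Humans.",
     "created": datetime(2011, 2, 13)},  # real project: never matches
]

print(select_practice_packages(projects, now=datetime(2017, 9, 1)))
# → ['nester-abc']
```

Publishing a script like this (rather than running it privately) would also make the grace-period behaviour auditable, which matches the transparency goal above.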
Would it make sense to have a special classifier to mark packages uploaded for practice or experimentation with packaging systems, and make it clear that those packages may be automatically deleted once they're 30 days old (or whatever time limit we set)? We obviously can't rely on the 'printer of nested lists' packages having the classifier, but it would give other tutorials and experimenters a convenient way to mark packages that aren't meant to be used, while being as close as possible to publishing a real package.
And it wouldn't even need to be a new field; we could just ask introductory tutorial authors to include a …
Yup :-) I'd suggest making it clearly different, not part of an existing category like …
Another approach might be to have something for scoped or namespaced packages. Instead of publishing to …, you'd publish to a scoped name. This actually has a nice side benefit, too, in that I can then do something like …. It would prevent clutter on the root namespace, if that's the concern, anyway.
You can prevent upload to PyPI by just using a nonsense classifier, like …

I'll have to think about the classifier approach. One thing that concerns me is that it makes it harder for automated tooling to segregate "packages that matter" versus "packages that don't". For instance, a bandersnatch mirror is going to fetch and mirror those packages even though they're destined to be pruned and nobody should ever actually use them anyway. This might not be the hugest deal, but for instance in the PyPI infra itself we ~never delete package files themselves from our mirror, to allow us to recover from an erroneous delete if needed. I don't have a good sense of how much additional space moving this into "real" PyPI would consume, because we don't mirror TestPyPI at all right now.

The other thing is we'd probably want to combine it with something like the per-user indexes, because otherwise we limit its ability to be used. For instance, I couldn't test the pip release process by just uploading …

So in the end, it's totally doable; I'm just not sure whether a sandbox instance or some way to flag a package in the main repository as a "temporary" package is better. My gut tells me a sandbox instance is nicer, because I can see us wanting to ratchet up the immutability of the main index over time (at least it's not an uncommon request), and keeping them segregated doesn't back us into as much of a corner if we ever do want to do that. That being said, not having to run two separate instances is also appealing to me.
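The tooling-segregation concern above could be partly addressed by having mirrors and analysers filter on the proposed classifier. A minimal sketch, where the classifier string itself is a made-up placeholder (no such classifier exists; the real name would need to be agreed on):

```python
# Hypothetical classifier marking a package as a disposable practice upload.
# The exact string is an assumption for illustration only.
PRACTICE_CLASSIFIER = "Private :: Practice Package"

def is_practice_package(classifiers):
    """True if the package opted in to automatic cleanup via the classifier."""
    return PRACTICE_CLASSIFIER in classifiers

def packages_to_mirror(packages):
    """A mirror (e.g. bandersnatch-style tooling) could skip practice
    packages entirely instead of fetching files destined to be pruned."""
    return [name for name, classifiers in packages.items()
            if not is_practice_package(classifiers)]

packages = {
    "nester-demo": ["Private :: Practice Package",
                    "Programming Language :: Python"],
    "requests": ["Programming Language :: Python"],
}
print(packages_to_mirror(packages))
# → ['requests']
```

This only helps tools that know about the convention, of course, which is part of why a segregated sandbox instance may still be the simpler answer.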
For the folks in this thread who don't already know the context: The folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable. Since this feature isn't something that the legacy codebase has, I've moved it to a future milestone. But I have opened #2891 for a related feature that we might be able to do in the next few months. Thanks and sorry again for the wait. |
Split out from #726 to cover the additional TestPyPI use cases mentioned by @takluyver in that thread:
I believe the current TestPyPI implements this by periodically purging the entire instance database, but it might be beneficial to instead implement this as a configurable "mode" in Warehouse, where it uses the regular Warehouse account management backend, but all the state related to packages themselves is purely ephemeral and auto-deletes after a "while".
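The ephemeral "mode" described above might in practice reduce to a scheduled purge of package state older than some retention window, leaving account data untouched. A rough sketch using sqlite3 purely as a stand-in (the table, columns, and 30-day window are invented for illustration, not Warehouse's real schema):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (project TEXT, version TEXT, uploaded TEXT)")

now = datetime(2017, 9, 1)
rows = [
    ("demo-pkg", "0.1", (now - timedelta(days=45)).isoformat()),  # past the TTL
    ("demo-pkg", "0.2", (now - timedelta(days=5)).isoformat()),   # still fresh
]
conn.executemany("INSERT INTO releases VALUES (?, ?, ?)", rows)

# Purge anything older than the (arbitrary) 30-day retention window.
# ISO-8601 timestamps compare correctly as strings, so a plain < works here.
cutoff = (now - timedelta(days=30)).isoformat()
conn.execute("DELETE FROM releases WHERE uploaded < ?", (cutoff,))
conn.commit()

remaining = conn.execute("SELECT version FROM releases").fetchall()
print(remaining)
# → [('0.2',)]
```

Running something like this on a schedule would keep the sandbox usable for upload practice while guaranteeing nothing accumulates, without the manual whole-database purges described above.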