
Handling the learning & interoperability testing use cases for TestPyPI #2286

Open
ncoghlan opened this issue Aug 7, 2017 · 18 comments
Labels
feature request · needs discussion (a product management/policy issue maintainers and users should discuss) · testing (Test infrastructure and individual tests)

Comments

@ncoghlan
Contributor

ncoghlan commented Aug 7, 2017

Split out from #726 to cover the additional TestPyPI use cases mentioned by @takluyver in that thread:

  • folks just learning & experimenting with the Python packaging toolchain
  • toolchain developers testing the interoperability of their publication process

I believe the current TestPyPI implements this by periodically purging the entire instance database, but it might be beneficial to instead implement this as a configurable "mode" in Warehouse, where it uses the regular Warehouse account management backend, but all the state related to packages themselves is purely ephemeral and auto-deletes after a "while".
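
To make the idea concrete, here is a minimal sketch of what such an auto-expiry policy might look like. Everything here is illustrative (the function, the TTL, and the data shape are assumptions for the sake of the example, not anything Warehouse implements):

```python
from datetime import datetime, timedelta

# Hypothetical retention window for a "testing mode" deployment.
TEST_MODE_TTL = timedelta(days=30)

def expired_projects(projects, now):
    """Return the names of projects whose newest upload is older than the TTL.

    `projects` maps a project name to the datetime of its most recent upload.
    """
    return [
        name for name, last_upload in projects.items()
        if now - last_upload > TEST_MODE_TTL
    ]
```

A periodic job in "testing mode" could then delete everything this function returns, while leaving the account management state untouched.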

@ncoghlan ncoghlan changed the title "Testing mode" which auto-deletes history after a configurable time Handling the learning & interoperability testing use cases for TestPyPI Aug 7, 2017
@dstufft
Member

dstufft commented Aug 7, 2017

To be clear, TestPyPI basically never gets purged. I think it's happened once or twice ever and it is an entirely manual process. It's basically just a second deployment of the relevant codebases with a slightly different theme at this point.

@ncoghlan
Contributor Author

ncoghlan commented Aug 9, 2017

@dstufft Does the list-of-lists deleter still get run against the main DB? I know @r1chardj0n3s used to have to do that regularly thanks to an old tutorial that suggested publishing practice packages to the live service.

@r1chardj0n3s

The lists-of-lists purger was run against the production database, since the URL in the tutorial was the prod URL, not the test one. I ran it a few months back, after quite a break. The code is closely tied to the legacy codebase, but should be reasonably straightforward to migrate to the new one, I should think. It's in the admin.py script.

@ncoghlan
Contributor Author

Responding to pypa/packaging-problems#114 and thinking about #720 (comment), I'm wondering if a possible option here might be to offer per-user personal indices that follow the same model as the proposed staging index, only even more permissive:

  • only warn about name conflicts with the main PyPI index, don't prohibit them
  • personal releases are permanently mutable (no staging required)

In essence, every user would get their own optional scratchpad to do whatever they want, and the inherent release mutability and barriers to use (i.e. needing to pass --extra-index-url) would encourage publishing to the main index once usage started to expand beyond the original author.

@taion

taion commented Nov 30, 2017

Sorry to jump in here – I'm curious why TestPyPI is necessary at all. I'm not aware of a "test npm", a "test Crates", or a "test RubyGems". Is it possible that TestPyPI is solving a problem that generally doesn't exist any more?

@ncoghlan
Contributor Author

@taion See #726 for the background discussion here. Test PyPI currently gets used for 3 different use cases:

  • Testing legacy PyPI changes before they get pushed live (replaced in Warehouse by a proper CI arrangement and support for local testing of changes)
  • Testing package updates before pushing them to the main index (proposed to be replaced by an integrated staging index feature)
  • A sandbox where students and other folks just learning Python can be introduced to the package publication toolchain without generating noise on the main index server (which I'm now suggesting would be a good reason to introduce support for per-user indices)

And as that list shows, keeping an entire separate sandbox instance around probably isn't the right long term answer to any of those situations - it was just an expedient answer given the limitations of the original PyPI code base.

@taion

taion commented Nov 30, 2017

Makes sense. You're probably already aware of this, but I will note that "testing" seems like a less interesting consequence of per-user indices than the people who will be offering you money to get private per-user indices.

@ncoghlan
Contributor Author

@taion We're fairly wary of fee-for-service arrangements that may not only jeopardise the PSF's public interest charity status with the IRS, but also potentially cause concerns for the providers of the in-kind infrastructure donations that allow PyPI to operate at all. (That's very different from NPM's situation, since that's structured as a private for-profit company, albeit one that seems to be run by some genuinely lovely people).

Since we advise folks to run caching proxies anyway, that means the "private Python package indices" market is one we're largely leaving to language independent artifact management vendors like JFrog and Sonatype.

@taion

taion commented Nov 30, 2017

Yeah, I understand. I just mean it's sort of funny – you'd have support at a technical level for a really useful feature, and one that people usually pay for, but not really a way to deploy it.

Like, Gemfury's great, but I'm way too afraid of accidentally uploading my private packages to PyPI.


To get back to the point, though, even for the "learning" case, is there a lot of harm to users uploading "hello world" packages to the main repository? Again, to take npm as an example, it has about 5x the number of packages as PyPI, and while it does support scoped packages, their tutorial for publishing packages literally shows just dropping something into the root scope: https://docs.npmjs.com/getting-started/publishing-npm-packages.

In practice I can't think of any trouble that's caused for me as either a consumer or a publisher of npm packages.

@dstufft
Member

dstufft commented Nov 30, 2017

Yes, it's an issue. There's a book that has people upload a package to live PyPI, and we semi-regularly have to delete those packages to avoid spam in the main repository. You can normally find them by searching for "A simple printer of nested lists" (see https://pypi.org/search/?q=A+simple+printer+of+nested+lists).

I don't like the per-user indexes here, because I think that:

(A) They're not really needed for PyPI's core use case
(B) They're going to be much harder to scale and keep available, since less traffic to each individual index means each one is less likely to be in the CDN cache.
(C) It adds additional complexity and functionality to PyPI that is likely best handled by DevPI or similar.

@taion

taion commented Nov 30, 2017

The official npm docs include a screencast that demonstrates uploading a dummy package to npm. Funny enough, that package is actually still around: https://www.npmjs.com/package/npm-demo-pkg.

Do the spammy demo packages cause maintenance issues for PyPI? It looks like those are about 1.5% of all the packages on PyPI, which seems like a lot – but I've never noticed them, which I guess is one data point that suggests that they may not be a problem for users.

@ncoghlan
Contributor Author

The mention of DevPi reminded me of a suggestion that I think I made in IRC a while back, but never copied over to the issue tracker anywhere: if we ever did decide to go down the path of offering scoped personal indices, it may actually make more sense for the PSF to host a DevPi instance that shares its identity and access management with PyPI than it does to add that capability in to Warehouse.

Or, as @taion suggests, we could just not worry about it - folks don't tend to use desirable names for their practice packages, and if they ever do, then we already have an established precedent of periodically clearing out projects that are clearly practice packages. We'd just need to make sure that "Removal of known practice packages" was a case covered in PEP 541.

Actually going ahead and doing the removals would then be a matter of keeping the service metrics clean, including for folks doing whole-of-ecosystem analysis (there's no reason to get services like libraries.io to waste their time analysing people's practice packages over and over again). We'd just want to make sure the detection-and-removal script was public, and that we gave freshly created projects a grace period any time the script ran (so we don't delete them while people are still practising with them). It would potentially make a kind of interesting metric in its own right, since it would provide data on the number of unique practice packages being created.
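
The detection half of such a script could be quite small. A sketch of the core filter, assuming metadata has already been fetched (the summary string is the one quoted earlier in this thread; the grace period and data shape are illustrative assumptions):

```python
from datetime import datetime, timedelta

# Summary string used by the well-known tutorial's practice package
# (see the pypi.org search link earlier in the thread).
PRACTICE_SUMMARY = "A simple printer of nested lists"

# Hypothetical grace period so packages aren't deleted while people
# are still practising with them.
GRACE_PERIOD = timedelta(days=7)

def removable_practice_packages(packages, now):
    """Filter package metadata down to practice packages past the grace period.

    `packages` is a list of dicts with 'name', 'summary', and 'created' keys.
    """
    return [
        pkg["name"] for pkg in packages
        if pkg["summary"] == PRACTICE_SUMMARY
        and now - pkg["created"] > GRACE_PERIOD
    ]
```

Keeping the filter pure like this would also make it easy to publish and audit, per the "make sure the script was public" point above.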

If we went down that path, then the resolution to this issue would be that we'd need to redirect test.pypi.python.org to pypi.org as well (so we didn't break any documentation that had been updated to adhere to our request to use the test index when learning to use the tools).

@takluyver
Contributor

Would it make sense to have a special classifier to mark packages uploaded for practice or experimentation with packaging systems, and make it clear that those packages may be automatically deleted once they're 30 days old (or whatever time limit we set)?

We obviously can't rely on the 'printer of nested list' packages having the classifier, but it would give other tutorials and experimenters a convenient way to mark packages that aren't meant to be used, while being as close as possible to publishing a real package.

@ncoghlan
Contributor Author

And it wouldn't even need to be a new field, we could just ask introductory tutorial authors to include a Development Status :: 0 - Practice classifier in their examples.
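
For illustration, marking and detecting such packages could look like the sketch below. The classifier string is this thread's proposal, not an existing trove classifier, and the metadata dict is a stand-in for whatever the upload machinery actually sees:

```python
# Hypothetical classifier proposed in this thread; it does not exist
# in the real trove classifier list today.
PRACTICE_CLASSIFIER = "Development Status :: 0 - Practice"

# Illustrative project metadata a tutorial might suggest.
metadata = {
    "name": "example-practice-pkg",  # illustrative name
    "version": "0.1",
    "classifiers": [PRACTICE_CLASSIFIER],
}

def is_practice_package(meta):
    """An index-side cleanup job could key off the classifier alone."""
    return PRACTICE_CLASSIFIER in meta.get("classifiers", [])
```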

@takluyver
Contributor

Yup :-)

I'd suggest making it clearly different, not part of an existing category like Development Status, so that it's clear that it has a special meaning. Maybe even put DeleteMe in the name.

@taion

taion commented Nov 30, 2017

Another approach might be to have something for scoped or namespaced packages. Instead of publishing to nestr, I could publish to @taion/nestr, with the restriction that only I (taion) can create packages of the form @taion/*. This is just a strawman convention, copied from what npm does.

This actually has a nice side benefit, too, in that I can then do something like @top-secret-private/all-our-private-code for internal packages, and not worry about accidentally publishing to PyPI (assuming I don't somehow randomly have permissions to publish to @top-secret-private on PyPI).

It would prevent clutter on the root namespace, if that's the concern, anyway.
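
As a sketch of the strawman convention (the name grammar and the publish rule here are invented for illustration; PyPI has no scoped names):

```python
import re

# Strawman parser for npm-style scoped names (@owner/name).
SCOPED_NAME = re.compile(
    r"^@(?P<scope>[a-z0-9][a-z0-9._-]*)/(?P<name>[a-z0-9][a-z0-9._-]*)$"
)

def parse_scoped(name):
    """Return (scope, package) for a scoped name, or (None, name) otherwise."""
    m = SCOPED_NAME.match(name)
    if m:
        return m.group("scope"), m.group("name")
    return None, name

def may_publish(user, name):
    """Only the scope owner may publish under @user/*; unscoped names fall
    back to whatever the root-namespace rules are (allowed in this sketch)."""
    scope, _ = parse_scoped(name)
    return scope is None or scope == user
```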

@dstufft
Member

dstufft commented Nov 30, 2017

You can prevent upload to PyPI by just using a nonsense classifier, like Private :: Do Not Upload. Warehouse will reject the upload for an invalid classifier.
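
A crude local stand-in for that server-side behaviour, just to show the mechanism (the known-classifier set here is a tiny illustrative subset, not the real trove list):

```python
# Warehouse rejects any upload whose metadata contains a classifier it
# doesn't recognize, which is why a nonsense classifier works as an
# accidental-publish guard for private packages.
KNOWN_CLASSIFIERS = {
    "Development Status :: 4 - Beta",
    "Programming Language :: Python :: 3",
}

def upload_allowed(classifiers):
    """Mimic the server-side check: every classifier must be known."""
    return all(c in KNOWN_CLASSIFIERS for c in classifiers)
```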

I'll have to think about the classifier approach.

One thing that concerns me is that it makes it harder for automated tooling to segregate "packages that matter" from "packages that don't". For instance, a bandersnatch mirror is going to fetch and mirror those packages even though they're destined to be pruned and nobody should ever actually use them anyway. This might not be the hugest deal, but, for instance, in the PyPI infra itself we ~never delete package files from our mirror, to allow us to recover from an erroneous delete if needed. I don't have a good sense of how much additional space moving this into "real" PyPI would consume, because we don't mirror TestPyPI at all right now.

The other thing is we'd probably want to combine it with something like the per-user indexes, because otherwise we limit its ability to be used. For instance, I couldn't test the pip release process by just uploading 10.0.0.dev1 to the main index with a Delete Me classifier, because people would immediately start pulling it down erroneously. So I would have to segregate it into some other index (or temporarily patch pip itself so it was named pip-test or something), or I would need to use the staged uploads from #726 and just modify the release process to not push the final "publish the staged release" button during testing.

So in the end, it's totally doable; I'm just not sure whether a sandbox instance or some way to flag a package in the main repository as "temporary" is better. My gut tells me a sandbox instance is nicer, because I can see us wanting to ratchet down the mutability of the main index over time (at least it's not an uncommon request), and keeping them segregated doesn't back us into as much of a corner if we ever do want to do that. That being said, not having to run two separate instances is also appealing to me.

@brainwane brainwane added testing Test infrastructure and individual tests needs discussion a product management/policy issue maintainers and users should discuss labels Feb 5, 2018
@brainwane brainwane added this to the Cool but not urgent milestone Feb 5, 2018
@brainwane
Contributor

brainwane commented Feb 5, 2018

For the folks in this thread who don't already know the context: The folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable.

Since this feature isn't something that the legacy codebase has, I've moved it to a future milestone. But I have opened #2891 for a related feature that we might be able to do in the next few months.

Thanks and sorry again for the wait.


7 participants