Decentralised GUID system based on Podping called PodpingGUID #533
This looks great, Brian. A few comments to wildly agree with you:
We need to stamp duplicates out as quickly as we can. Podnews's database expects GUIDs to be unique. I have a race condition on the database, therefore - a duplicate GUID will fail to ingest (with an opaque error). I've obviously no plans to fix this, since GUIDs are designed to be unique. What would be a really good start is to send emails to podcast host tech support teams where non-unique GUIDs are seen, to ensure that they're promptly fixed. This could be done with additional code on one of these independent resolvers. (With @daveajones 's permission, that could be something done on behalf of the Index, from a suitable email).
This isn't flawless - and we should make this quite clear in any writeup, in case anyone thinks that RSS feeds are unique. As one example, any Libsyn podcast has I think five different RSS feeds supplied to different partners - so that Libsyn can more accurately reference where the audio is downloaded from. The GUID fixes problems when podcasts move; but also fixes problems like this, where multiple RSS feeds can legitimately be used for the same podcast. (All would return the same GUID).
What happens with podcasts that don't use Podping? Some are the biggest podcasts out there. Using the RSS feed, Podnews will automatically add shows to the index that ARE in Apple Podcasts but AREN'T in the Podcast Index. You'd be surprised how many are unlisted (and, by extension, have no GUID in the feed). They get assigned the correct GUID on ingestion, but your suggested architecture wouldn't work for them. (One potential workaround is for every newly ingested podcast to be 'podpinged' by the Index; which would have the benefit of a trail of every newly added show.)
This is a good and sensible plan: thank you. I'll leave the technology to someone else (I've an allergy to Docker, and an attraction to tiny, micro instances with minimal computational power, but that's just me).
Over reliance on PodcastIndex.org's API
We recognise that the PodcastIndex has spurred enormous development and that its API is widely used. At the same time, many of us know that the inherently decentralised nature of podcasting is one of its greatest strengths.
No large entities have captured podcasting and we're doing our best to make sure they don't.
GUID
Against this background, we recognised very early on that we needed a more reliable, host-independent unique identifier for podcasts. Up to this point, RSS feed URLs have largely defined a feed, with hit-and-miss methods for moving feeds from one host to another. This has led to duplication and confusion.
To fill the role of a unique, unchanging identifier for any podcast, the GUID was adopted in Phase 3 on 1st June 2021.
GUID use increasing
As we move toward new tags such as <remoteItem>, it becomes clear that the GUID is the most sensible way to identify third-party feeds. Any use of URLs can lead to ambiguity and relies on unreliable redirection chains which may break.
GUID resolving
At present the only way to find an RSS Feed URL from a GUID is to query the PodcastIndex API or other systems run by PodcastIndex.
Proposal
Self Hosted GUID Resolver via Podping
What I'm proposing to build is a self hosted GUID resolver which can keep itself up to date independently of PodcastIndex (eventually). There are three components to this on the client side, all of which I believe will run happily side by side on a single modest server.
API server and Database
I propose an API server and MongoDB database running together in Docker. This will be able to return the RSS URLs for any GUID (unfortunately there are some non-unique, duplicated GUIDs at present) and, in the other direction, find the GUID of an RSS feed if one isn't in the XML.
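To make the two lookup directions concrete, here is a minimal in-memory sketch. It is not the proposed MongoDB schema or API: the class and method names are hypothetical, and a plain dictionary stands in for the database. It assumes one GUID may map to several feed URLs, while each URL carries a single GUID.

```python
from collections import defaultdict


class GuidResolver:
    """In-memory stand-in for the proposed database (all names hypothetical)."""

    def __init__(self):
        self._urls_by_guid = defaultdict(set)  # GUID -> set of feed URLs
        self._guid_by_url = {}                 # feed URL -> GUID

    def upsert(self, guid, url):
        """Record that `url` currently carries `guid`."""
        old_guid = self._guid_by_url.get(url)
        if old_guid is not None and old_guid != guid:
            # The URL used to carry a different GUID: drop the stale link.
            self._urls_by_guid[old_guid].discard(url)
        self._urls_by_guid[guid].add(url)
        self._guid_by_url[url] = guid

    def urls_for(self, guid):
        """GUID -> sorted list of known feed URLs (empty if unknown)."""
        return sorted(self._urls_by_guid.get(guid, set()))

    def guid_for(self, url):
        """Feed URL -> GUID, or None if the URL is unknown."""
        return self._guid_by_url.get(url)

    def guids_with_multiple_urls(self):
        """GUIDs seen on more than one URL, for manual review."""
        return sorted(
            guid for guid, urls in self._urls_by_guid.items() if len(urls) > 1
        )
```

Returning a list from `urls_for` is a deliberate choice here: a GUID seen on several URLs could be a genuine duplicate or a single show published through several feeds, and the sketch leaves that judgement to the caller.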
GUID-Slurper
This component, which can run alongside the API and database, will stream Hive blocks and watch for announced GUID changes.
On first start up the system will fetch a recent data dump of all 6m+ GUID records and populate its own database. From then on it will scan Hive to catch up on any changes since the data dump and continue forward updating and maintaining itself without any further action. If it goes down, it will be able to catch up from where it left off.
This is similar to what I have already got running for Podping: api.podping.org.
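The catch-up behaviour described above can be sketched as a checkpointed replay. This is only an illustration under stated assumptions: the `(block_number, announcements)` tuples, the `(guid, url)` announcement shape and the `state` dictionary are invented for the example, and no real Hive client or Podping message format is involved.

```python
def catch_up(state, blocks):
    """Apply GUID announcements from blocks newer than the checkpoint.

    state:  {"last_block": int, "urls": {guid: url}}, seeded from a data dump
    blocks: iterable of (block_number, [(guid, url), ...]), in any order
    Both shapes are hypothetical stand-ins for the real data.
    """
    for number, announcements in sorted(blocks, key=lambda block: block[0]):
        if number <= state["last_block"]:
            continue  # already covered by the dump or an earlier run
        for guid, url in announcements:
            state["urls"][guid] = url
        # Checkpoint after every block so a restart resumes from here.
        state["last_block"] = number
    return state
```

Because blocks at or below the checkpoint are skipped, replaying the same range after a crash leaves the mappings unchanged.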
Traefik Reverse Proxy
Included in the configuration would be a Traefik reverse proxy which handles SSL certs and makes the API available on a domain name.
Podping for GUIDs
At the PodcastIndex end (and with the option to decentralise later) is a simple addition to Podping which sends out a suitable message announcing any changes to GUIDs and URLs.
There is a higher degree of trust needed here. In the current system, where most RSS feed hosts use podping.cloud, the system can be trusted. For later independent announcing of GUID changes (which I believe may only happen some years into this), we would have to rely on the RSS feed using the currently proposed <podcast:podping> tag to identify which Hive accounts (and only those Hive accounts) have authority to issue a GUID URL mapping change. It should also be possible to confirm that the requested GUID URL change is actually present in the RSS feed and that the two match at the time of ingestion.
Conclusion
What I'm proposing is that anyone can grab a domain name, pull a single docker-compose.yaml file from GitHub and, with one command, start up a GUID resolver.
Even whilst most of the input for this system will come via PodcastIndex to begin with, the tools and techniques necessary to announce a change of URL for a given GUID will all be public domain, free, open and permissionless.
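As a footnote to the trust question raised under "Podping for GUIDs", the two checks (is the announcing Hive account authorised, and does the feed really carry the announced GUID) might look roughly like this. It is a sketch under assumptions: the announcement dictionary and the `authorised_accounts` set are hypothetical, and how those accounts would be read from the proposed <podcast:podping> tag is deliberately left out because that tag is not finalised. Only the <podcast:guid> lookup uses an existing tag.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the podcast: tags.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"


def feed_guid(feed_xml):
    """Return the channel-level <podcast:guid> text, or None if absent."""
    root = ET.fromstring(feed_xml)
    node = root.find(f"./channel/{{{PODCAST_NS}}}guid")
    if node is None or not node.text:
        return None
    return node.text.strip()


def accept_change(announcement, feed_xml, authorised_accounts):
    """Decide whether a GUID/URL announcement should be applied.

    announcement:        {"account": ..., "guid": ..., "url": ...} (hypothetical)
    feed_xml:            the feed fetched from announcement["url"]
    authorised_accounts: Hive accounts the feed itself authorises (assumed to
                         come from the proposed tag; extraction not shown)
    """
    if announcement["account"] not in authorised_accounts:
        return False
    # The GUID in the feed must match the announced one at ingestion time.
    return feed_guid(feed_xml) == announcement["guid"]
```

Fetching the feed is kept outside `accept_change` so the check itself stays a pure function that is easy to test.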