Decentralised GUID system based on Podping called PodpingGUID #533

brianoflondon · 2023-05-16T07:10:16Z


Over-reliance on PodcastIndex.org's API

We recognise that the PodcastIndex has spurred enormous development and that its API is widely used, but at the same time many of us know that the inherently decentralised nature of podcasting is one of its greatest strengths.

No large entities have captured podcasting and we're doing our best to make sure they don't.

GUID

Against this background, we recognised very early on that we needed a more reliable, host-independent unique identifier for podcasts. Up to this point RSS feed URLs have largely defined a feed, with hit-and-miss methods for moving feeds from one host to another. This has led to duplication and confusion.

To fill the role of a unique, unchanging identifier for any podcast, the GUID was adopted in Phase 3 (1st June 2021).

GUID use increasing

As we move toward new tags such as <remoteItem> it becomes clear that the GUID is the most sensible way to identify third party feeds. Any use of URLs can lead to ambiguity and relies on unreliable redirection chains which may break.

GUID resolving

At present the only way to find an RSS Feed URL from a GUID is to query the PodcastIndex API or other systems run by PodcastIndex.

Proposal

Self Hosted GUID Resolver via Podping

What I'm proposing to build is a self hosted GUID resolver which can keep itself up to date independently of PodcastIndex (eventually). There are three components to this on the client side, all of which I believe will run happily side by side on a single modest server.

API server and Database

I propose an API server and a MongoDB database running together in Docker. This will be able to return the RSS URLs for any GUID (unfortunately there are some non-unique, duplicated GUIDs at present) and, in the other direction, find the GUID of an RSS feed if one isn't in the XML.
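As a rough illustration of the two lookups described above, here is a minimal in-memory sketch. It is not the proposed implementation: the class and method names are hypothetical, and a plain dictionary stands in for MongoDB. It assumes a GUID may map to more than one URL, since duplicates exist at present.

```python
from __future__ import annotations

from collections import defaultdict


class GuidResolver:
    """In-memory stand-in for the proposed database (all names are hypothetical)."""

    def __init__(self) -> None:
        # One GUID may map to several URLs while duplicates still exist.
        self._urls_by_guid: dict[str, set[str]] = defaultdict(set)
        self._guid_by_url: dict[str, str] = {}

    def add(self, guid: str, url: str) -> None:
        """Record that `url` serves the podcast identified by `guid`."""
        previous = self._guid_by_url.get(url)
        if previous is not None and previous != guid:
            # The URL used to belong to another GUID, so drop the stale link.
            self._urls_by_guid[previous].discard(url)
        self._urls_by_guid[guid].add(url)
        self._guid_by_url[url] = guid

    def urls_for(self, guid: str) -> list[str]:
        """Return every known feed URL for a GUID (empty list if unknown)."""
        return sorted(self._urls_by_guid.get(guid, set()))

    def guid_for(self, url: str) -> str | None:
        """Return the GUID for a feed URL, or None if the URL is unknown."""
        return self._guid_by_url.get(url)
```

Returning a list from `urls_for` rather than a single URL keeps the caller aware that a GUID is not yet guaranteed to resolve to exactly one feed.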

GUID-Slurper

This component (which can run alongside the API and Database) will stream Hive blocks and watch for announced GUID changes.

On first start up the system will fetch a recent data dump of all 6m+ GUID records and populate its own database. From then on it will scan Hive to catch up on any changes since the data dump and continue forward updating and maintaining itself without any further action. If it goes down, it will be able to catch up from where it left off.

This is similar to what I have already got running for Podping: api.podping.org.
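The start-up and catch-up behaviour could be sketched roughly as follows. This is an assumption-laden illustration, not the Slurper itself: `GuidChange`, `SlurperState`, `load_dump` and `apply_changes` are hypothetical names, the data dump is assumed to record the Hive block number it was taken at, and streaming from Hive is replaced by a plain iterable of already-decoded changes delivered a whole block at a time.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class GuidChange:
    block_num: int  # Hive block in which the change was announced
    guid: str
    url: str


@dataclass
class SlurperState:
    last_block: int = 0  # checkpoint: highest block already applied
    url_by_guid: dict[str, str] = field(default_factory=dict)


def load_dump(records: Iterable[tuple[str, str]], dump_block: int) -> SlurperState:
    """Populate a fresh state from a data dump taken at `dump_block`."""
    state = SlurperState(last_block=dump_block)
    for guid, url in records:
        state.url_by_guid[guid] = url
    return state


def apply_changes(state: SlurperState, changes: Iterable[GuidChange]) -> int:
    """Apply changes newer than the checkpoint and return how many were applied."""
    resume_after = state.last_block
    applied = 0
    for change in changes:
        if change.block_num <= resume_after:
            continue  # already covered by the dump or an earlier run
        state.url_by_guid[change.guid] = change.url
        state.last_block = max(state.last_block, change.block_num)
        applied += 1
    return applied
```

Because the checkpoint is stored with the data, replaying the same stretch of the chain after a restart is harmless: anything at or below `last_block` is skipped.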

Traefik Reverse Proxy

Included in the configuration would be a Traefik reverse proxy which handles SSL certs and makes the API available on a domain name.

Podping for GUIDs

At the PodcastIndex end (and with the option to decentralise later) is a simple addition to Podping which sends out a suitable message announcing any changes to GUIDs and URLs.

There is a higher degree of trust needed here. In the current system, where most RSS feed hosts use podping.cloud, this can be trusted. For later independent announcing of GUID changes (which I believe may only happen some years into this) we would have to rely on the RSS feed using the currently proposed <podcast:podping> tag to identify which Hive accounts (and only those Hive accounts) have authority to issue a GUID-to-URL mapping change. It should also be possible to confirm that the GUID in the requested change is actually present in the RSS feed at the new URL, and that the two match at the time of ingestion.
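The two checks described here (is the announcing account authorised, and does the feed really carry that GUID) might look something like the sketch below. The function names are hypothetical. Because the <podcast:podping> tag is still only proposed, the sketch does not parse it; the set of authorised Hive accounts is simply passed in. Only the existing <podcast:guid> tag is read from the feed, and fetching the feed is left out.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace URI of the Podcast Namespace, used by <podcast:guid>.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"


def guid_in_feed(feed_xml: str) -> str | None:
    """Return the <podcast:guid> value from an RSS document, if present."""
    root = ET.fromstring(feed_xml)
    node = root.find(f"./channel/{{{PODCAST_NS}}}guid")
    if node is None or node.text is None:
        return None
    return node.text.strip().lower()


def change_is_acceptable(
    announcing_account: str,
    announced_guid: str,
    feed_xml: str,
    authorised_accounts: set[str],
) -> bool:
    """Accept a change only if the account is authorised and the feed agrees."""
    if announcing_account not in authorised_accounts:
        return False
    return guid_in_feed(feed_xml) == announced_guid.strip().lower()
```

In this sketch `feed_xml` would be the document fetched from the newly announced URL, so a change is only accepted when the destination feed itself confirms the GUID.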

Conclusion

What I'm proposing is that anyone can grab a domain name, pull a single docker-compose.yaml file from GitHub and with one command start up a GUID resolver.

Even whilst most of the input for this system will come via PodcastIndex to begin with, the tools and techniques necessary to announce a change of URL for a given GUID will all be public domain, free, open and permissionless.

jamescridland · 2023-05-16T10:09:23Z


This looks great, Brian.

A few comments to wildly agree with you:

This will be able to return RSS URLs for any GUID (unfortunately there are some non-unique duplicated GUIDs at present)

We need to stamp duplicates out as quickly as we can. Podnews's database expects GUIDs to be unique. I have a race condition on the database, therefore - a duplicate GUID will fail to ingest (with an opaque error). I've obviously no plans to fix this, since GUIDs are designed to be unique.

What would be a really good start is to send emails to podcast host tech support teams where non-unique GUIDs are seen, to ensure that they're promptly fixed. This could be done with additional code on one of these independent resolvers. (With @daveajones 's permission, that could be something done on behalf of the Index, from a suitable email).

the only way to find an RSS Feed URL from a GUID is to query the PodcastIndex API

This isn't flawless - and we should make this quite clear in any writeup, in case anyone thinks that RSS feeds are unique. As one example, any Libsyn podcast has I think five different RSS feeds supplied to different partners - so that Libsyn can more accurately reference where the audio is downloaded from. The GUID fixes problems when podcasts move; but also fixes problems like this, where multiple RSS feeds can legitimately be used for the same podcast. (All would return the same GUID).

On first start up the system will fetch a recent data dump of all 6m+ GUID records and populate its own database. From then on it will scan Hive to catch up on any changes

What happens with podcasts that don't use podping? Some are the biggest podcasts out there. Using the RSS feed, Podnews will automatically add shows to the index that ARE in Apple Podcasts but AREN'T in the Podcast Index. You'd be surprised how many are unlisted (and, by extension, have no GUID in the feed). They get assigned the correct GUID on ingestion, but your suggested architecture wouldn't pick them up. (One potential workaround is for every newly ingested podcast to be 'podpinged' by the Index, which would have the benefit of a trail of every newly added show.)
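For reference, a feed with no GUID in its XML can still be given a deterministic one. The Podcast Namespace documentation describes generating a UUIDv5 from the feed URL, with the protocol scheme and trailing slashes stripped, under a fixed "podcast" namespace UUID. A minimal sketch of that derivation follows. The function name is hypothetical, and whether the Index's ingestion does exactly this is an assumption.

```python
import re
import uuid

# The "podcast" namespace UUID given in the Podcast Namespace documentation.
PODCAST_NAMESPACE = uuid.UUID("ead4c236-bf58-58c6-a2c6-a6b28d128cb6")


def guid_from_feed_url(feed_url: str) -> str:
    """Derive a UUIDv5 from a feed URL, ignoring scheme and trailing slashes."""
    # Strip a leading scheme such as "https://" or "http://".
    stripped = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", feed_url.strip())
    # Strip any trailing slashes.
    stripped = stripped.rstrip("/")
    return str(uuid.uuid5(PODCAST_NAMESPACE, stripped))
```

Two different URLs for the same show, such as a partner-specific feed, would produce different values here, which is why the GUID needs to be written into every copy of the feed once assigned.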

This is a good and sensible plan: thank you. I'll leave the technology to someone else (I've an allergy to Docker, and an attraction to tiny, micro instances with minimal computational power, but that's just me).

6 replies

jamescridland May 17, 2023

That No Agenda feed looks like a prime case for the alternateEnclosure... but I digress. You're right, a canonical RSS feed is an interesting one. https://podnews.net/rss/spotify and https://podnews.net/rss should both contain the same GUID, and I wonder how we properly link to the main version (the one without the spotify in it).

In terms of updates - I don't think we need to announce all 6m feeds; but it seems to me that there's a point in the future where we might start announcing new shows. That then means that any client can be kept up to date with a) a download of a dump from Monday at 00:00; b) go back through Hive until Monday at 00:00. From then on, as long as the client listens to Hive, it'll have every GUID in it. Perhaps. What worries me is the launch of a big show which doesn't use podping - which will never make it into the database of these servers. Maybe.

francosolerio May 17, 2023

https://podnews.net/rss/spotify and https://podnews.net/rss should both contain the same GUID

This puzzles me. If the RSS feed is the source of truth, how could there be two sources of truth? Doesn't this lead to potential issues? What if one of the two at some point doesn't have updates?

keunes May 17, 2023

What worries me is the launch of a big show which doesn't use podping - which will never make it into the database of these servers. Maybe.

If this podcast doesn't get added by podping, it may get added to the Podcast index db via the API (thank you James for adding the ones from Apple). Once it's in the db it can be distributed further via podping. As I understood from Brian's comments. Essentially as you described as 'workaround'. There's no downside to this approach per se, is there?

jamescridland May 18, 2023

Once it's in the db it can be distributed further via podping

I think that's what I'm suggesting; but podcasts currently aren't distributed by podping unless the publisher does that.

If the RSS feed is the source of truth, how could there be two sources of truth?

That's an issue for the web too. It was fixed with the canonical header - so if you look at https://podnews.net/go-subscribe/github or, indeed, https://podnews.net/?utm_source=github then it will have a canonical as https://podnews.net/ to indicate to webcrawlers which page address is the "correct" one, and in this case, the source of truth.

There is a <link rel="self"> in RSS - and here is the spec for that. That isn't the same, though. I'd rather like a canonical link. You'd assume that Libsyn has also thought about this. It's a wider question about RSS feeds in their entirety, though.

What if one of the two at some point doesn't have updates?

For me, that won't happen (it's the same code). But, I agree, that's a potential problem.

brianoflondon May 18, 2023
Author

This system is separate to Podping but using the same underlying tech (custom_json messages on Hive).

Changes to the URLs or new GUIDs as found in RSS feeds which the PodcastIndex scans, will be announced by the PodcastIndex.

The point here is that this function CAN be performed directly by a feed owner if they want to. Right now the feeds would just issue a 301 redirect and it would be PodcastIndex that takes care of updating the GUID -> URL map.

The point here is that this one piece of information, the GUID, is so vital to open podcasting and to a bunch of future features that it deserves a way to live outside of PodcastIndex, even though PodcastIndex does exist.
