Efficient ways to sync with the Podcast Index database #33
In order to keep the mirror up to date, it would be helpful to have a diff indicating insertions, deletions, and updates. I'm not talking about updates to the feed contents, but updates to the feed identity (feed URL, iTunes ID, ...). This sort of mirroring has some parallels with the way mirrors are created for Linux distributions, which use rsync to transfer only what has changed. In practice, a Podcast Index mirror DB might be either an exact replica or a custom DB with extra columns; as long as it has the same primary key, the diff approach will still work.

Since there are straightforward instructions on how to create a Linux distribution mirror, there are many Linux mirrors and no single point of failure. My Arch Linux mirrorlist file has 500 alternative mirror sites in it.

In theory, the podping network could also be used to broadcast at least the insertions. The Podcast Index might then publish guidelines on how mirrors can detect deletions and updates (i.e. changes to the identity) independently. This approach might, however, require adding the iTunes ID to the podping message. Ideally the iTunes ID would be in the feed content anyway, but that is unlikely to be a realistic option in the near to medium term.
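A minimal sketch of the identity diff described above, assuming each snapshot has been loaded as a dict from primary key (feed ID) to row, and assuming `url` and `itunesId` are the identity columns (hypothetical names; the actual dump's schema may differ). Changes to other columns are deliberately ignored, and `apply_diff` overwrites only the identity columns, so a custom mirror with extra columns of its own keeps them.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Tuple

# Assumed identity columns; adjust to the real dump's column names.
IDENTITY_FIELDS: Tuple[str, ...] = ("url", "itunesId")

Record = Mapping[str, object]


@dataclass
class Diff:
    inserted: Dict[int, Record] = field(default_factory=dict)
    deleted: Dict[int, Record] = field(default_factory=dict)
    updated: Dict[int, Record] = field(default_factory=dict)


def identity(record: Record) -> Tuple[object, ...]:
    """Project a row onto its identity columns only."""
    return tuple(record.get(f) for f in IDENTITY_FIELDS)


def diff_snapshots(old: Mapping[int, Record], new: Mapping[int, Record]) -> Diff:
    """Compare two snapshots keyed by the same primary key."""
    d = Diff()
    for key, rec in new.items():
        if key not in old:
            d.inserted[key] = rec
        elif identity(old[key]) != identity(rec):
            # Only identity changes count; content changes are ignored.
            d.updated[key] = rec
    for key, rec in old.items():
        if key not in new:
            d.deleted[key] = rec
    return d


def apply_diff(mirror: Dict[int, dict], d: Diff) -> None:
    """Apply a diff to a mirror that may carry extra, mirror-specific columns."""
    for key in d.deleted:
        mirror.pop(key, None)
    for key, rec in d.inserted.items():
        mirror[key] = dict(rec)
    for key, rec in d.updated.items():
        row = mirror.setdefault(key, {})
        # Overwrite identity columns only, preserving any extra columns.
        for f in IDENTITY_FIELDS:
            row[f] = rec.get(f)
```

Because only the primary key and identity columns are compared, the same diff can be applied to an exact replica or to a custom DB with additional columns.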
I suppose an alternative would be to leave the podping message format as it is, broadcasting just the feed URLs, and then rely on the iTunes API to look up the iTunes ID whenever a new podcast appears. There's no official API to look up an iTunes ID by feed URL, but you can search by title, get a set of results, and then iterate over those results to find the one whose feed URL matches.
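The search-and-match step might look like the sketch below. It targets the public iTunes Search API endpoint (`https://itunes.apple.com/search` with `term` and `media=podcast`) and assumes podcast results carry `feedUrl` and `collectionId` fields. The normalization in `normalize_feed_url` is a hypothetical choice (ignore the scheme, host case, and a trailing slash) so that `http://` and `https://` variants of the same feed still match; a real mirror may want stricter or looser rules.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Iterable, Mapping, Optional

SEARCH_ENDPOINT = "https://itunes.apple.com/search"


def build_search_url(title: str, limit: int = 50) -> str:
    """Build a podcast title search URL for the iTunes Search API."""
    query = urllib.parse.urlencode({"term": title, "media": "podcast", "limit": limit})
    return f"{SEARCH_ENDPOINT}?{query}"


def normalize_feed_url(url: str) -> str:
    """Hypothetical normalization: drop the scheme, lowercase the host,
    and strip a trailing slash from the path."""
    parts = urllib.parse.urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{path}{query}"


def match_itunes_id(results: Iterable[Mapping[str, Any]], feed_url: str) -> Optional[int]:
    """Return the collectionId of the first result whose feedUrl matches."""
    target = normalize_feed_url(feed_url)
    for result in results:
        candidate = result.get("feedUrl")
        if candidate and normalize_feed_url(candidate) == target:
            return result.get("collectionId")
    return None


def lookup_itunes_id(title: str, feed_url: str, timeout: float = 10.0) -> Optional[int]:
    """Search by title over the network, then match on the feed URL."""
    with urllib.request.urlopen(build_search_url(title), timeout=timeout) as resp:
        payload = json.load(resp)
    return match_itunes_id(payload.get("results", []), feed_url)
```

Only `lookup_itunes_id` touches the network, so the matching logic can be checked offline against saved search responses.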
There is a limit of 20 API calls per minute (at most 20 × 60 × 24 = 28,800 lookups per day), so this assumes new podcasts are created at a rate no greater than 20 per minute.
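To stay under that limit, a mirror could throttle its lookups with a sliding-window limiter like the one sketched here. This is a generic client-side technique, not something the iTunes API provides; `max_calls=20` and `window_seconds=60` reflect the limit quoted above, and the clock and sleep functions are injectable (an assumption made for testability) so the logic can be exercised without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds interval."""

    def __init__(
        self,
        max_calls: int = 20,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the current window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def acquire(self) -> None:
        """Block until a call is allowed, sleeping until the oldest call expires."""
        while not self.try_acquire():
            wait = self.window - (self.clock() - self.calls[0])
            self.sleep(max(wait, 0.0))
```

Calling `acquire()` before each search request keeps a single process within the quota; if several workers share one IP address, they would need a shared limiter instead.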
In Podcastindex-org/podcast-namespace#558 I wrote:
@daveajones replied:
I'm moving that discussion here and would be interested to hear more about the various approaches you mentioned. I'm personally interested in approaches that don't hit the API server, partly because of Podcastindex-org/legal#1, which prohibits building databases out of content returned from the API. Ideally, I think we want an efficient and permissible way to create mirror databases, not only to improve locality but also to encourage many independent mirrors and prevent a single point of failure.