Proposal: Signal to auto-refresh chapters and transcripts #548

ryan-lp · 2023-06-26T02:32:40Z

ryan-lp
Jun 26, 2023

This proposal suggests a standard way for hosting services to signal to apps when chapters and transcripts should be re-fetched.

Chapters and transcripts can change after the first time they're fetched, and hence they sometimes need to be re-fetched. This is because chapters and transcripts take time to produce, and sometimes are produced not purely by AI ahead of time, but actually contributed at some later time by listeners of the show. Or, it can simply be because even if these files are AI produced, the inference models take time to execute, and depending on server load, may need to wait in a processing queue. In the meantime, a placeholder file tends to be uploaded to indicate "please try again in a few moments, it will be ready soon..." (e.g. the type of placeholder that Dreb uploads before the real chapters and transcripts are ready).

Currently, apps don't have a good signal for when chapters and transcripts are expected to be ready, whether in 5 minutes or in 24 hours, and so apps either have to poll for changes or provide users a refresh button where users end up repeatedly hitting the refresh button, having the same end effect on the network as polling. At the same time, if an app tries to poll on behalf of the user, it must deal with the possibility that actually the first version of the asset that was already fetched IS the final version, and so polling should NOT be attempted.

It would therefore be helpful to have a signal to apps indicating if and when a re-fetch should be attempted.

Checking online for solutions that people are already adopting, I found one interesting approach based on the HTTP protocol. We have the 200 status for a resource that is "ready", we have 404 for a resource that is a broken link. But do we have a status code for a resource that is "not ready yet", or "pending"?

A number of interesting suggestions have been made on StackOverflow and StackExchange:

202 Accepted: The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
404 Not Found.
409 Conflict: The request could not be completed due to a conflict with the current state of the resource. This code is only allowed in situations where it is expected that the user might be able to resolve the conflict and resubmit the request.
503 Service Unavailable: The 503 status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay. The server MAY send a Retry-After header field to suggest an appropriate amount of time for the client to wait before retrying the request.

Of these, 503 seems particularly interesting and has attracted interest on StackOverflow because unlike the other options, it also supports the optional Retry-After header:

Servers send the "Retry-After" header field to indicate how long the user agent ought to wait before making a follow-up request. When sent with a 503 (Service Unavailable) response, Retry-After indicates how long the service is expected to be unavailable to the client.

Example response:

HTTP/1.1 503 Service Unavailable
Retry-After: 300
Content-Type: application/json

{
  "version": "1.2.0",
  "chapters":
  [
    {
      "startTime": 0,
      "title": "Chapters coming soon"
    }
  ]
}

Thus, we can issue a valid chapters or transcripts file in the response so that it could be consumed and shown by the app, but at the same time use the HTTP status and Retry-After header to signal to the app when a re-fetch should happen.
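To make the app side of this concrete, here is a minimal sketch of how a client might read the signal. The function name `retry_delay_seconds` and the plain-dict `headers` argument are my own assumptions for illustration, not part of any spec or library. It handles both forms that Retry-After can take (a number of seconds, or an HTTP date):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(status: int, headers: dict,
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait before re-fetching, or None if no re-fetch is signalled.

    Hypothetical helper: `headers` is assumed to be a plain dict keyed by
    the exact name "Retry-After".
    """
    if status != 503:
        return None  # e.g. a 200 is treated as the final version in this sketch
    value = headers.get("Retry-After")
    if value is None:
        return None  # 503 without a hint: caller decides its own policy
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "300"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # unparseable hint is ignored
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

With the example response above, `retry_delay_seconds(503, {"Retry-After": "300"})` gives 300 seconds, so the app could show the placeholder chapters now and schedule one re-fetch five minutes later instead of polling blindly.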

I do not see any harm in hosts choosing to implement this approach before it is standardised, since apps should already be programmed to handle whatever server response they receive, but apps could then learn to leverage the information returned by the 503 response to more smartly poll for updates.

There may also be alternative solutions to the same problem (e.g. additional attributes tacked onto the chapters and transcript tags, or additional server headers), but in the meantime I think the 503 response status is a conservative solution that could be adopted now with minimal risk.

francosolerio · 2023-06-26T10:43:27Z

francosolerio
Jun 26, 2023

While something to signal chapters and transcripts status would be very useful for apps, the HTTP response status might be a little difficult to implement for podcasters who self-host their shows. As an example: I put all my mp3s, chapters and transcript files in an S3 bucket, and I have no control over the HTTP response status the server sends.

Maybe putting an ETA as an attribute to the podcast:chapters and podcast:transcript tags might be more accessible for both established hosting services and self-hosted podcasts.

4 replies

ryan-lp Jun 26, 2023
Author

As I mentioned in my last sentence, I would certainly have no objection to adding a new tag attribute at some point down the line.

The standards process is also a slow and deliberate process. Keeping in mind that proposals for new tags or attributes can take months or years to discuss, debate, and hopefully adopt, the above proposal is strictly limited to something that could be implemented immediately by hosting services in the meantime. I needed such a solution for my podcast transcript service, and so I have written up essentially the approach I have taken.

By the way, if you are hosting on an S3 bucket, there may (almost) be a way you could get this to work. It's true that you have no control over the HTTP status, however it is possible to set up a redirect rule so that if an object is not found, you redirect to some external server that in turn issues the 503 response. So if you happened to have your own server elsewhere for other things, you could in theory set up a very simple endpoint that all it does is issue 503 responses.
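For what it's worth, the "very simple endpoint" could look something like the sketch below, using only the Python standard library. The names `placeholder_response` and `PlaceholderHandler`, and the 300 second estimate, are placeholders I made up for illustration. Whether a given S3-compatible host can redirect to it is a separate question.

```python
import json
from http.server import BaseHTTPRequestHandler

RETRY_AFTER_SECONDS = 300  # assumed estimate, adjust to the real processing time


def placeholder_response():
    """Build the status, headers and body for a 'not ready yet' chapters reply."""
    body = json.dumps({
        "version": "1.2.0",
        "chapters": [{"startTime": 0, "title": "Chapters coming soon"}],
    }).encode("utf-8")
    headers = {
        "Retry-After": str(RETRY_AFTER_SECONDS),
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    }
    return 503, headers, body


class PlaceholderHandler(BaseHTTPRequestHandler):
    """Answers every GET with the 503 placeholder, whatever the path."""

    def do_GET(self):
        status, headers, body = placeholder_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally you could pass `PlaceholderHandler` to `http.server.HTTPServer` on a port of your choice and call `serve_forever()`.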

francosolerio Jun 26, 2023

I'm actually using an S3 compatible system (Digital Ocean Spaces), I don't think it has the same rule system as the original Amazon S3, I'll investigate.

johnspurlock Jun 26, 2023

I needed such a solution for my podcast transcript service

What is the url of your service?

ryan-lp Jun 27, 2023
Author

What is the url of your service?

Although it has actually been operational since 2021, it is still not open to the public. The intention is to release it into the public but there are several issues I am waiting on to be standardised.

I will list the critical issues below, but for some background, my goal is primarily to serve the podcast listeners who actually depend on the transcripts, and so the way it will work is that listeners who really need the transcripts can essentially vote on the podcast they want transcribed, and if enough people vote for it, I will donate free transcripts to the podcaster (subject to an invitation they first accept) so that they don't need much persuasion to consider going out of their way to serve an audience segment that they otherwise might not care about. This is about serving the listeners, and so the service is therefore meant to provide as little friction as possible to the podcast creator so that it would be hard for them to say no to transcripts. The service won't target podcasts hosted on platforms that already have a transcription service built in, but will rather mainly target podcasts that are hosted on platforms that don't yet support the namespace tags, or that do but don't have a built-in transcript service, or podcasts that are self hosted but don't have the infrastructure to make transcription easy. Among these, one of the main targets is Anchor.fm (or whatever we now call it :-) ) because despite it not supporting the namespace, keep in mind that my goal is to serve the listeners first and foremost even if the podcasters themselves aren't as motivated by it, and so if Anchor is where many of a listener's favourite podcasts are hosted, I want to be able to provide a transcript solution that even podcasters on Anchor could easily adopt without friction, ultimately so that the listeners are not left behind.

So the first issue that I really depend on is #477 which simply proposes that even if a podcast is on a platform that does not natively support the transcript tag, they could alternatively just add a link such as https://foo.com/transcript.srt to their show notes or description, and podcast apps would be able to pick it up, assuming it becomes a standard approach supported by the apps. This would actually be really easy to scrape from the description based on the link's filename extension alone, at least for SRT. For JSON-based formats, this would depend on #452 . Now that's certainly the most critical issue, but others are #484, #519, #483 and #370.

So although I have a service that is basically functional and ready to use, and would love to put it out into the world, I am also trying to work through the standards process before it can actually work in the ecosystem. And in wanting to provide an option that can serve listeners even when podcasters or hosts may not be interested or motivated, the solution that I have implemented on my end is not something I can do alone, it's something that requires the (minimal) cooperation of other apps in the ecosystem to pick up the transcript link from the description if it's not in the transcript tag. Now admittedly my track record for proposals is currently zero :-) Since it's been longer than a year and there has been very little movement on getting this proposal adopted, I may end up building my own app, even if it is the only podcast app that offers the capability to pick up transcript links this way. Of course that will take time, too, but my dilemma is that I can't actually release a transcription service that is essentially built around this idea, if there are no apps that will actually pick up the link, and hence I'm a little bit stuck at the moment.

tomrossi7 · 2023-06-26T17:37:05Z

tomrossi7
Jun 26, 2023
Maintainer

For Buzzsprout, the chapters and transcripts can change based on dynamic content. The only way to be sure you have the correct version is to request it at the same time as you request the audio. I hope that makes sense!

1 reply

ryan-lp Jun 27, 2023
Author

@tomrossi7 This proposal is only relevant in scenarios where the chapters and transcripts are completed after the episode is published, either due to extended processing time or lagging community contributions. Dynamic ad insertion is a different problem, and I would not consider changes by way of dynamic insertions to be in the same category that this proposal addresses. However, consider the scenario where a host does dynamic ad insertion, but also has extended processing times for its chapters and transcripts such that they might not be available immediately. In that case, a host could have an initial period of time where it issues 503 responses with the expected processing time, after which it issues a 200 response with the dynamic insertion points cached by IP address (and perhaps time window). If Buzzsprout doesn't have extended processing times and in fact has all assets ready to serve from the very first request, then it would not need any solution of the sort offered by this proposal.

But regarding the dynamic insertion problem, I think we really need a proper solution to that. Having the chapters and transcripts being rewritten with dynamic insertions on each request breaks a lot of things in the ecosystem and we need a solution that allows different services to integrate with each other without surprises.

If I build a transcript search engine where all transcripts in the world are crawled, and the words are indexed with timestamps, that's obviously an expensive operation that we can't afford to do more often than necessary, and we don't want those word and timestamp indices to be invalidated immediately upon crawling. Then when someone performs a search, they may be directed to a segment of text that does not actually exist in the audio we fetch at the time the user performs the search, and more likely, they will jump to a timestamp that is no longer valid. Please let's find a proper solution to dynamic ad insertion. I know this is off topic for the present discussion, but I'm sure you have seen the other proposals on this topic. If none of them are satisfactory to you, let's work on a new proposal that does satisfy your own requirements while also not breaking search engines and other kinds of apps.

francosolerio · 2023-06-26T18:04:35Z

francosolerio
Jun 26, 2023

App developer point of view.
Right now it seems to me we have (at least) two incompatible implementations of chapters/transcripts files mechanics.

Dave + Adam's way: chapters are not yet published, a placeholder file is published and its url is specified in the rss feed. The app will need to download and scan the chapter/transcript file repeatedly, optimally each time the user presses play, to check if the chapters/transcripts have been published or modified.

Buzzsprout way: chapters have to be downloaded at the same time as the mp3 because with dynamic ads / content, the location markers would go out of sync. Refreshing the chapter/transcript file at a later time would download a mismatched version of the files.

With the current P2.0 specifications I see no way to code an app that is compatible with both ways. A signal like the one in this proposal is necessary. I would really prefer it coded in the RSS file which should remain our main source of truth.

4 replies

thebells1111 Jun 27, 2023

Would it make more sense to have something in the chapter/transcript file to indicate if it's a completed chapter, so no further querying is required, or if it's a placeholder so subsequent querying is recommended?

If it's in the feed, then the podcaster will need to reupload the feed after the chapters file is finalized. If it's in the chapter, then no need to republish the feed. Since we're fetching the chapter file anyway, that can be our source of truth for whether the chapter is a placeholder or not.

ryan-lp Jun 27, 2023
Author

Good point. However, I think the problem with embedding it in the chapters/transcript file itself is that we have multiple different formats for transcripts and it would be a bit complex to define for each different format how we are going to encode that information into that format. Whereas if we embed it in the HTTP response, either in the status, or in the response headers, we can do it in a way that is independent of the file format. If we want a solution that works for people who are hand editing their files and uploading them to S3 buckets, then the options are more limited, and for example putting it in the feed may be the only feasible option.

What I would suggest for the self-hosted option is to just use a 404 instead of a 503. We could standardise it so that if an app sees the link to the chapters or transcript in the feed, but that link gives any sort of error, either a 404 or a 503, the app will try again, but we just place some limits on when to give up. And further, if it's a 503, the app can have special logic to use the additional hints offered in the Retry-After header.
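As a rough sketch of that retry rule: the two limits below (give up after 24 hours, fall back to 15 minutes between attempts) are numbers I picked purely for illustration, and `next_attempt_in` is a hypothetical name. Nothing here is standardised.

```python
from typing import Optional

GIVE_UP_AFTER_SECONDS = 24 * 60 * 60  # assumed limit before treating the link as broken
DEFAULT_RETRY_SECONDS = 15 * 60       # assumed wait when the server gives no hint


def next_attempt_in(status: int, retry_after: Optional[int],
                    elapsed_seconds: float) -> Optional[float]:
    """Return seconds until the next attempt, or None to stop retrying.

    `retry_after` is the already-parsed Retry-After value in seconds (if any),
    `elapsed_seconds` is the time since the app first saw the link fail.
    """
    if status not in (404, 503):
        return None  # not one of the "maybe not ready yet" errors
    if elapsed_seconds >= GIVE_UP_AFTER_SECONDS:
        return None  # give up: treat it as a broken link
    if status == 503 and retry_after is not None:
        return float(retry_after)  # use the server's hint
    return float(DEFAULT_RETRY_SECONDS)  # 404, or 503 without a hint
```

So a 503 with `Retry-After: 300` is retried after 300 seconds, a bare 404 is retried on the default schedule, and either one stops being retried once the give-up limit has passed.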

francosolerio Jun 27, 2023

I really don't like repurposing something like the status code, it can lead to errors and unwanted behaviours (404 could be returned because the chapters are not ready, or because the podcaster misspelled the url).

Apps already have to refresh feeds periodically, so it would feel natural to put something like:

<podcast:chapters url="https://example.com/episode1/chapters.json" type="application/json+chapters" final="NO" />

If final attribute is NO the app will continue to load the chapters every time the episode is resumed. If the final attribute is YES, the app will store the chapters and consider them immutable.

This solution is simple, accessible to both hosting services and self-hosted podcasts, and doesn't lead to secondary effects / unwanted behavior.
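On the app side, reading such an attribute would be a few lines. A minimal sketch follows, assuming the attribute is called `final` with values YES/NO as described above (it is only a suggestion in this thread, not part of the namespace), and assuming a missing attribute means "don't re-fetch", which is my own guess at a sensible default. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the podcast: tags
PODCAST_NS = "https://podcastindex.org/namespace/1.0"


def should_refetch_chapters(item_xml: str) -> bool:
    """True only when the <podcast:chapters> tag carries final="NO".

    `final` is the hypothetical attribute suggested in this thread.
    """
    item = ET.fromstring(item_xml)
    tag = item.find(f"{{{PODCAST_NS}}}chapters")
    if tag is None:
        return False  # no chapters tag at all
    value = tag.get("final")
    if value is None:
        return False  # assumption: no attribute means nothing to re-fetch
    return value.strip().upper() == "NO"
```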

ryan-lp Jun 27, 2023
Author

If final attribute is NO the app will continue to load the chapters every time the episode is resumed. If the final attribute is YES, the app will store the chapters and consider them immutable.

I think this is the sort of thing that will take time to fully think out to ensure it is definitely the right approach, hence the proposal for something immediate in the meantime. Aside from the issues already raised, one other is whether the value should be a simple binary.

In the system I am developing, typos can continue to be fixed in the transcript manually, and it may actually take several days, or more, to iron out all of the typos if the author is aiming for an accurate transcript. The same would also apply to speaker tags. And so the way I see it, the transcript is really going through two phases. First is the processing of the initial draft which may be full of mis-transcribed words. After this processing is complete, the transcript may be published immediately so that listeners have "something" at least to go on. But then we enter the second phase where they may want to make further edits to the transcript over a period of the next few days. If the author finishes editing and considers it to be the final version, that is when an app should truly consider it immutable. But until then, an app should still be able to make HEAD requests to see if there have been further modifications that may contain important corrections to the transcript. So if you wanted to reflect that, instead of YES/NO, you might have state=processing/processed/final or something along those lines.
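Just to illustrate how an app might act on the three states (a sketch only: the enum, the function name and the action strings are made up for this example):

```python
from enum import Enum


class TranscriptState(Enum):
    PROCESSING = "processing"  # initial draft not ready, placeholder served
    PROCESSED = "processed"    # draft published, corrections may still arrive
    FINAL = "final"            # author considers it finished


def refresh_action(state: TranscriptState) -> str:
    """Map a state to what the app does next time it wants the transcript."""
    if state is TranscriptState.PROCESSING:
        return "refetch"      # full request later, nothing useful cached yet
    if state is TranscriptState.PROCESSED:
        return "head-check"   # cheap HEAD request to see if it was modified
    return "none"             # final: treat the cached copy as immutable
```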

This may also tie in with #458 which proposes another binary tag to indicate whether a transcript was made by a human or by machine. I don't think the binary options are helpful enough there either, but there appears to be some overlap here because if a transcript starts out as machine-produced, and then undergoes a revision process by humans, there are all of these in-between stages where it's half-way between machine-produced and human-produced. The transcript may eventually reach a level where it is at human standards, or it may never reach that stage, but it might come close.

So, it's a bit of a complex discussion, but in the meantime, we can still use a simple status code ;-)

As for the 404 workaround for the self hosting scenario you brought up, the 404 approach is not the perfect/ideal solution for self hosted podcasts by any means, although it does conveniently take advantage of the way crawlers traditionally treat 404s. See for example Google's description of how its own crawler works:

So with 404s, along with I think 401s and maybe 403s, if we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn’t intended to be a page not found.

In other words, when the crawler gets a 404, it will try again later, but after a certain finite period of time the crawler should give up and treat it as a broken link. So for all intents and purposes, a 404 would induce the sort of behaviour in a crawler that would be desirable here. Until, of course, another proposal for something in the tag language comes along and gets accepted into the spec at some future point.

johnspurlock · 2023-06-27T15:34:04Z

johnspurlock
Jun 27, 2023

Here's what we should do: support ETags in the responses for these external resources (most CDNs already do), and include a similar etag value as an optional attribute in the tag itself (basically a version id). Piggyback on the existing HTTP semantics, this is exactly what we want.

<podcast:chapters url="https://example.com/episode1/chapters.json" type="application/json+chapters" etag="890abc" />

RSS feed is the source of truth, and any change is a signal that the external resource needs to be refetched.

btw this approach should be used for any of the podcast: tags that reference an external HTTP resource. cc @daveajones

This will work for any non-dynamic audio shows, and even "dynamic" hosts like Buzzsprout that are not really request-dynamic, just restitched every so often. Truly dynamic responses would still need to rely on something like Link headers, and would not set an etag in the chapter tag.
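A minimal sketch of the app-side check this implies, assuming the `etag` attribute shown above (a suggestion in this thread, not yet part of the namespace). The function name and the `cached_etag` argument are hypothetical. The app remembers the attribute value it saw when it last fetched the file and compares it on each feed refresh:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI used by the podcast: tags
PODCAST_NS = "https://podcastindex.org/namespace/1.0"


def needs_refetch(item_xml: str, cached_etag: Optional[str]) -> bool:
    """True when the feed's etag attribute differs from the one last fetched.

    `cached_etag` is None if the app has not fetched the resource yet.
    """
    item = ET.fromstring(item_xml)
    tag = item.find(f"{{{PODCAST_NS}}}chapters")
    if tag is None:
        return False  # no chapters tag, nothing to fetch
    feed_etag = tag.get("etag")
    if feed_etag is None:
        return False  # no version signal in the feed (e.g. truly dynamic hosts)
    return feed_etag != cached_etag
```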

6 replies

johnspurlock Jun 27, 2023

If the chapters aren't available, you have two options - both of which exist today. Put a provisional stub file there like some shows do, or leave it out, and update the feed when it's available. Good podcast apps will note the difference between lack of chapters and non-lack of chapters as a change.

ryan-lp Jun 27, 2023
Author

I don't object to using etags, it's just that they address a different matter than the ETA hint I proposed.

johnspurlock Jun 27, 2023

Any ETA retry-after hint would be an estimate anyway (what if transcription fails, takes longer, if the feed author dies, etc).

Nice thing about making it a feed change is that updates are pushed via existing mechanisms like websub and podping, no need to wait/guess at all

ryan-lp Jun 28, 2023
Author

Any ETA retry-after hint would be an estimate

Exactly, and inference models have predictable estimates lending themselves well to the Retry-After header.

anyway (what if transcription fails, takes longer,

Then the client will make the planned request at the hinted Retry-After time, and the server will issue a new response indicating either a new 503 with an updated estimate, or a permanent 500 error indicating failure.

if the feed author dies, etc).

That would not have any effect on the operation of an automated service. Once the processing starts, its behaviour is determined without human intervention.

Nice thing about making it a feed change is that updates are pushed via existing mechanisms like websub and podping, no need to wait/guess at all

Neither websub nor podping solve my scenario in particular, and I doubt they actually solve the average scenario either.

(With the latter, we do not expect podcast player apps themselves to directly subscribe to the podping stream and listen for changes to EVERY feed, nor do we expect the podcast player apps themselves to subscribe to a feed via websub because of the way the technology works. These solutions work best for the use case of server-side systems such as directories and search engines that need to constantly listen for changes in order to reflect changes to a feed to the directory or index. Even if you disagreed with that, there would still be technical challenges to implementing both of these technologies on device within the podcast player app. But putting that aside...)

In my use case, my goal is not to be the service that hosts the RSS feed, but rather to be the service that hosts the transcript file. There seems to be a "single source of truth" mantra that we must hold to, but I would argue that if we truly want to support an ecosystem where different services can work together rather than having single companies do everything, then we actually want to be able to have a separate hosting company that deals with RSS feeds, and a separate hosting company that deals with transcript files, and to allow these two companies to work together without friction. The "single source of truth" hampers this vision. What I need is a model more akin to the way the web was envisaged, where a link truly can be to anywhere external on the Internet, and where a single web page can draw on resources from multiple services and bring them all together.

Thus, what I want is for a podcaster who is with a host that does not have built-in transcript support (or indeed is self hosted with just static files), to simply be able to add a URL link to a transcript provided by my service, and it will "just work". My service will take care of the signalling of when the transcript is ready because the service hosting the RSS feed does not need to be responsible for everything.

I think it's time to re-think this single source of truth model if we truly want to promote a healthy ecosystem of different services that can interoperate with each other: a service that focuses on RSS feeds can integrate with a separate service that just focuses on transcripts and can integrate with another service that just focuses on chapters.

thebells1111 Jun 29, 2023

I agree. Much like I prefer web development being components that have a separation of concerns, I prefer chapters, transcripts, and other items to be components that have a separation of concerns. All the RSS needs to know about the chapter is a link to the chapter. All of the metadata about the chapter should be in the chapter file.

theDanielJLewis · 2024-10-18T17:22:11Z

theDanielJLewis
Oct 18, 2024

I like our goal here: to ensure the audience has the correct version of the chapters (which might be different from the latest version if they downloaded audio with dynamically inserted content).

Something to keep in mind. Some developers, such as Marco with Overcast, absolutely refuse to implement anything that can be used to track the audience. So checking for updated chapters every time the episode is opened or played won't fly with him, and I think we should keep those privacy concerns in mind.

I like the idea of using Podping for this because that can reduce the number of times an audience member's app will have to ping a server and thus be trackable. But there needs to be a mechanism to distinguish between changes resulting in dynamic content that will not align with predownloaded audio and changes (like corrections or improvements) that will align with predownloaded audio.

2 replies

samsethi Oct 27, 2024

Hi Daniel

You can lead a horse to water but if they refuse to drink then whose fault is that? If Marco refuses to add any podcasting 2.0 tags then that is on him. I agree podping is a good solution. What is needed to move this forward?

jamescridland Oct 28, 2024

checking for updated chapters every time the episode is opened or played won't fly with him

He can proxy this call using his own servers, which will not leak any data to the host of the JSON.

Further - we should be clear in the docs for JSON-chapters that they may be proxied by podcast app developers or others.
