<podcast:transcript> format support is underspecified #63

mattbasta · 2020-10-22T03:06:17Z

Transcripts should prescriptively define transcript format support. HTML5, for instance, requires certain formats (e.g., WebM) to be implemented by browsers to be considered compliant. This allows both browsers and developers to produce files which work for an overwhelming majority of users. By underspecifying transcript format support, the tag does not reduce fragmentation.

I recommend requiring support for the following MIME types:

text/plain - Displayed as plain text
text/html - Displayed as rich text, with minimum support for basic formatting elements (e.g., p, div, br, b)
application/srt
text/vtt (WebVTT) - Displayed in an appropriate way; formatting may be ignored
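
To make the idea of required support concrete, here is a minimal Python sketch of how an app might map the types listed above to a display strategy. The names `REQUIRED_TYPES` and `display_strategy` are hypothetical, and the "timed captions" label for application/srt is an assumption, since the list gives no display note for it.

```python
# Hypothetical mapping from the proposed MIME types to a display strategy.
# The strategy labels are illustrative only; they are not part of any spec.
REQUIRED_TYPES = {
    "text/plain": "plain text",
    "text/html": "rich text (basic formatting only)",
    "application/srt": "timed captions",  # assumption: no note given in the list
    "text/vtt": "timed captions (formatting may be ignored)",
}


def display_strategy(mime_type: str) -> str:
    """Return how an app would render a transcript of the given type."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base not in REQUIRED_TYPES:
        raise ValueError(f"unsupported transcript type: {mime_type}")
    return REQUIRED_TYPES[base]
```

An app that implements every entry in such a table can render any transcript a compliant feed publishes, which is the guarantee being asked for here.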

I would strongly recommend against specifying a new JSON file format as part of this spec.


tomrossi7 · 2020-10-22T10:22:36Z

> I would strongly recommend against specifying a new JSON file format as part of this spec.

Hah! I just commented on your other thread. Can you explain why you think this? Right now AWS, Temi, and Descript all have their own JSON format. This makes it very difficult for hosts and apps to support the higher fidelity transcript we can offer in a JSON format.

mattbasta · 2020-10-22T20:49:08Z

@tomrossi7 Introducing a new format means there's now yet another format. The spec, unless there's a compelling feature gap, should lean on existing standards. WebVTT, for instance, is a popular export format for many transcription services and is well supported in browsers. Parsers already exist.

For services which have their own format already, these can be converted to standardized formats. Proprietary (and often underspecified) formats are the same root causes for fragmentation in the podcast space already. If those formats are advantageous, adding them later is trivial.

daveajones · 2020-10-22T21:46:13Z

@mattbasta We adopted this tag as-is because it was already in use by BuzzSprout and had been integrated by the PodcastAddict app. We don't want to break that, so we could just add a new optional format instead of removing the integration they had already established before the namespace existed.

Adding WebVTT would be an easy addition to make.

Does anyone know a specific, podcast accessible (in terms of pricing) service that allows exporting speech to text as WebVTT?

mattbasta · 2020-10-22T22:52:03Z

I don't have a list handy, but WebVTT is essentially a W3C standardized version of SRT, and converting between them is ~trivial. Rev and 3Play both support this. The big benefit is that you can use the same file for an HTML5 player as you would for a podcast app.
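
As a rough illustration of how small the SRT to WebVTT conversion is, here is a minimal Python sketch. It assumes well-formed SRT input and only does two things: adds the `WEBVTT` header and switches the millisecond separator on timing lines from a comma to a period. The function name `srt_to_vtt` is hypothetical, and SRT-specific markup such as `<font>` tags is passed through untouched, so a production converter would need more care.

```python
import re

# Matches SRT timestamps such as 00:01:02,500 (comma before milliseconds).
_SRT_TIME = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple, well-formed SRT text to WebVTT (sketch only)."""
    # Strip a leading byte order mark and normalize line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = ["WEBVTT", ""]  # WebVTT files start with this header and a blank line
    for line in text.strip().split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten: 00:00:01,000 -> 00:00:01.000
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

The numeric cue counters from SRT are kept as-is, since WebVTT allows an optional identifier line before each cue's timing line.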

To be honest, it's frustrating that the spec isn't even formalized, yet changes cannot be made because folks have already integrated against it. Should this tag be marked as out-of-scope for phase 1, if no changes can be made against it? This ties back to my original concern that things are being rushed out the door without broad consensus from stakeholders.

daveajones · 2020-10-22T23:31:40Z

@mattbasta Folks didn’t integrate transcripts against this spec. Transcripts were already being done in the wild. We integrated against it because it was prior art that was already working well. Rule #1 here has to be not to break existing interop. If it does, those people will just ignore the whole thing. Again, referring here to Rules for Standards Makers.

Isn’t adding support for WebVTT as an optional format a good solution? It seems very reasonable to me and doesn’t force anyone to change existing code.

mattbasta · 2020-10-22T23:44:27Z

Whether the JSON support is removed from the spec is orthogonal to my main concern, which is broader: the spec should require support for a set of formats. Whether WebVTT is added is also orthogonal (it would be very nice to be able to have one transcript format that's used across all channels, but it would hardly be the first time we've had to integrate the same thing multiple times).

Requiring implementors to support a set of formats means that podcasts that use the tag will have their transcripts appear. If I publish my transcripts in SRT, but nobody actually ever went and implemented SRT support, my transcripts are as good as absent. And if the lowest common denominator is text/plain or text/html, then everyone will just use that, since it's guaranteed to work everywhere (despite being objectively less useful). This is ultimately the phenomenon that led to Adobe having dominance over web video for nearly two decades: it was the lowest common denominator.
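
The failure mode described above can be sketched in a few lines of Python. This is an illustration only: `pick_transcript` is a hypothetical helper, and the type lists in the usage note are made-up inputs, not data about any real app or feed.

```python
from typing import Optional, Sequence


def pick_transcript(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Return the app's most-preferred type that the feed offers, else None.

    `supported` is ordered from most to least preferred by the app.
    """
    offered_set = {t.lower() for t in offered}
    for mime_type in supported:
        if mime_type.lower() in offered_set:
            return mime_type
    # No overlap: the transcript exists in the feed but is never shown.
    return None
```

With a feed that offers only `application/srt` and an app that supports only `text/html` and `text/plain`, the function returns `None`. A required set of formats removes that outcome, because every compliant app's `supported` list would overlap with every compliant feed.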

Building on that, my followup ask was to set that minimum standard to be obvious well-defined choices (WebVTT being the one that immediately came to mind). Assuming it's too late to remove a bespoke JSON format, I don't have extremely strong feelings about WebVTT inclusion (though it would be very nice).

tomrossi7 · 2020-10-22T23:51:50Z

The spec as it is written supports WebVTT. Neither SRT nor WebVTT has the fidelity that you can achieve with JSON; that's why it's included as well. If you don't like it, don't use it. Developers who want to accomplish more than what's possible with lower-fidelity transcripts can continue to innovate.

mattbasta · 2020-10-23T00:43:57Z

@tomrossi7 This is my point. The spec should require JSON support (and every other noted format), if that's considered the best. As a podcaster, I'm going to choose the format that works with the most apps. If every app supports HTML because they can plop it in a webview and call it a day, I'm always going to choose HTML because it has the biggest reach. From my perspective as a podcaster, if only one or two apps support "the good stuff," I'm not going to choose the good stuff, because compatibility will always outweigh cool features.

daveajones closed this as completed Oct 23, 2020
