<podcast:transcript> format support is underspecified #63

Closed
mattbasta opened this issue Oct 22, 2020 · 8 comments

Comments

@mattbasta

Transcripts should prescriptively define transcript format support. HTML5, for instance, requires certain formats (e.g., WebM) to be implemented by browsers to be considered compliant. This allows both browsers and developers to produce files which work for an overwhelming majority of users. By underspecifying transcript format support, the tag does not reduce fragmentation.

I recommend requiring support for the following MIME types:

  • text/plain - Displayed as plain text
  • text/html - Displayed as rich text, with minimum support for basic formatting elements (e.g., p, div, br, b, etc.)
  • application/srt
  • text/vtt (WebVTT) - Displayed in an appropriate way; formatting may be ignored
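
As an illustration of what a required format set would let a client do, here is a minimal Python sketch that picks the best supported transcript for an episode. It assumes the tag carries `url` and `type` attributes and lives under the Podcast Index namespace URI shown in the code; the function name `pick_transcript` and the preference order are hypothetical, not part of the spec.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the podcast: prefix.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

# The four MIME types proposed above, richest first (the ordering is an assumption).
REQUIRED_TYPES = ["text/vtt", "application/srt", "text/html", "text/plain"]


def pick_transcript(item_xml, supported=REQUIRED_TYPES):
    """Return (mime_type, url) for the first supported transcript, or None."""
    item = ET.fromstring(item_xml)
    by_type = {}
    for el in item.findall(f"{{{PODCAST_NS}}}transcript"):
        mime = (el.get("type") or "").strip().lower()
        url = el.get("url")
        # Keep the first URL seen for each MIME type.
        if mime and url and mime not in by_type:
            by_type[mime] = url
    for mime in supported:
        if mime in by_type:
            return mime, by_type[mime]
    return None
```

If every app is required to handle the same list, a publisher can ship any one of those types and know that this lookup will succeed everywhere.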

I would strongly recommend against specifying a new JSON file format as part of this spec.

@tomrossi7
Contributor

> I would strongly recommend against specifying a new JSON file format as part of this spec.

Hah! I just commented on your other thread. Can you explain why you think this? Right now AWS, Temi, and Descript all have their own JSON format. This makes it very difficult for hosts and apps to support the higher fidelity transcript we can offer in a JSON format.

@mattbasta
Author

@tomrossi7 Introducing a new format means there's now yet another format. The spec, unless there's a compelling feature gap, should lean on existing standards. WebVTT, for instance, is a popular export format for many transcription services and is well supported in browsers. Parsers already exist.

For services that already have their own format, the output can be converted to a standardized format. Proprietary (and often underspecified) formats are already a root cause of fragmentation in the podcast space. If those formats prove advantageous, adding them later is trivial.

@daveajones
Contributor

@mattbasta We adopted this tag as-is because it was already in use by BuzzSprout and had been integrated by the PodcastAddict app. We don't want to break that, so we could just add a new optional format instead of removing the integration they had already established before the namespace existed.

Adding WebVTT would be an easy addition to make.

Does anyone know a specific, podcast accessible (in terms of pricing) service that allows exporting speech to text as WebVTT?

@mattbasta
Author

I don't have a list handy, but WebVTT is essentially a W3C standardized version of SRT, and converting between them is ~trivial. Rev and 3Play both support this. The big benefit is that you can use the same file for an HTML5 player as you would for a podcast app.
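
To illustrate how small that conversion can be, here is a minimal Python sketch. It assumes well-formed SRT input and only does the two things the formats strictly require: it prepends the `WEBVTT` header and switches the millisecond separator in timing lines from a comma to a period. The function name `srt_to_vtt` is hypothetical, and SRT-specific styling tags are passed through untouched.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345 (hours may exceed two digits).
_SRT_TIMESTAMP = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT subtitle text to WebVTT text (a sketch, not a full converter)."""
    # Drop a leading byte-order mark and normalize line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = ["WEBVTT", ""]
    for line in text.strip().split("\n"):
        # Only timing lines contain "-->"; cue text is left as-is.
        if "-->" in line:
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

The numeric SRT counters are kept because WebVTT accepts them as cue identifiers.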

To be honest, it's frustrating that the spec isn't even formalized, yet changes cannot be made because folks have already integrated against it. Should this tag be marked as out-of-scope for phase 1, if no changes can be made against it? This ties back to my original concern that things are being rushed out the door without broad consensus from stakeholders.

@daveajones
Contributor

@mattbasta Folks didn’t integrate transcripts against this spec. Transcripts were already being done in the wild. We integrated against that work because it was prior art that was already working well. Rule #1 here has to be not to break existing interop. If the spec does break it, those people will just ignore the whole thing. Again, referring here to Rules for Standards Makers.

Isn’t adding support for WebVTT as an optional format a good solution? It seems very reasonable to me and doesn’t force anyone to change existing code.

@mattbasta
Author

Whether the JSON support is removed from the spec is orthogonal to my main concern, which is broader: the spec should require support for a set of formats. Whether WebVTT is added is also orthogonal (it would be very nice to be able to have one transcript format that's used across all channels, but it would hardly be the first time we've had to integrate the same thing multiple times).

Requiring implementors to support a set of formats means that podcasts that use the tag will have their transcripts appear. If I publish my transcripts in SRT, but nobody actually ever went and implemented SRT support, my transcripts are as good as absent. And if the lowest common denominator is text/plain or text/html, then everyone will just use that, since it's guaranteed to work everywhere (despite being objectively less useful). This is ultimately the phenomenon that led to Adobe having dominance over web video for nearly two decades: it was the lowest common denominator.

Building on that, my followup ask was to set that minimum standard to be obvious well-defined choices (WebVTT being the one that immediately came to mind). Assuming it's too late to remove a bespoke JSON format, I don't have extremely strong feelings about WebVTT inclusion (though it would be very nice).

@tomrossi7
Contributor

The spec as it is written supports WebVTT. Neither SRT nor WebVTT has the fidelity that you can achieve with JSON; that's why it's included as well. If you don't like it, don't use it. Developers who want to accomplish more than what's possible with lower-fidelity transcripts can continue to innovate.

@mattbasta
Author

@tomrossi7 This is my point. The spec should require JSON support (and every other noted format), if that's considered the best. As a podcaster, I'm going to choose the format that works with the most apps. If every app supports HTML because they can plop it on a webview and call it a day, I'm going to always choose HTML because it has the biggest reach. From my perspective as a podcaster, if only one or two apps support "the good stuff," I'm not going to choose the good stuff because compatibility will always outweigh cool good features.


3 participants