Update transcripts.md #673

215 changes: 113 additions & 102 deletions transcripts/transcripts.md
@@ -6,97 +6,78 @@ Some transcript implementations are done in-browser. [CORS headers](https://deve
are required to make these files available from other websites. [A CORS tester is available here](https://cors-test.codehappy.dev/),
to ensure that transcripts are available within browser-based players.

The examples given below are just for convenience. In production you should ensure you are conforming to the actual
spec for each format as defined in it's own documentation.
* **Want to support only one format?** WebVTT is used by Apple Podcasts for ingest, and also natively supported by web browsers. Because the WebVTT format is the most flexible, it's an ideal choice if you can only support one format.

<br><br>
The examples given below are just for convenience. In production you should ensure you are conforming to the actual spec for each format as defined in its own documentation.

## WebVTT

## HTML
The [Web Video Text Tracks Format (WebVTT)](https://www.w3.org/TR/webvtt1/) is designed for use in HTML on the web. You can use the [<track> element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track) in your own web-based players to make closed captions appear on a web page.

The HTML transcript format provides a solution when a transcript is available but no or limited timecode data is available. HTML transcript files are considered low-fidelity and are
designed to serve as an accessibility aid and provide searchable episode content. The HTML format used for podcast transcripts should adhere to the following specifications.
A VTT file contains medium-fidelity timestamps. It differs from the SRT format (below) because you can optionally add speaker names, including them in a voice span tag `<v>` at the beginning of each caption when they change, as in the snippet below. Apple Podcasts supports these speaker names, and will ingest them into its transcript tool.

#### HTML tags used:
- `<cite>`: Name of the speaker (if available)
- `<time>`: Start time of monologue (if available)
- `<p>`: Content of monologue
While there is no defined maximum line length, a maximum of 65 characters per line is recommended so that WebVTT displays well as closed captions. If you're using whisper-cpp or an equivalent, `--split-on-word --max-len 65` is one way of achieving this.
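
As a rough illustration of the 65-character recommendation, the Python sketch below re-wraps caption text with the standard library's `textwrap` module. The function name `wrap_caption` and the sample sentence are assumptions made for this example and are not part of the spec. It breaks only at whitespace, and it counts any leading `<v …>` voice tag towards the line length, which a real tool may prefer to exclude.

```python
import textwrap

MAX_LINE_LENGTH = 65  # recommended maximum line length for captions


def wrap_caption(text, width=MAX_LINE_LENGTH):
    """Split caption text into lines of at most `width` characters.

    Lines are broken at whitespace only; a single word longer than
    `width` is left intact rather than being cut in the middle.
    """
    return textwrap.wrap(text, width=width, break_long_words=False)


lines = wrap_caption(
    "It's amazing how it's empowering creators like never before, "
    "and the enhanced monetization options are a game-changer."
)
for line in lines:
    print(line)
```

A transcription tool would normally also split the cue's time range when it splits the text; this sketch only deals with the text.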

The full specification includes formatting features; these are typically not used in podcasting applications.

#### Snippet:
```html
<cite>Kevin:</cite>
<time>0:00</time>
<p>We have an update planned where we would like to give the ability to upload an artwork file for these videos</p>
<cite>Alban :</cite>
<time>0:09</time>
<p>You're triggering Tom right now with a hey, here's a cool feature.</p>
```
WEBVTT

Example file: [example.html](example.html)
00:00:00.000 --> 00:00:05.000
<v John>Podcasting 2.0 is really changing the game.

<br><br>
00:00:05.000 --> 00:00:10.000
<v Tom>Yeah, absolutely. The new features are incredible.

00:00:10.000 --> 00:00:15.000
It's amazing how it's empowering creators like never before.

## JSON
00:00:15.000 --> 00:00:20.000
And the enhanced monetization options are a game-changer.

The JSON representation is a flexible format that accomodates various degrees of fidelity in a concise way. This format for podcast transcripts should adhere to the following specifications.
00:00:20.000 --> 00:00:25.000
<v John>Exactly, Tom. It's revolutionizing the industry.

#### Elements included in this representation:
- `<version>`: The version of JSON transcript specification
- `<segments>`: An array of dialogue elements (segments)
- `<speaker>`: Speaker
- `<startTime>`: Start time for the segment
- `<endTime>`: End time for the segment (if available)
- `<body>`: Dialogue content
00:00:25.000 --> 00:00:30.000
<v Tom>No doubt about it. Podcasting 2.0 is the future.

#### Snippet:
```json
{
"version": "1.0.0",
"segments": [
{
"speaker": "Darth Vader",
"startTime": 0.5,
"endTime": 0.75,
"body": "I"
},
{
"speaker": "Darth Vader",
"startTime": 1,
"endTime": 1.25,
"body": "am"
},
{
"speaker": "Darth Vader",
"startTime": 1.5,
"endTime": 2.0,
"body": "your"
},
{
"speaker": "Darth Vader",
"startTime": 2.25,
"endTime": 2.50,
"body": "father.\n"
},
{
"speaker": "Luke",
"startTime": 2.75,
"endTime": 3.0,
"body": "Nooooo"
}
]
}
00:00:30.000 --> 00:00:35.000
<v John>Couldn't agree more, Tom. The future looks bright.
```

Example file: [example.json](example.json)
Example file: [example.vtt](example.vtt)

<br><br>
#### Web browser support example

This example code adds an audio player to a web page and displays the accompanying WebVTT file as the audio plays. (Note that this basic code will not show speaker names.)

```
<div style="height:111px;text-align:center;">
<audio id="vttplayer" controls preload="none" src="https://podnews.net/audio/podnews240125.mp3?_from=P20spec">
<!-- <track> is a void element, so it takes no closing tag -->
<track default src="https://podnews.net/audio/podnews240125.mp3.vtt">
</audio>
<br>
<div style="text-align:center;" id="vtt">
</div>
</div>

<script>
// Show the text of the currently active cue below the player
document.getElementById('vttplayer').textTracks[0].addEventListener('cuechange', function() {
// activeCues is empty between cues, so guard before reading the text
var cue = this.activeCues[0];
document.getElementById('vtt').innerText = cue ? cue.text : '';
});
</script>
```

<br><br>

## SRT

The SRT format was designed for video captions but provides a suitable solution for podcast transcripts. SRT files contain medium-fidelity timestamps and are a
popular export option from transcription services. SRT transcripts used for podcasts should adhere to the following specifications.
popular export option from transcription services. An SRT file can be generated programmatically from a VTT file (and vice-versa).
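
The conversion from VTT to SRT can be sketched in a few lines of Python. This is a minimal sketch, not a complete converter: it assumes simple files with cues separated by blank lines and no `NOTE` or `STYLE` blocks. The function names `vtt_time_to_srt` and `vtt_to_srt` are hypothetical, and rewriting a `<v John>` voice span as a `John: ` prefix is a convention chosen for this example, since SRT has no voice spans.

```python
import re

# Matches a WebVTT timestamp; the hours part is optional in WebVTT.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_time_to_srt(stamp):
    """Convert '00:05.000' or '00:00:05.000' to SRT's '00:00:05,000'."""
    hours, minutes, seconds, millis = TIMESTAMP.fullmatch(stamp).groups()
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_to_srt(vtt_text):
    """Convert simple WebVTT text to SRT text (see assumptions above)."""
    blocks = re.split(r"\n\s*\n", vtt_text.strip())
    srt_cues = []
    for block in blocks:
        lines = block.splitlines()
        # Find the timing line; this skips the WEBVTT header block
        # and any optional cue identifier above the timings.
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue
        start, end = (part.strip() for part in lines[timing_index].split("-->"))
        end = end.split()[0]  # drop any cue settings after the end time
        # Assumed convention: turn "<v John>text" into "John: text".
        text = [
            re.sub(r"<v(?:\.[^\s>]+)?\s+([^>]+)>", r"\1: ", line).replace("</v>", "")
            for line in lines[timing_index + 1:]
        ]
        number = len(srt_cues) + 1  # SRT cues are numbered from 1
        srt_cues.append(
            f"{number}\n{vtt_time_to_srt(start)} --> {vtt_time_to_srt(end)}\n"
            + "\n".join(text)
        )
    return "\n\n".join(srt_cues) + "\n"


sample = """WEBVTT

00:00:00.000 --> 00:00:05.000
<v John>Podcasting 2.0 is really changing the game.

00:05.000 --> 00:10.000
<v Tom>Yeah, absolutely. The new features are incredible.
"""
print(vtt_to_srt(sample))
```

The sketch covers the three differences that matter for simple files: the `WEBVTT` header is dropped, each cue gets a sequence number, and the fractional separator in timestamps changes from a full stop to a comma.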

SRT transcripts used for podcasts should adhere to the following specifications:

#### Properties:
- Max number of lines: 2
@@ -144,50 +125,80 @@ do we need a podcast trailer?
Example file: [example.srt](example.srt)


## WebVTT
## JSON

Web Video Text Tracks Format (WebVTT) are an alternative to SRT primarily designed for the use in HTML on the web. It is supported in all major web browsers and is similar enough to SRT to be converted.

### Differences from SRT taken from [Wikipedia](https://en.wikipedia.org/wiki/WebVTT):
- WebVTT's first line starts with WEBVTT after the optional UTF-8 byte order mark
- There is space for optional header data between the first line and the first cue
- Timecode fractional values are separated by a full stop instead of a comma
- Timecode hours are optional
- The frame numbering/identification preceding the timecode is optional
- Comments identified by the word NOTE can be added
- Metadata information can be added in a JSON-style format
- Chapter information can be optionally specified
- Only supports extended characters as UTF-8
- CSS in a separate file defined in the companion HTML document for C tags is used instead of the FONT tag
- Cue settings allow the customization of cue positioning on the video
The JSON representation is a flexible format that accommodates various degrees of fidelity in a concise way. At the most precise, it enables word-by-word highlighting. This format for podcast transcripts should adhere to the following specifications.

#### Properties:
- Speaker names (optional): Speakers can be included in a voice span tag `<v>` at the beginning of each caption.
#### Elements included in this representation:
- `<version>`: The version of JSON transcript specification
- `<segments>`: An array of dialogue elements (segments)
- `<speaker>`: Speaker
- `<startTime>`: Start time for the segment
- `<endTime>`: End time for the segment (if available)
- `<body>`: Dialogue content

#### Snippet:
```json
{
"version": "1.0.0",
"segments": [
{
"speaker": "Darth Vader",
"startTime": 0.5,
"endTime": 0.75,
"body": "I"
},
{
"speaker": "Darth Vader",
"startTime": 1,
"endTime": 1.25,
"body": "am"
},
{
"speaker": "Darth Vader",
"startTime": 1.5,
"endTime": 2.0,
"body": "your"
},
{
"speaker": "Darth Vader",
"startTime": 2.25,
"endTime": 2.50,
"body": "father.\n"
},
{
"speaker": "Luke",
"startTime": 2.75,
"endTime": 3.0,
"body": "Nooooo"
}
]
}
```
WEBVTT

00:00:00.000 --> 00:00:05.000
<v John>Podcasting 2.0 is really changing the game.

00:00:05.000 --> 00:00:10.000
<v Tom>Yeah, absolutely. The new features are incredible.
Example file: [example.json](example.json)
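
Because word-level segments are very fine-grained, a player that only wants speaker turns may need to merge them. The Python sketch below shows one way to do that, assuming the `segments` structure described above. The function name `merge_by_speaker` and the choice to join bodies with a single space are assumptions made for this example, not part of the spec.

```python
import json

transcript_json = """
{
  "version": "1.0.0",
  "segments": [
    {"speaker": "Darth Vader", "startTime": 0.5, "endTime": 0.75, "body": "I"},
    {"speaker": "Darth Vader", "startTime": 1, "endTime": 1.25, "body": "am"},
    {"speaker": "Luke", "startTime": 2.75, "endTime": 3.0, "body": "Nooooo"}
  ]
}
"""


def merge_by_speaker(segments):
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for segment in segments:
        speaker = segment.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            turn = turns[-1]
            # Assumption: bodies are joined with a single space.
            turn["body"] += " " + segment["body"]
            # endTime is optional, so keep the previous value if it is missing.
            turn["endTime"] = segment.get("endTime", turn["endTime"])
        else:
            turns.append({
                "speaker": speaker,
                "startTime": segment["startTime"],
                "endTime": segment.get("endTime"),
                "body": segment["body"],
            })
    return turns


turns = merge_by_speaker(json.loads(transcript_json)["segments"])
for turn in turns:
    print(turn["speaker"], turn["startTime"], turn["endTime"], turn["body"])
```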

00:00:10.000 --> 00:00:15.000
<v Tom>It's amazing how it's empowering creators like never before.
<br><br>

00:00:15.000 --> 00:00:20.000
<v Tom>And the enhanced monetization options are a game-changer.
## HTML

00:00:20.000 --> 00:00:25.000
<v John>Exactly, Tom. It's revolutionizing the industry.
The HTML transcript format provides a solution when a transcript is available but timecode data is limited or missing. HTML transcript files are considered low-fidelity and are designed to serve as an accessibility aid and provide searchable episode content. The HTML format used for podcast transcripts should adhere to the following specifications.

00:00:25.000 --> 00:00:30.000
<v Tom>No doubt about it. Podcasting 2.0 is the future.
#### HTML tags used:
- `<cite>`: Name of the speaker (if available)
- `<time>`: Start time of monologue (if available)
- `<p>`: Content of monologue

00:00:30.000 --> 00:00:35.000
<v John>Couldn't agree more, Tom. The future looks bright.
#### Snippet:
```html
<cite>Kevin:</cite>
<time>0:00</time>
<p>We have an update planned where we would like to give the ability to upload an artwork file for these videos</p>
<cite>Alban :</cite>
<time>0:09</time>
<p>You're triggering Tom right now with a hey, here's a cool feature.</p>
```

Example file: [example.vtt](example.vtt)
Example file: [example.html](example.html)
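
As an illustration of the three tags listed above, the Python sketch below renders a list of speaker turns as an HTML transcript. The function names and the `speaker`, `start` and `text` keys are hypothetical names chosen for this example, and the `H:MM:SS` form used for times of an hour or more is an assumption, since the snippet above only shows `M:SS`.

```python
from html import escape


def format_time(seconds):
    """Format seconds as M:SS, or H:MM:SS from one hour onwards (assumed)."""
    seconds = int(seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def render_html_transcript(turns):
    """Render turns as <cite>, <time> and <p> elements, one per line."""
    parts = []
    for turn in turns:
        # Speaker and start time are both optional in this format.
        if turn.get("speaker"):
            parts.append(f"<cite>{escape(turn['speaker'], quote=False)}:</cite>")
        if turn.get("start") is not None:
            parts.append(f"<time>{format_time(turn['start'])}</time>")
        # Escape &, < and > so that transcript text cannot inject markup.
        parts.append(f"<p>{escape(turn['text'], quote=False)}</p>")
    return "\n".join(parts)


print(render_html_transcript([
    {"speaker": "Kevin", "start": 0, "text": "We have an update planned."},
    {"speaker": "Alban", "start": 9, "text": "You're triggering Tom right now."},
]))
```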

<br><br>