Proposal: VAST for dynamic ad insertion (that doesn't break timestamps) #556

ryan-lp · 2023-07-11T14:41:40Z

Background

Dynamic ad insertion causes a well-known problem for Podcasting 2.0: it invalidates the timestamps in transcript and chapter files. A common workaround is to serve the stitched audio file together with the matching stitched transcript and chapters files, so that a transcript or chapters file is only valid for the audio file that was downloaded at the same time.

However, this approach does not work with timestamp-based search engines which download and index transcript files during the crawl stage, and then serve search results at some later point in time when those timestamps will no longer match with the audio. More generally, this approach presents a problem for any Podcasting 2.0 application that needs to do one-time processing of transcript or chapter files, typically in cases where that processing is large scale or intensive.

Proposal

The VAST specification: https://www.iab.com/wp-content/uploads/2015/06/VASTv3_0.pdf

VAST is a standard for client-side ad insertion, providing cues for ad insertion points, and an endpoint for dynamically fetching the ad to insert. The response from the ad server is an XML document that specifies the type of ad, one or more media files that constitute the ad to be inserted, and other elements that are specific to video ads (such as instructions on what to do when the user clicks on the ad). We can use just the subset that is relevant to audio ads, which includes the insertion times, the ad durations, and the media files associated with the ad, which we can use to include 1) the audio file for the ad, and 2) the transcript file for the ad.
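As a rough illustration of the audio-only subset described above, here is a minimal Python sketch that pulls the ad duration and media files out of a VAST-style response. The sample document, the example.com URLs, and the idea of carrying the ad transcript as a second `MediaFile` with type `text/vtt` are assumptions made for illustration; the VAST spec does not define a transcript media type.

```python
import xml.etree.ElementTree as ET

# Hypothetical ad server response, trimmed to the audio-relevant subset.
SAMPLE_VAST = """<VAST version="3.0">
  <Ad id="example-ad">
    <InLine>
      <AdSystem>Example Ad Server</AdSystem>
      <AdTitle>Example audio ad</AdTitle>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:30</Duration>
            <MediaFiles>
              <MediaFile delivery="progressive" type="audio/mpeg">https://ads.example.com/ad.mp3</MediaFile>
              <MediaFile delivery="progressive" type="text/vtt">https://ads.example.com/ad.vtt</MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>"""


def parse_duration(text):
    """Convert a VAST HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = text.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_audio_ad(vast_xml):
    """Extract the duration, audio URL and (assumed) transcript URL of the first linear ad."""
    root = ET.fromstring(vast_xml)
    linear = root.find("./Ad/InLine/Creatives/Creative/Linear")
    # Index the media files by MIME type so each kind can be looked up directly.
    media = {m.get("type"): m.text.strip() for m in linear.iter("MediaFile")}
    return {
        "duration": parse_duration(linear.findtext("Duration")),
        "audio_url": media.get("audio/mpeg"),
        "transcript_url": media.get("text/vtt"),
    }


ad = parse_audio_ad(SAMPLE_VAST)
print(ad)
```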

VAST has open source implementations for several client-side platforms, which lend themselves well to adaptation for Podcasting 2.0:

iOS: https://github.com/denivip/ios-vast-player
Android: https://exoplayer.dev/
Web: https://github.com/minznerjosh/vast-player

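The key benefit of VAST is that it preserves timestamps regardless of the duration of the ads inserted: because the client inserts the ads at playback time, the episode audio file itself never changes, so transcript and chapter timestamps stay valid.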
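To make the timestamp claim concrete, here is a small Python sketch under assumed names (`playback_position`, and ad breaks given as pairs of content offset and ad duration). It shows two listeners with different ad loads resolving the same shared content timestamp; only their local playback positions differ.

```python
def playback_position(content_time, ad_breaks):
    """Map a timestamp in the unmodified episode audio to a position on the
    listener's local timeline, which also includes the ads played so far.

    ad_breaks is a list of (content_offset_seconds, ad_duration_seconds).
    """
    # Every break at or before the content timestamp has already played.
    shift = sum(duration for offset, duration in ad_breaks if offset <= content_time)
    return content_time + shift


# A timestamp from a transcript, chapter file or shared link (content timeline).
shared_timestamp = 600.0

# Two listeners receive different ads at the same insertion points.
listener_a = [(0.0, 30.0), (300.0, 15.0)]
listener_b = [(0.0, 60.0), (300.0, 45.0)]

print(playback_position(shared_timestamp, listener_a))  # 645.0
print(playback_position(shared_timestamp, listener_b))  # 705.0
```

The content timestamp (600 seconds) is what gets indexed or shared, and it is the same for everyone; the ad-dependent shift is computed locally by each client.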

Companies currently offering dynamic ad insertion on the server side may naturally be cautious about whether they can trust apps to implement this. However:

Having a spec for client-side ad insertion does not preclude companies from using a server-side solution if they prefer.
If the right incentives are in place (value splits or revenue sharing), clients may be more than happy to implement this on the client side.

jamescridland · 2023-07-11T16:24:19Z

I really like the idea of looking at a way to get transcripts to work with DAI.

VAST is already used by the podcast industry when stitching ads together, so its use is rather elegant here.

I guess two questions:

What is to stop a podcast app developer making an “ad-free podcast app” by using the VAST tags but not calling the ads?
What happens if the VAST tags are not supported by the app?

Not so long ago, RedCircle’s Mike Kadin was on the Podcasting 2.0 podcast ep 133 with I seem to remember a pretty good solution for it. Here it is…

Mike Kadin
Yeah. So I think, going back to what you said before, right, each individual download where the listener hits play is unique to that specific listener at that time, for that episode, right. And that is unique to them. But the transcript URL and the chapter URL are at the feed level, right? Now, like I said before, we do store or cache a configuration that says, like, we served this IP address and user agent this audio file at this time, so that if they come back five minutes later and download the second half of it, everything remains in sync, right? So we do have some concept, it's temporary, of like, you know, if you asked me, what did I stitch together in this episode from this IP address five minutes ago? I can answer that question. Right. So theoretically, then, if you asked me for a transcript, I can use that same configuration and stitch the transcript together, right? Same thing for the chapters. So I think the only thing that's missing with our current setup is just kind of like a specification that details how those URLs for the transcript and the chapters should be requested. They have to be from the client, they have to be close in time to when the download was requested. And they have to be, you know, from the same IP address, same user agent, everything else the same. So we can identify that and say, all right, that's the same thing that just downloaded this podcast five minutes ago, now I can give you the transcript. If they download the transcript on the server side, that's not going to work. It's going to come from the server's IP. If they download the transcripts when the RSS feed updates, but then the listener downloads the podcast two days later, those are going to be out of sync, right. So one option is to just kind of specify what the client's behavior should be with respect to downloading those things. Right? That's one option. I don't like it as much, but that's an easier option.

Dave Jones
Okay. And Nathan Gathright has boosted in a question. So I'll let you finish your train of thought.

Mike Kadin
Yeah. So that's, like, you know, one option, right. And then the other option, which I think some folks have proposed on Mastodon, Twitter and stuff, is to put the URL to that unique version of the episode's transcript and that unique version of the episode's chapters in headers that come back with the download, right? So you can say, give me that mp3, we give you back the mp3 content, but in the headers is a link: here's where you can get the transcript for this, here's where you can get the chapter file for this. That would tie it to that specific download, and make it a lot easier to keep everything in sync. So that's the gist. It's like a disconnect between per-download stitching, but per-feed URLs for the other things.

Dave Jones
Okay. All right. I got it now. So let's talk about a spec, then. Let's hammer this out for a second. So there's Nathan's thing, and I've got another idea here. So Nathan said, why Kelsey would just love the discussion around DAI triggers, if apps were supporting ULIDs, then hosts could provide the same audio and transcripts to match. So yeah, that goes back to sort of like your client-level specification. Yeah,

Mike Kadin
That's okay, I think that solution is good. The idea behind the ULID is like, I'm uniquely identifying this app for this download, right? And that way I can distract it over users.

Dave Jones
Yeah, each user. So, just to rehash it, every download generates a unique ULID, right. And if the app ever downloads that same enclosure again, it won't use the same ULID otherwise it always generates a new one.

Mike Kadin
If that would work, that would work. Yeah, it also is one solution. I think that header one is better. But I think that can totally work. Basically, we need some way to know that the person who downloaded this thing is coming back, right, either to download the episode again, or to download the files that have this metadata.

Dave Jones
So a ULID solution, would that cause you to have to save state?

Mike Kadin
We save the state anyway. I think we would just have to save the ULID and build an index off it too. I think that'd be okay. We'd have to make sure that, like, you know, I think that'd be okay.

Dave Jones
Because the other option here is, if there was a way to describe the breakpoints, the start times and the durations, within some sort of string or something, then when the enclosure is delivered, you deliver a header that says, here's where the breakpoints were. Like, you're delivering with the enclosure also an extra header that says, here's a list of all the breakpoints and how long they lasted. And that way, whenever you download any transcript, if the client saves the breakpoints, when it goes back later it can just get the transcript and then look up and say, okay, when I downloaded this thing, it had breakpoints starting at second x, y and z, and they lasted this long.

Mike Kadin
My only thing with that is that I don't always know how long the breaks are going to be in advance

Dave Jones
that you stitch the mp3 together, so you wouldn't know. Right?

Mike Kadin
I know at the time when I give it to you. But I guess what I'm saying is, across downloads, I don't know. Right? So where are you suggesting I put that break information?

Dave Jones
So that would be like when the client, let's just say that it's Castamatic. When Castamatic requests the download, yep, to download the enclosure, you stitch the mp3 together and then return it with a header with the breakpoints in it.

Mike Kadin
I mean, that can also work. But I think if we're going to do the headers, we might as well just give you a URL where you can go fetch the transcript specifically for it. You know, I will say, as the ad guy, that a lot of ad people will not be pumped to tell the client where the breaks are gonna be. Mixed feelings about that from ad people; that might help you skip them. Which, you know, there are pros and cons to, obviously a lot of pros for the listener. But yeah, you know, what you're saying would work as well. I think it's easier to just let us stitch the transcript or the chapters for you, based on what we know about that download, right?

Dave Jones
Okay, so that would be a ULID-type solution and/or what you said, like the header? I mean, in true 2.0 fashion, the way this probably would work is you'd start doing the thing that makes the most sense. And then we'll just all look at it and, you know, hash it out. Play with it. If you think you have a good solution, start delivering that solution, if it's not too much trouble, and then we'll all start playing with it.

Mike Kadin
It is trouble, but I think I'm interested in it. And I think that the header-based solution is one that, as I heard from a few other folks that do dynamic insertion recently, they seem to think was best. So yeah, I would love to mess around with it. Right now, we let you upload a transcript, but we don't parse it and mess around with it. So we'd have some serious work to do. But I'm definitely interested in doing it, because I think transcripts are a big important new thing for podcasts to have, right, and a good example of where you guys are doing the innovation while other guys are not, right.
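For reference, Dave's breakpoint-header idea from the transcript could look roughly like the Python sketch below. The header value format (`start:duration` pairs in seconds, comma-separated, positions given on the stitched file's timeline) and the function names are hypothetical; nothing like this was specified in the episode.

```python
def parse_breaks(header_value):
    """Parse a hypothetical header value such as "0:30,330:15" into a sorted
    list of (start_in_stitched_file_seconds, duration_seconds)."""
    breaks = []
    for item in header_value.split(","):
        start, duration = item.split(":")
        breaks.append((float(start), float(duration)))
    return sorted(breaks)


def content_to_stitched(content_time, breaks):
    """Shift a timestamp from the ad-free transcript onto the stitched audio
    that was delivered with these breaks."""
    position = content_time
    for start, duration in breaks:
        # A break that starts at or before the running position pushes the
        # content later by the length of that break.
        if start <= position:
            position += duration
    return position


# Breaks the client saved from the headers of its own download.
saved_breaks = parse_breaks("0:30,330:15")

print(content_to_stitched(100.0, saved_breaks))  # 130.0
print(content_to_stitched(600.0, saved_breaks))  # 645.0
```

As Mike notes in the transcript, this only works for the client that received these particular breaks; another download of the same episode may carry different ones.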

4 replies

ryan-lp Jul 12, 2023
Author

I guess two questions:

What is to stop a podcast app developer making an “ad-free podcast app” by using the VAST tags but not calling the ads?

This is sort of what I attempted to address with my last two points: Like the value block, it's a voluntary system based on trust and incentives. You could block someone who violates trust with the block tag, although the block tag is also based on trust (it's easy to get around).

I would add that a transcript-based search engine is actually a legitimate example of an ad-free podcast app. It doesn't itself "play" the audio or the ads, it only finds search results and links you to other players at specific timestamps.

At the end of the day, it is always going to be a trust model (much like robots.txt) because an app that wants to skip ads will always be able to find a way to do that. Attempts to obfuscate the ad insertion points by doing it server-side are literally that: obfuscation, but not security.

I was thinking recently that in order to drive this point home, I should publish an open source library for detecting ad insertion points for legitimate apps that depend on these. If bad actors also want to depend on these, that would still not be a reason to undermine the legitimate use cases.

However, it would be much better if hosting companies supported these legitimate client use cases properly. As it stands, legitimate apps that are sensitive to timestamp changes would have no option other than to skip over the ads in order to stick only to those regions that we have valid transcripts and timestamps for. If the hosting companies provided an option for VAST-compliant apps, it would no longer be necessary for all such apps to skip the ads.

What happens if the VAST tags are not supported by the app?

Section 1.1.3 of VAST 4.0 mentions server-side stitching as a fallback option.

Although I was unable to find details on how the client/server shake hands on which approach to use.

If none were specified, we could implement something like this:

The client includes a header in the GET request to indicate that it is VAST compliant. If that header is not present in the request, the server serves a stitched audio file with the ads dynamically inserted on the server side. So the default behaviour would be the current behaviour.
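A minimal Python sketch of that handshake, seen from the server side, might look like this. The header name `X-Podcast-Vast`, its value `1`, and the keys of the `episode` dictionary are all made up for illustration; the proposal does not define them.

```python
# Hypothetical request header a VAST-compliant app would send.
VAST_HEADER = "X-Podcast-Vast"


def choose_delivery(request_headers, episode):
    """Decide how to answer a GET for an enclosure.

    Without the opt-in header the server behaves as it does today and
    returns server-side stitched audio.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    if headers.get(VAST_HEADER.lower()) == "1":
        return {
            "mode": "client-side",
            "audio_url": episode["clean_audio_url"],
            "vast_url": episode["vast_url"],
        }
    return {
        "mode": "server-side",
        "audio_url": episode["stitched_audio_url"],
        "vast_url": None,
    }


episode = {
    "clean_audio_url": "https://host.example.com/ep1.mp3",
    "stitched_audio_url": "https://host.example.com/ep1.stitched.mp3",
    "vast_url": "https://ads.example.com/vast?episode=ep1",
}

print(choose_delivery({"x-podcast-vast": "1"}, episode)["mode"])     # client-side
print(choose_delivery({"User-Agent": "OldApp/1.0"}, episode)["mode"])  # server-side
```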

Not so long ago, RedCircle’s Mike Kadin was on the Podcasting 2.0 podcast ep 133 with I seem to remember a pretty good solution for it. Here it is…

Unfortunately that approach doesn't actually solve the use case of timestamp-based transcript search engines. Remember that a search engine doesn't actually play the audio or the ads, it merely presents search results and links you to other players to play the search result at the indexed timestamp. So the timestamps will always be invalidated by an approach that gives you a different response from the same GET request.

What troubles me about this from an HTTP perspective is that it's not idempotent. So while in theory, you could add a response header to a GET response that tells you which transcript corresponds to the audio you're getting this time, the whole thing is not idempotent, and it means that the GET response that the search engine receives will be useless when trying to serve up search results.

The other idea mentioned was having a unique URL for each version of the dynamically stitched file. This would be an example of a traditional use of HTTP, since a GET request for each URL would be idempotent. However, the dynamic part of it would just be pushed to another server-side step and idempotency would be broken there. For example, the URL listed in the feed might be dynamically changed, or it might be a redirect that dynamically redirects to a different audio file each time. In either case, the search engine will not be able to download and index a moving target.

jamescridland Jul 14, 2023

I think the idea of using either the ULID or an HTTP header is that it is idempotent - if you send the same ULID or the same HTTP header to the transcript engine, you should get the same transcript back again that corresponds to the audio.

For DAI-delivered audio, the desire is that a listener continues to get relevant ads, but that the transcript deals with the fact that I might hear an initial 30" preroll while other listeners might get a 60" preroll. So, I'm not sure where a transcript search engine fits here. It'll always be transcribing the version of the audio which is given to the search engine. That may well be different to the version of the audio that a typical listener gets. We probably don't want to break that.

ryan-lp Jul 14, 2023
Author

I think the idea of using either the ULID or an HTTP header is that it is idempotent - if you send the same ULID or the same HTTP header to the transcript engine, you should get the same transcript back again that corresponds to the audio.

Subsequent requests are idempotent, but the initial request is not. That's what prevents a timestamp-based transcript search engine from providing other listeners a timestamped link to a podcast episode, and it's also what prevents social podcast apps from sharing timestamped links with other listeners. The lack of idempotency on the initial request is what prevents these types of beneficial use cases.
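The distinction can be sketched in Python with a toy server-side cache. The `StitchCache` class, the identifiers, and the alternating 30/60 second prerolls are illustrative assumptions, not a description of any host's actual implementation.

```python
import itertools


class StitchCache:
    """Remembers which ad configuration was stitched for each download ID."""

    def __init__(self, pick_ads):
        self._pick_ads = pick_ads
        self._configs = {}

    def config_for(self, download_id):
        # First request for an ID: the ad decision is made now and may differ
        # from every other listener's. Later requests replay the stored decision.
        if download_id not in self._configs:
            self._configs[download_id] = self._pick_ads()
        return self._configs[download_id]


# Stand-in for an ad decision that varies between downloads.
prerolls = itertools.cycle([30, 60])
cache = StitchCache(lambda: {"preroll_seconds": next(prerolls)})

first = cache.config_for("listener-A")
again = cache.config_for("listener-A")
other = cache.config_for("search-engine")

print(first == again)  # True: repeating the same ID is stable
print(first == other)  # False: a different requester got different ads
```

A timestamp indexed against the `search-engine` download therefore says nothing about where that moment falls in `listener-A`'s audio.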

For DAI-delivered audio, the desire is that a listener continues to get relevant ads, but that the transcript deals with the fact that I might hear an initial 30" preroll while other listeners might get a 60" preroll. So, I'm not sure where a transcript search engine fits here.

Exactly. Ads are intended for the listener's consumption and not for the search engine's consumption, but that is exactly my point. Individual listeners are not the only consumers here. Apps also consume the timestamps and share them with other listeners.

Now there is another very simple solution here, which is that the search engine does not index any podcast that uses dynamic ad insertion, and apps do not permit clip sharing for any podcast that uses dynamic ad insertion. However, I would think that podcasters themselves would really want these things in order to boost the discoverability and virality of their podcasts.

So it really comes down to the question of whether apps take advantage of this and skip ads. Well, we can already skip ads, can't we? Users can press the skip-forward button, and there are also automated ways to detect and skip ads. Another interesting point here is that if we allow an app to declare that it is VAST-compliant, it would actually be possible for an app to implement ads in such a way that you could not skip the ads, or where you need to listen to at least X seconds of the ad before you can skip it. VAST-compliant apps could also take advantage of the ability to click and follow the ad link to whatever promotion deal it is. VAST apps can also handle the case where, if a user does click on the seek bar to skip over an ad point, the ad will still be forcefully inserted before jumping to that seek point (see the YouTube player for an idea of how all of these user experiences work). So there are benefits to the advertiser as well that might be overlooked. With the proposal as it is, the default behaviour would be to operate the way it currently does with server-side stitching, but provide a handshake for VAST-compliant apps to provide a richer client-side experience and support all of these interesting use cases that involve sharing timestamps with other listeners.
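The seek-bar case could be handled along these lines. This is a Python sketch with hypothetical names; a real player would also need to track partially played ads and any minimum-listen rule.

```python
def ads_due_on_seek(current, target, ad_offsets, played):
    """Return the ad break offsets (content timeline, seconds) that a forward
    seek jumps over and that this listener has not yet heard."""
    if target <= current:
        return []  # seeking backwards never triggers an ad
    return [
        offset
        for offset in sorted(ad_offsets)
        if current < offset <= target and offset not in played
    ]


ad_offsets = [0.0, 300.0, 600.0]
played = {0.0}  # the preroll has already been heard

# Jumping from 1:40 to 11:40 crosses two unplayed breaks.
due = ads_due_on_seek(100.0, 700.0, ad_offsets, played)
print(due)  # [300.0, 600.0]
```

The player would play the returned breaks (or perhaps only the last one, as a policy choice) before resuming at the seek target, then add them to `played`.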

ryan-lp Jul 28, 2023
Author

Not so long ago, RedCircle’s Mike Kadin was on the Podcasting 2.0 podcast ep 133 with I seem to remember a pretty good solution for it. Here it is…

If only you had shared a timestamped link to the relevant part of this 2 hour long episode: link

If ever there were a use case for timestamp sharing, here it is.
