[YouTube] Refactor player clients, add support for poTokens, extract visitor data from the service and more #1272

AudricV · 2025-02-01T15:02:26Z

I carefully read the contribution guidelines and agree to them.
I have tested the API against NewPipe.
I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API. [YouTube] Add support for poTokens NewPipe#11955

This PR, which supersedes #1247, adds support for poTokens, refactor clients used and extract visitor data from the service itself, as generated visitor data do not work anymore to get valid player responses (rolling out in progress as of writing the first version of this PR description).

It also replaces age-restricted videos streams' fetching with the web embed player, which allows to play a very few age-restricted videos anonymously. This is the best we can do with what we can use (unless some method is still present and we haven't discovered it, but I have a lot of doubts).

TODO:

describe in a more detailed way changes in this PR?
add Javadocs
update mocks
add a poToken implementation for tests (in this PR?)

Fixes TeamNewPipe/NewPipe#11980.
Fixes TeamNewPipe/NewPipe#11803.
Fixes TeamNewPipe/NewPipe#11486.
Fixes partially TeamNewPipe/NewPipe#11382.

kitterion · 2025-02-03T22:40:56Z

...in/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java

+    private static PoTokenProvider poTokenProvider;
+    private static boolean fetchIosClient;


I'm sorry for a sort of a drive-by comment, but is it possible to not make these static? For example, have them in YoutubeService and pass them along when getting the stream extractor? It's trivial to make them static downstream if needed, and in the typical usage, where the YoutubeService is already static, they will also be effectively static with something like ServiceList.YouTube.setPoTokenProvider(...).

My use case is that running the stream extractor is an expensive operation taking multiple seconds, so if I need to run multiple at the same time, I do it from multiple threads using different service and extractor instances.
It would be nice to not have to worry about synchronization here, reduce global state, and have the ability to have extractors with different providers.

I think we have static fields in various places of the extractor, so this issue is not limited to this PR. Please open a pull request to fix this across the whole extractor ;-).

My use case is that running the stream extractor is an expensive operation taking multiple seconds, so if I need to run multiple at the same time, I do it from multiple threads using different service and extractor instances.
It would be nice to not have to worry about synchronization here, reduce global state, and have the ability to have extractors with different providers.

I am not sure how your setup works, but can't you first set the static fields and then start multiple threads? Then if poTokenProvider is thread-safe (NewPipe's one is) everything that uses it should also be thread safe. Or am I missing something?

I think we have static fields in various places of the extractor, so this issue is not limited to this PR. Please open a pull request to fix this across the whole extractor ;-)

Sure, I'll do that :), I thought it was fine already for some reason. Gotta at least fix it for my own sanity.
However, this is part of a public facing api, and I'd like not to break people unnecessarily that will start to depend on this.

I am not sure how your setup works, but can't you first set the static fields and then start multiple threads? Then if poTokenProvider is thread-safe (NewPipe's one is) everything that uses it should also be thread safe. Or am I missing something?

True, it's not impossible to adapt, what I'm saying is that with this change I'm forced to make it thread-safe and I'm forced to have only one provider for the whole program. It seems unnecessary, I would like to have other options. Like having extractors with different providers or different settings could conceivably be useful in case some of them don't work.
A library probably shouldn't force these decisions onto the users. Besides, the users can always make a static provider if they wish.
If I'm reading it correctly, NewPipe does exactly that: makes its own static provider.

Could you help me understand your use-case more? Why would you need more than one provider for all of your extractors?

In case your PoTokenProvider isn't thread-safe, you could just have a field, and use a synchronized block for the unsafe activity. Would it be more helpful to have documentation somewhere for this?

Is it reasonable to want to switch fields like fetchIosClient or return values of PoTokenProvider methods at runtime? I feel like it could be because youtube requirements are constantly changing and it'd be nice to have multiple configurations available and switch between them at runtime if one doesn't work, but maybe I'm wrong.

If it's not reasonable, then one time initialization of read-only statics is perfectly fine. If it is, then for example there's currently no thread safe way to change the value of fetchIosClient without synchronizing calls to the extractor.

There's another issue that I spotted but I'm not sure if it's relevant: I don't see how to switch what the PoTokenProvider methods return in a consistent manner and without a race condition. Specifically, PoTokenProvider currently requires implementations to essentially return 4 values (some of them potentially null). There's no way to consistently switch them to another set of 4 values in a multi-threaded environment. The reason is that onFetchPage simply calls methods in order, and the values can change between calls even if I synchronize everything in a PoTokenProvider on my end.
If this is not an issue, still having to synchronize everything is not great. It's error prone, requires you to think about what you're doing.

I avoid having to think about this because in my use case threads don't really know about other threads, they use thread-local instances. That way there's really no synchronization, data races, inconsistencies, etc., it's very convenient. Though I'm told this is not currently safe and will need some fixing.
(My actual use case is a bit more ~~cursed~~ involved, java is not the main language, I call it through jni, even the threads are created outside of the jvm and then attached to it. So having to juggle multiple languages with threads in the mix, I really try to minimize the possibility of things that can go wrong.)

So if I were to summarize my approach here, it's: why worry about any of this if you can just avoid it altogether and write code as if it's single threaded.
It eliminates a whole class of errors, I'm a big fan.

(I really wrote a whole wall of text, huh)

@kitterion Thanks for your feedback and your use cases.

To make it short, I completely agree that such objects should not be static, but as the extractor doesn't have a way to pass options to extraction requests, that's really hard to do properly. I'd like to add this ability in an extractor rewrite from scratch in Kotlin on which I was working a long time ago for an new extractor API concept. I hope I will be able to advance my work on this and post my ideas as class diagrams.

I don't think we will change anything right now.

Don't worry about breaking public APIs, all of these methods are not stable, this PR introduces breaking changes 😆

FireMasterK · 2025-02-04T19:24:15Z

...in/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java

+                                  @Nonnull final String videoId,
+                                  @Nullable final PoTokenProvider poTokenProviderInstance,


Why don't we simply do a null-check rather than having 2 fields for this? (in all the fetch methods)

It was to avoid recomputing the field nullability multiple times (also in a previous version when I was working on this, the fetchHtml5Client method content was directly in the onFetchPage method if I remember properly), but maybe this should be avoided.

I'm pretty sure passing one more parameter around is slower than redoing the null check. Actually, I would guess that if (o != null) is as fast as if (b), since both perform a memory load and then use the zero flag to determine whether to jump or not (assuming null is represented as 0).

It’s 100% irrelevant anyway, because the extractor is network IO bound

FireMasterK · 2025-02-04T19:45:23Z

extractor/src/main/java/org/schabi/newpipe/extractor/services/youtube/PoTokenProvider.java

+     * @return a {@link PoTokenResult} specific to the WEB InnerTube client
+     */
+    @Nullable
+    PoTokenResult getWebClientPoToken(String videoId);


Could we want to create a potoken with some other data other than the videoId?

https://github.com/LuanRT/BgUtils/blob/main/examples/node/innertube-challenge-fetcher-example.ts
Over here, for example, the pot url parameter is based on a 'session' PoToken based on the visitorData, while the the one sent to the player request is based on the videoId?

In that case, why does PoTokenResult need to have a visitorData?

Is having 2 different PoTokens the behavior on YouTube, or is what this PR does correct?

Is having 2 different PoTokens the behavior on YouTube, or is what this PR does correct?

YouTube desktop website uses indeed two poTokens: one minted with a visitorData, one with a videoId.

The visitorData must be the same for player requests and for the visitorData poToken, used in streaming URLs.

On embeds, that's currently not the case: it uses the old behavior of sending a visitorData poToken for player and streaming URLs requests.

As the extractor doesn't support login with a Google account, you only need to create poTokens from two properties: a visitorData and a videoId (for logged-in users, as written in BgUtils repo, it uses account's dataSyncId).

So maybe we should generate the visitorData at NPE and then pass it to the interface, which will be used by implementations. This also prevents clients to reuse the same visitorData poToken, which is good for privacy but not really for speed and resources consumption.

Also update client version and device info and add TVHTML5 client and WEB_EMBEDDED_PLAYER constants, these will be used in the future. TVHTML5_SIMPLY_EMBED_CLIENT_VERSION has been kept in its place as the corresponding client does not work anonymously anymore, so it will be removed soon.

This allows their internal usage in an upcoming new class to be used on other InnerTube clients than the WEB one.

This internal class, YoutubeStreamHelper, has to be public in order to be used by subpackages of the service's root one. It supports poTokens, HTML5 signatureTimestamp property, embed context and multiple InnerTube clients. It is meant to replace the corresponding methods in YoutubeParsingHelper, in order to reduce the class' size, code duplicates and improve its readability. It adds a new way to get age-restricted videos' streams, the only ones which are playable in YouTube embeds, which is not very common.

Also improve detection of age-restricted statuses in playability error messages returned by the service and provide version 7 of DASH manifests.

- Use POST requests with the same body as official clients do; - Update methods checking the client streaming URLs come from: - Replace TVHTML5_SIMPLY_EMBEDDED_PLAYER by TVHTML5; - Add WEB_EMBEDDED_PLAYER.

This new class, InnertubeClientRequestInfo, composed of two mutable subclasses, ClientInfo and DeviceInfo, allows to store client and device info in a better way, without requiring to pass more than 10 method parameters like in YoutubeStreamHelper currently. Mutability has been added in order to allow changing some fields easily, especially visitorData.

As YouTube is disabling ability to use a random visitor ID in a visitorData on player requests and BotGuard challenges, it shouldn't matter if we use a random one or not for other request types.

Some clients like TVHTML5 are not allowed on the visitor_id endpoint (with this client, a 400 HTTP response is returned with a precondition check failed error). Also disable pretty printing for these requests, like we do for others.

…some fixes visitorData are get using InnertubeClientRequestInfo and YoutubeParsingHelper.prepareJsonBuilder, which is replacing the corresponding method in YoutubeStreamHelper, removed in this commit. Also fix some bugs like JsonBuilder usages in some places and remove HLS manifest filtering for the iOS client, as we still use for now the full player response, in the case having streams requiring poTokens after some time as a last resort is useful.

This argument, which has been forgot, is required to get valid streaming URLs with this client.

This commits provides methods to get InnertubeClientRequestInfo instances, which can be used by extractor clients to get visitor data to pass to PoTokenProvider implementations using YoutubeParsingHelper. Ability to create custom instances has been removed, but returned objects can be modified. This is what YoutubeStreamHelper now uses to set the visitorData property.

Also make YoutubeParsingHelper.getOriginReferrerHeaders public, in order to be used by other extractor classes and improve the name of a parameter of YoutubeParsingHelper.getVisitorDataFromInnertube.

…creators Also use TVHTML5 user agent for requests from this client in these DASH manifests creators.

Stypox · 2025-02-05T09:02:45Z

I am merging this as it does work well (many users tested it in NewPipe) and we need to make a NewPipe release asap. I will publish the extractor release as a pre-release. Please create PRs to fix/improve/finish this PR and then we can do a proper release.

AudricV added bug Issue is related to a bug enhancement New feature or request ASAP Issue needs to be fixed as soon as possible youtube service, https://www.youtube.com/ labels Feb 1, 2025

Ales315 mentioned this pull request Feb 1, 2025

Error : Video 'xx_XXXxXxxX' is unplayable. Reason: 'Please sign in' Tyrrrz/YoutubeExplode#868

Closed

6 tasks

omansak mentioned this pull request Feb 2, 2025

Sign in to confirm you're not a bot omansak/libvideo#298

Closed

kitterion reviewed Feb 3, 2025

View reviewed changes

FireMasterK reviewed Feb 4, 2025

View reviewed changes

AudricV added 17 commits February 5, 2025 09:56

[YouTube] Add an interface and a class to fetch and provide poTokens

3691fc2

[YouTube] Make client and origin HTTP headers methods package-private

1df0267

This allows their internal usage in an upcoming new class to be used on other InnerTube clients than the WEB one.

[YouTube] Add support for poTokens, refactor player clients' fetching

3878696

Also improve detection of age-restricted statuses in playability error messages returned by the service and provide version 7 of DASH manifests.

[YouTube] Remove unused methods and constants in YoutubeParsingHelper

38e2b67

[YouTube] Update DASH manifest creation clients' handling

9333d7f

- Use POST requests with the same body as official clients do; - Update methods checking the client streaming URLs come from: - Replace TVHTML5_SIMPLY_EMBEDDED_PLAYER by TVHTML5; - Add WEB_EMBEDDED_PLAYER.

[YouTube] Do not send a visitorData for every InnerTube request

9e45c80

As YouTube is disabling ability to use a random visitor ID in a visitorData on player requests and BotGuard challenges, it shouldn't matter if we use a random one or not for other request types.

[YouTube] Add ability to get a visitorData from InnerTube

6533a33

[YouTube] Add IOS and ANDROID client IDs in ClientsConstants

862a607

[YouTube] Add signatureTimestamp argument to TVHTML5 client requests

4644e17

This argument, which has been forgot, is required to get valid streaming URLs with this client.

[YouTube] Add ability to get TVHTML5 user agent used

c48d449

Also make YoutubeParsingHelper.getOriginReferrerHeaders public, in order to be used by other extractor classes and improve the name of a parameter of YoutubeParsingHelper.getVisitorDataFromInnertube.

[YouTube] Fix usage of WEB client headers for all HTML5 URLs in DASH …

96911ae

…creators Also use TVHTML5 user agent for requests from this client in these DASH manifests creators.

Stypox force-pushed the yt_clients_changes_and_potokens_support branch from 15e35a2 to 96911ae Compare February 5, 2025 08:57

Stypox merged commit fe168ab into TeamNewPipe:dev Feb 5, 2025
3 of 4 checks passed

AudricV deleted the yt_clients_changes_and_potokens_support branch February 5, 2025 09:14

zbee mentioned this pull request Feb 5, 2025

IOS player response is not valid libre-tube/LibreTube#6982

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YouTube] Refactor player clients, add support for poTokens, extract visitor data from the service and more #1272

[YouTube] Refactor player clients, add support for poTokens, extract visitor data from the service and more #1272

AudricV commented Feb 1, 2025 •

edited

Loading

kitterion Feb 3, 2025

Stypox Feb 4, 2025

kitterion Feb 4, 2025

FireMasterK Feb 4, 2025

kitterion Feb 4, 2025 •

edited

Loading

AudricV Feb 5, 2025

FireMasterK Feb 4, 2025

AudricV Feb 5, 2025

Stypox Feb 5, 2025

Profpatsch Feb 5, 2025

FireMasterK Feb 4, 2025

AudricV Feb 5, 2025

Stypox commented Feb 5, 2025

		private static PoTokenProvider poTokenProvider;
		private static boolean fetchIosClient;

		@Nonnull final String videoId,
		@Nullable final PoTokenProvider poTokenProviderInstance,

[YouTube] Refactor player clients, add support for poTokens, extract visitor data from the service and more #1272

[YouTube] Refactor player clients, add support for poTokens, extract visitor data from the service and more #1272

Conversation

AudricV commented Feb 1, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kitterion Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Stypox commented Feb 5, 2025

AudricV commented Feb 1, 2025 •

edited

Loading

kitterion Feb 4, 2025 •

edited

Loading