
Guided reclock interface #5

Open
alexmarsev opened this issue Jul 26, 2015 · 41 comments

@alexmarsev (Owner)

Design and implement.

@alexmarsev alexmarsev self-assigned this Jul 26, 2015
@alexmarsev alexmarsev added this to the 0.2 milestone Jul 26, 2015
@alexmarsev alexmarsev modified the milestones: 0.3, 0.2 Aug 2, 2015
@alexmarsev (Owner, Author)

First draft

// This interface is queried from graph clock, preferably in IMediaFilter::SetSyncSource()
struct IGuidedReclockDraft : IUnknown
{
    // Returns the average amount of adjustment per second below which audio renderer is
    // guaranteed to use high quality sample rate conversion instead of time stretching.
    // Usually, audio renderer allows short bursts of larger adjustments before
    // switching to time stretching.
    //
    // The value is guaranteed to stay the same until a new graph clock is
    // set by IMediaFilter::SetSyncSource()
    STDMETHOD(GetTimestretchingThreshold)(REFERENCE_TIME* pThreshold) = 0;

    // Instantly offsets graph clock by requested amount. When the method returns, the clock
    // is guaranteed to be adjusted already. Audio renderer is responsible for smoothing out
    // the values and catches up to adjustment after some time.
    STDMETHOD(OffsetClock)(REFERENCE_TIME offset) = 0;
};

@alexmarsev (Owner, Author)

@zachsaw This is what I meant by "new public interface". Will it be useful? Do video renderers need something more?

@zachsaw commented Aug 5, 2015

This is perfect! Very, very useful, and it won't take much time to add to MPDN at all!

BTW, would it be possible to do pitch shifting via the same interface too? For example, someone might want to correct PAL speedup by lowering the frame rate along with the pitch.

@zachsaw commented Aug 5, 2015

Actually, thinking about this a bit more, I think OffsetClock needs to be relative to the system clock (i.e. QPC) and not to the default reference clock derived from the audio card. There's no way for a video renderer to know what the exact drift is and make a correction until it's gone about 4 minutes into playback without pausing.

I believe that to make this useful without the user having to wait 4 minutes to fine-tune each media type with a different FPS, the offset needs to be made relative to the QPC clock - this is how Reclock does it too. With Reclock, you can see MPDN's refclk deviation is always 0%. The only trouble is, there's no way to change this refclk deviation via an interface. Even if it's relative to system time, a video renderer still has to figure out the actual FPS of the source before setting the offset. Once it knows the actual FPS, it's quite easy for the video renderer to change the refclk deviation so that playback speeds up or slows down to match the FPS to the display refresh rate.

@zachsaw commented Aug 5, 2015

BTW, this wouldn't work if we're bitstreaming, right?

@alexmarsev (Owner, Author)

BTW, would it be possible to do pitch shifting via the same interface too? For example, someone might want to correct PAL speedup by lowering the frame rate along with the pitch.

A 0.1% difference in pitch is not perceptible. I think it makes more sense to just slave to the monitor rate; if the drift is low enough, the audio renderer will adjust via pitch (resampling), not time stretching.

BTW, this wouldn't work if we're bitstreaming, right?

Yes, I can't do much there. Reclock and bitstreaming just can't work together. Decoding and re-encoding is not really an option.

Actually, thinking about this a bit more, I think OffsetClock needs to be relative to the system clock (i.e. QPC) and not to the default reference clock derived from the audio card.

The resolution of the graph clock is already the same as QPC's. I receive the audio position together with the QPC value for that position, then "overclock"/extrapolate using the current QPC value.

IGuidedReclock::OffsetClock() is intended to be called repeatedly. If the graph clock drifts slightly, the video renderer just corrects it repeatedly - at every frame, say. I don't really see why the video renderer can't assume at the beginning that the graph clock will match QPC, make initial decisions based on that, and then correct the clock later.
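A minimal sketch of the extrapolation described above (the names `ClockSnapshot` and `ExtrapolateGraphTime` are my own illustration, not sanear's actual internals):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the "overclock"/extrapolate step: the clock keeps
// the last reported audio position together with the QPC value captured at
// the same instant, and advances at QPC rate between audio position updates.
struct ClockSnapshot {
    int64_t audioPosition; // REFERENCE_TIME (100ns units) from the audio API
    int64_t qpcAtPosition; // QPC reading taken when the position was sampled
};

int64_t ExtrapolateGraphTime(const ClockSnapshot& snap, int64_t qpcNow)
{
    // Graph time = last known audio position + QPC elapsed since then.
    return snap.audioPosition + (qpcNow - snap.qpcAtPosition);
}
```
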

@zachsaw commented Aug 5, 2015

It's definitely possible for the video renderer to do all of that, but wouldn't it be better if it were implemented in the audio renderer? That would save all of us from having to write our own version of it:

IGuidedReclock::SlaveToSystemClock()
IGuidedReclock::OffsetClock() repeatedly, as the refresh rate gets detected with higher precision.

If you don't provide us with the SlaveToSystemClock() method, we'll just be doing exactly that before calling OffsetClock().

@alexmarsev (Owner, Author)

I thought the video renderer could drop clock smoothing entirely with IGuidedReclock. The amount of drift is tiny and can't affect immediate presentation logic.

@alexmarsev (Owner, Author)

I just need to be sure we're not complicating the interface needlessly.

@zachsaw commented Aug 5, 2015

Yes, that's the goal, but in order to do that we'll need to call OffsetClock with a value that corrects the drift. But what is the drift? We need something to detect the drift first, don't we? Otherwise, what value do we call OffsetClock with?

@zachsaw commented Aug 5, 2015

The goal is to correct the reference clock so that it changes the playback rate to match the display refresh rate, isn't it?

If so, since the display refresh rate is always measured relative to the system clock (QPC), it's only natural for video renderers to first correct the audio drift continually to keep it as close to the system clock as possible, and then apply the difference needed to match the display refresh rate (calculation-wise - not literally calling OffsetClock in two steps, of course).

All I'm saying is, if you make it slave to the system clock to begin with, then all we video renderers have to do is apply that final difference.

@zachsaw commented Aug 5, 2015

Hmm, just so I understand what your original thoughts were, could you let me know how you'd expect OffsetClock to be called so we could replicate what Reclock does in its dormant mode (i.e. it simply slaves to the system clock)?

@alexmarsev (Owner, Author)

You have the current graph time and the current QPC value.
You have some initial graph time and its corresponding QPC value.
Subtract, adjust - everything is simple.

But, now that I think of it, I can't warp the graph clock backwards - MSDN says so. For backwards adjustments the graph clock will stay at the same value for some time. So we need an IGuidedReclock::GetTime() that can warp backwards.
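The "subtract, adjust" step could be sketched like this (the helper name is illustrative; the real bookkeeping lives on the caller's side):

```cpp
#include <cassert>
#include <cstdint>

// Drift = how far the graph clock has diverged from QPC since a reference
// point. A positive result means the graph clock ran fast relative to QPC;
// the caller would pass the negated drift to OffsetClock() to compensate.
int64_t ComputeDrift(int64_t graphNow, int64_t qpcNow,
                     int64_t graphStart, int64_t qpcStart)
{
    int64_t graphElapsed = graphNow - graphStart;
    int64_t qpcElapsed = qpcNow - qpcStart;
    return graphElapsed - qpcElapsed;
}
```
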

@zachsaw commented Aug 5, 2015

Yes, we're on the same page as far as correcting drift is concerned, then. However, all I'm asking is that you do the subtract-and-adjust internally - you have the current graph time and the current QPC value too, so why make the video renderer / player do it?

I think there's no need for backwards adjustments, is there? A negative offset only slows down the clock; you should never make it go backwards...

@alexmarsev (Owner, Author)

why make the video renderer / player do it?

It would add a layer of hidden adjustment invisible to the video renderer, and it would limit the usefulness of the IGuidedReclock::GetTimestretchingThreshold() method if the video renderer decides to avoid time stretching at all costs (it has much worse audio quality than correcting via pitch).

I think there's no need for backwards adjustments, is there?

I was thinking about the situation when the movie and display rates are the same, but the frames are slightly misaligned. When a display frame comes just a little before the movie frame, I thought it would make more sense to do a small backwards adjustment rather than adjust forward. Especially for ~24p content on ~24p displays.

@alexmarsev (Owner, Author)

And these hidden adjustments would be quite large. There's a delay between when I call Start() in WASAPI and when it actually starts to consume samples. Also, the audio device may be changed/reinitialized during playback, complicating things further. Bottom line - I can't simply slave to the system clock without a significant amount of adjustment.

@alexmarsev (Owner, Author)

Though I could return a lesser value from IGuidedReclock::GetTimestretchingThreshold() (let's say lower by 0.1%) and use the margin for hidden slaving to the system clock. With 1ms per 1s I should be able to do it. Does that sound good to you?

@zachsaw commented Aug 6, 2015

Though I could return a lesser value from IGuidedReclock::GetTimestretchingThreshold() (let's say lower by 0.1%) and use the margin for hidden slaving to the system clock. With 1ms per 1s I should be able to do it. Does that sound good to you?

Yes. I don't think I've seen a refclk deviation of more than 0.05% on any system, so it should be safe to assume a 0.1% safety margin.

So as we stand, the interface looks something like this now?

struct IGuidedReclockDraft : IUnknown
{
    // Returns the average amount of adjustment per second below which audio renderer is
    // guaranteed to use high quality sample rate conversion instead of time stretching.
    // Usually, audio renderer allows short bursts of larger adjustments before
    // switching to time stretching.
    //
    // The value is guaranteed to stay the same until a new graph clock is
    // set by IMediaFilter::SetSyncSource()
    STDMETHOD(GetTimestretchingThreshold)(REFERENCE_TIME* pThreshold) = 0;

    STDMETHOD(SlaveToSystemClock)(BOOL enable) = 0;

    // Instantly offsets graph clock by requested amount. When the method returns, the clock
    // is guaranteed to be adjusted already. Audio renderer is responsible for smoothing out
    // the values and catches up to adjustment after some time.
    STDMETHOD(OffsetClock)(REFERENCE_TIME offset) = 0;
};

For a video renderer to replicate reclock, we can simply do the following:

  1. Let X = the difference between the refresh rate and the frame rate, in REFERENCE_TIME
  2. Call GetTimestretchingThreshold() to find out how much headroom we're allowed (for HQ)
  3. If X is within the threshold, just call SlaveToSystemClock(true) and OffsetClock(X)
  4. Show the equivalent of Reclock's green systray icon
  5. Keep recalculating X and call OffsetClock(X) - this is still required since the measurement of the display refresh rate gets more accurate over time
  6. If the video renderer detects that the source has a variable frame rate, call SlaveToSystemClock(false) and OffsetClock(0) to disable guided reclock (this can happen in the middle of playback - we won't know it's variable FPS until we've decoded the frames and compared the timestamps)
  7. If X is not within the threshold, do nothing, unless the user requests that it be activated anyway (i.e. they don't care about lower audio quality)
  8. If SlaveToSystemClock is false and OffsetClock is set to a non-zero value, it simply functions as a direct offset without the hidden adjustment

I think I've covered all the permutations?
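The arithmetic behind steps 1-3 can be sketched as follows; the helper name and exact formula are my assumptions, not part of the draft interface. The per-second adjustment X is the rate mismatch expressed in REFERENCE_TIME units, which the renderer can compare against GetTimestretchingThreshold():

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

typedef int64_t REFERENCE_TIME; // DirectShow units: 100ns, 10,000,000 per second

// Hypothetical helper: the per-second clock adjustment needed to slave
// playback at frameRate to a display running at refreshRate. To play one
// second of frameRate content across one second of refreshRate vsyncs, the
// clock must run refreshRate/frameRate as fast.
REFERENCE_TIME AdjustmentPerSecond(double frameRate, double refreshRate)
{
    return (REFERENCE_TIME)llround(10000000.0 * (refreshRate / frameRate - 1.0));
}
```

For example, slaving 23.976fps content to a 24Hz display needs roughly 10010 units (~1ms) of adjustment per second.
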

@alexmarsev (Owner, Author)

More or less, but I expected OffsetClock() to work with relative, not absolute values. Meaning OffsetClock(0) won't do anything.

@zachsaw commented Aug 6, 2015

OffsetClock(0) means turning off any offsets in relative terms doesn't it?

@alexmarsev (Owner, Author)

OffsetClock(1); OffsetClock(2); OffsetClock(0); would be the same as OffsetClock(3)
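In other words, offsets accumulate. A toy model of these semantics (hypothetical bookkeeping, just to pin down the behavior described above):

```cpp
#include <cassert>
#include <cstdint>

// Each call adds to the running total of requested warp, so OffsetClock(0)
// is a no-op rather than a reset.
struct RelativeOffsetClock {
    int64_t pending = 0; // accumulated requested warp, 100ns units
    void OffsetClock(int64_t offset) { pending += offset; }
};
```
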

@zachsaw commented Aug 6, 2015

So essentially all we have to do to set a new offset is like this?

OffsetClock(X);
OffsetClock(-X);

OffsetClock(Y);

@alexmarsev (Owner, Author)

OffsetClock() is for immediate clock warps, not clock rate adjustments. The audio renderer turns these immediate warps into a rate adjustment after smoothing them (applying a low-pass filter, in signal processing terms).
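A minimal sketch of that smoothing idea (the 1/8 factor and the names are illustrative assumptions, not sanear's actual filter):

```cpp
#include <cassert>
#include <cstdint>

// Requested warps accumulate into a pending offset; each audio processing
// pass consumes only a fraction of the remainder, which turns instantaneous
// warps into a gradual rate adjustment (one-pole low-pass behavior).
struct SmoothedReclock {
    int64_t pending = 0; // warp not yet realized, 100ns units

    void OffsetClock(int64_t offset) { pending += offset; }

    // Called once per audio pass; the returned slice would be realized as a
    // small resampling-ratio tweak over the duration of that pass.
    int64_t ConsumeAdjustment()
    {
        int64_t step = pending / 8; // take 1/8 of what's left each pass
        pending -= step;
        return step;
    }
};
```
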

@zachsaw commented Aug 6, 2015

Hmm, I'm not sure I like that at all. I know MPDN wouldn't work with a clock that warps backwards, and I believe a lot of other players wouldn't either - because MSDN specifically disallows it, for good reasons.

@alexmarsev (Owner, Author)

That's why we should include a method in IGuidedReclock that copies GetTime() but can warp backwards.

@alexmarsev (Owner, Author)

And you expected OffsetClock() to set an offset relative to QPC after calling SlaveToSystemClock() - now I finally understand why you needed that method.

@zachsaw commented Aug 6, 2015

Yes, there are good reasons why MSDN doesn't allow it, otherwise Microsoft would've implemented it. I can't remember off the top of my head why that was the case, but it made sense when I was writing my player from scratch.

By adding GetTime(), do you then expect all video renderers to use it as the reference clock instead? That's a big departure from a player's design. My advice is that you should just make OffsetClock adjust the playback speed, not the clock.

@alexmarsev (Owner, Author)

Adjusting only the rate won't allow you to get rid of a half-frame offset; for 24p that's around 20ms, which is a lot.
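The 20ms figure comes from the worst-case misalignment being half a frame period (a quick check; the helper name is mine):

```cpp
#include <cassert>

// Worst-case frame/vsync misalignment when rates match is half a frame
// period: at 24fps the period is ~41.7ms, so the worst case is ~20.8ms.
double HalfFramePeriodMs(double fps)
{
    return 1000.0 / fps / 2.0;
}
```
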

@zachsaw commented Aug 6, 2015

That's already possible - by changing the audio delay in LAV Audio Decoder (i.e. changing the timestamps).

@alexmarsev (Owner, Author)

It makes a lot more sense to include such adjustments in this interface.

@zachsaw commented Aug 6, 2015

There's no need to create a new interface for this at all. In fact, any player can change the audio timestamps without an additional interface (a simple audio pass-through transform filter would do the job). If that's all IGuidedReclock does, then there's not much use for it at all, I'm afraid.

@zachsaw commented Aug 6, 2015

What Reclock does is slave the audio clock to the system clock, and then allow an additional offset to be applied on top. IReclock would be the perfect interface for such an implementation.

@alexmarsev (Owner, Author)

In fact, any player can change it (audio timestamp) without an additional interface.

Player or video renderer? How exactly?

And the interface will do what we include in it, nothing more, nothing less.

@zachsaw commented Aug 6, 2015

Just change the timestamps of the audio samples. It's up to the audio renderer then to gradually catch up / slow down. It is for this particular reason that you don't get an audio break when you change the delay.

@alexmarsev (Owner, Author)

But the video renderer can't change the timestamps of audio samples. We're designing the interface between the video renderer and the audio renderer here.

@zachsaw commented Aug 6, 2015

Why not? You can easily traverse the whole graph and add your own audio pass-through transform filter between the audio renderer and whatever comes before it. That's a much more generic solution that will work with every audio renderer out there.

@zachsaw commented Aug 6, 2015

I initially thought the whole point of IGuidedReclock was to provide something like Reclock, where you change sample rates so you can speed up / slow down audio playback... What you're suggesting is simply changing the delay, which wouldn't be of much more use than what's already available.

Actually, come to think of it, maybe we should leave IGuidedReclock out entirely. Like I said, the whole thing can easily be implemented as a transform filter that can be used with every audio renderer, including DirectSound.

@alexmarsev (Owner, Author)

What I propose is that IGuidedReclock can do both.

Like I said, the whole thing can easily be implemented as a transform filter that can be used for every audio renderer, including DirectSound.

You know, by that logic video renderers could come equipped with their own audio renderers (heck, why not).

This is going nowhere, so I propose a timeout. Also, I'm packing for vacation right now and will be back on August 23. But I'll try to release sanear with the status page fixes before going.

Also, we should really include madshi in the discussion, in case he's interested.

@zachsaw commented Aug 6, 2015

Well, having an audio transform filter isn't any different from having a custom audio renderer. For example, MPDN (or madVR) still needs to traverse the graph to find out whether Sanear is used as the audio renderer. Instead, it would just do the same for the audio transform filter.

BTW, I'm not suggesting that a video renderer should include its own audio transform filter - but a player usually does.

@zachsaw commented Aug 6, 2015

Anyway enjoy your vacation! Try not to think too hard about this :)

@madshi commented Aug 10, 2015

It seems there are 2 different topics here. Here's my thoughts on them:

  1. Should all of this be done in the audio renderer or in a separate transform filter?

To be honest, I've zero experience with the audio side of DirectShow. I've no idea how much tweaking the audio renderer might be able to do without doing actual resampling. Maybe some tricks are possible, by talking to OS audio APIs, the audio driver or even the hardware (e.g. changing audio playback clocks to odd values or something). If any of that is feasible at all then it makes a lot of sense implementing all this in the audio renderer because it wouldn't make sense for an audio transform filter to do things like talking to the audio driver or anything like that.

If this reclock magic is solely based on either resampling or time stretching then zachsaw has a point in suggesting that this could be done in a simple transform filter and would not necessarily have to be part of the audio renderer. However, where it's implemented is ultimately the decision of the developer who implements it, and I personally don't really care much where it ends up in, so it's fine with me either way. Of course the benefit of doing this in a transform filter would be that it could be used with any audio renderer. Not sure if that's just a theoretical advantage or a practical one.

  2. Is the interface as suggested in the first post of this thread OK? Or should this interface adjust the playback rate instead?

To make it short: the way madVR works, the originally suggested interface would be just perfect for my needs. It would not only allow me to avoid any frame drops/repeats (when the refresh rate and frame rate are reasonably close), it would furthermore allow me to make lipsync perfect. That's just my 2 cents, of course.

A separate GetTime() method which allows back warping might be useful. Adding it wouldn't hurt, in any case.

Personally, I'll probably never call GetTimestretchingThreshold, but that's just me.
