Add Super-xBR and NNEDI3 support #2427

Closed
wants to merge 4 commits into from

Conversation


bjin commented Oct 28, 2015

An initial version ready for code review

@haasn

This commit marks the image size variables as temporary, and renames them
in order to prevent any potential confusion in the future.
bjin changed the title from “Add Super-xBR and NNEDI3 shaders” to “Add Super-xBR and NNEDI3 support” on Oct 28, 2015

Akemi commented Oct 28, 2015

I have two remarks.

  1. The options "superxbr-sharpness" and "superxbr-edge-strength" are named but not described in the manual.
  2. Would it be feasible to implement an "auto" parameter for the option "prescale-passes"? To clarify: with "auto", it should run as many passes as needed to reach the first resolution greater than the target resolution.


bjin commented Oct 28, 2015

  1. the options "superxbr-sharpness" and "superxbr-edge-strength" were named but not described in the manual.

They are mentioned in the manual, but I don't think they deserve a detailed explanation.

  1. would it be feasible to implement a parameter "auto" for the option "prescale-passes". to clarify the auto parameter, it should run as many passes as needed to reach the first resolution greater than the target resolution.

There is a hard limit on the number of passes, but with the help of prescale-downscaling-threshold you can get this behavior by setting prescale-passes to the maximum value (currently 5). Once the intermediate resolution is greater than the display size, it will stop performing further passes automatically.


ghost commented Oct 28, 2015

I'm still very suspicious of the prescaling stuff. Are multiple passes really needed? Why are alpha/chroma apparently handled separately in some places? How do we know that awful-to-handle features like chroma positioning, non-mod-2 cropping, and rotation still work correctly in all cases?


ghost commented Oct 28, 2015

Oh, and some larger chunks could probably be moved into separate functions, and the prescale stuff could probably be in its own commit.


bjin commented Oct 28, 2015

@wm4 Before addressing individual review comments (probably later tonight), I would like to describe how the prescaling works in general.

The current pass_read_video() works as follows:

  1. merge texture1 and texture2, and resize the merged texture1 to WxH (the main texture size)
  2. merge texture1 into texture0.gb, and texture3 into texture0.a

With prescaling, it works like this:

  1. apply the upscaler on texture0 and store the result in texture4; it's 2Wx2H with an offset
  2. merge texture1 and texture2, and resize the merged texture1 to 2Wx2H, with the offset applied
  3. at this point, texture4 (the upscaled main texture) and texture1 (the merged chroma texture) are 2Wx2H with an offset, while texture0 (the main texture) and texture3 (the alpha texture) are still WxH without an offset
  4. for the main texture merging, we need to apply the offset to texture0 (only the non-essential planes, those not overwritten by texture4) and texture3 (the alpha plane); this is where bilinear scaling comes in

The offset will be corrected by the main scaler later.

Are multiple passes really needed?

4x upscaling (two passes) is actually very helpful for upscaling DVD content.

Why is alpha/chroma apparently handled separately in some places?

Chroma processing happens before the main texture merging, and a resizer is involved as well, so we need to calculate the offset carefully.

Alpha (and also the non-essential planes from texture0) is processed during the main texture merging, and bilinear scaling is used.

How do we know that awful-to-handle features like chroma positioning, non-mod-2 cropping, and rotation still work correctly in all cases?

Chroma positioning and non-mod-2 cropping are already handled by chromafix, I believe. Rotation is handled by compute_src_transform(); the transform there is aware of prescaling.

Oh, and some larger chunks could probably be moved into separate functions, and the prescale stuff could probably be in its own commit.

Acknowledged, will fix that by the next push.


bjin commented Oct 28, 2015

@wm4 pushed new commits and addressed most of the comments, except that I didn't make prescaling into a single function or commit. There is already a separate function for prescaling named pass_prescale(), and the code in pass_read_video() is just glue code to process the planes: backing up pass_tex, optionally performing debanding, and some context-switching code afterwards.

I should add that I have only tested the code with an nvidia card and driver (355.11), on Linux.

Also, I have a small concern about my use of the GL_UNIFORM_BUFFER macro: it's defined in OpenGL 3.1. Is it guaranteed to compile on all platforms we support?


ghost commented Oct 28, 2015

There is already a separate function for prescaling named pass_prescale(), and the code in pass_read_video() is just glue code to process the planes

You should try to add new functions, instead of letting existing functions grow to 100s of lines. (If they're already big, there's little justification to make them even bigger.)

Also, I have a small concern about my use of the GL_UNIFORM_BUFFER macro: it's defined in OpenGL 3.1. Is it guaranteed to compile on all platforms we support?

Generally yes.


bjin commented Oct 28, 2015

@wm4 pushed new commits; I moved the luma prescaling part into its own function, since it is fairly self-contained. But I had to leave the plane-merging part in pass_read_video(); I couldn't figure out a nice way to split it.


ghost commented Oct 28, 2015

OK, I do think this is a bit better. Since I have no idea what's going on anyway (the concept is simple; the interaction with all the code and possible configurations is not), I have nothing more to say, I guess.

@bjin: do you still have any further plans for this code, or do you consider this final?

@haasn: what is your opinion? Review would be appreciated too.


bjin commented Oct 28, 2015

@wm4: I have no plan to make major changes to these commits. They should already be big enough just for introducing new features.


ghost commented Oct 28, 2015

(Oh I see, a for loop is actually slightly awkward, because you want to compute and return the number of passes in the first place. Well, it doesn't matter.)


bjin commented Oct 29, 2015

Made some minor changes to the nnedi3 commit.


bjin commented Oct 30, 2015

Updated the commit message for nnedi3 with details that are too technical to be in the manual.


ghost commented Oct 30, 2015

ping @haasn


haasn commented Oct 31, 2015

Do we have any comparison images of this version of NNEDI3 against a reference implementation (eg. vapoursynth)?


bjin commented Oct 31, 2015

@haasn I would like to describe how the current code prescales non-YUV video on the main thread.

The background is that both Super-xBR and NNEDI3 are designed to handle luma only, though they will more or less work on chroma or RGB planes (and they actually do; for example, the original NNEDI3 filter also uses NNEDI3 to handle chroma, and madVR also does chroma upscaling with them).

The original Super-xBR implementation just hardcodes a color matrix and uses the luma value (calculated from RGBA) to find the weights, which it then applies to all channels. This is an ugly hack I didn't want to follow, so I removed all of that to make superxbr work on luma only.

So the problem is: how do we handle RGB (and XYZ) video? It's a difficult choice, and a lot of tradeoffs are involved. We could of course convert RGB back to YUV, upscale Y with the prescaler, upscale UV with dscale, and convert back to RGB later, but that's just too complicated with no actual benefit. We could also disallow prescaling on non-YUV video, but that's just not user friendly.

So the decision I made is to:

  1. prescale RGB (and other non-YUV content) by pretending all of its planes are luma planes.
  2. add a warning in the manual, suggesting that it might not be ideal to use prescalers on non-YUV video, and that the user can manually override this behavior by using a custom format video filter.

Superxbr becomes significantly slower for non-YUV content this way. But it's much faster on YUV, and more importantly, correct without ugly code. (Well, I don't suspect there will be any complaints about this either; superxbr is already very fast compared to NNEDI3, and I don't care much about its performance anyway.)


haasn commented Oct 31, 2015

So the problem is: how do we handle RGB (and XYZ) video? It's a difficult choice, and a lot of tradeoffs are involved. We could of course convert RGB back to YUV, upscale Y with the prescaler, upscale UV with dscale, and convert back to RGB later, but that's just too complicated with no actual benefit. We could also disallow prescaling on non-YUV video, but that's just not user friendly.

I would personally prefer to disable scaling non-YUV for super-xbr. A version that works directly (and correctly) on RGB exists, and in fact it would just require some abstraction on the underlying type (e.g. #define T (channels == 1 ? float : vec3)) plus a few more lines to handle the luma coefficients. We can add that sort of abstraction for multiple channels later (based on the code I already have), but for now I would just disable it.

For NNEDI3 I'm undecided. You can keep the current loop around for NNEDI3 simply for lack of a better alternative. It might make sense to add an option for it, since scaling RGB is a huge performance hit and some users might prefer not to take that hit on this type of content. (Although such content is rare enough to justify having those users create a non-NNEDI3 profile instead.)

  1. add a warning in the manual, suggesting that it might not be ideal to use prescalers on non-YUV video, and that the user can manually override this behavior by using a custom format video filter.

The problem is that the swscale format filter probably does a pretty bad job of converting to YUV. It would absolutely be only a hack, and never something intended for “real” viewing. (And certainly not one I would recommend users try in a man page.)


haasn commented Oct 31, 2015

Note that for scaling multiple channels there might be a way to reconcile correctness with speed: perform the heavy “edge detection”-style calculations on the luminance and store the result in a separate texture, then use only this texture plus more lightweight blending code on each channel.

That might be the best approach overall, as it's both correct and reasonably fast. You can even skip the separate texture when processing luma only (for YUV).

But I would be fine with taking the algorithm as-is right now without this change, which would be a significant redesign, and then adding it in a later commit.


ghost commented Oct 31, 2015

and in fact it would just require some abstraction on the underlying type (e.g. #define T (channels == 1 ? float : vec3)).

I didn't look at what exactly the shader code does here, but generally processing multiple channels shouldn't add much to the performance cost, if at all.


bjin commented Oct 31, 2015

I would personally prefer to disable scaling non-YUV for super-xbr.

Yes, that's an option. What I considered is that it's just very cheap to enable it for RGB video: just introduce an extra variable "prescaled_planes" and we are done.

It would absolutely only be a hack, and never something intended for “real” viewing. (And certainly not one I would recommend users to try, in a man page)

Will remove that part.

UPDATE: quoted wrong sentence

bjin added 3 commits October 31, 2015 21:57
Add the Super-xBR filter for image doubling, and the prescaling framework
to support it.

The shader code was ported from the MPDN extensions project, with
modifications to process luma only.

This commit is largely inspired by code from #2266, with
`gl_transform_trans()` authored by @haasn taken directly.
Escaping all question marks as well; they can be used to form
trigraph characters, which are effective even within string literals.
Implement NNEDI3, a neural network based deinterlacer.

The shader is reimplemented in GLSL and now supports both the 8x4 and
8x6 sampling windows. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.

The current implementation supports uploading the NN weights (up to
51kb with the placebo setting) in two different ways: via uniform
buffer object, or hard-coded into the shader source. UBO requires
OpenGL 3.1, which only guarantees 16kb per block. But I find that
64kb seems to be a default setting for recent cards/drivers (which
nnedi3 is targeting), so I think we're fine here (with the default
nnedi3 setting the size of the weights is 9kb). Hard-coding into the
shader requires OpenGL 3.3, for the "intBitsToFloat()" built-in
function. This is necessary to precisely represent these weights in
GLSL. I tried several human-readable floating point number formats
(with really high precision for single-precision float), but for some
reason they were not working nicely; bad pixels (with NaN values)
could be produced with some weight sets.

We could also add support for uploading these weights with a texture,
just for compatibility reasons (e.g. upscaling a still image with a
low-end graphics card). But as I tested, it's rather slow even with a
1D texture (we would probably have to use a 2D texture due to
dimension size limitations). Since there is always a better choice
for NNEDI3 upscaling of still images (the vapoursynth plugin), it's
not implemented in this commit. If this turns out to be in popular
demand from users, it should be easy to add later.

For those who want to optimize the performance a bit further, the
bottlenecks seem to be:
1. the overhead of uploading and accessing these weights (in
   particular, the shader code is regenerated for each frame, though
   that happens on the CPU);
2. "dot()" performance in the main loop;
3. "exp()" performance in the main loop; there are various fast
   implementations using bit tricks (probably with the help of the
   intBitsToFloat function).

The code was tested with an nvidia card and driver (355.11), on Linux.

Closes #2230

bjin commented Oct 31, 2015

Pushed new commits, with all comments addressed (plus several other changes to DOCS/).


bjin commented Oct 31, 2015

Do we have any comparison images of this version of NNEDI3 against a reference implementation (eg. vapoursynth)?

Super-xBR (GLSL)

[image: superxbr]

NNEDI3 (GLSL)

[image: nnedi3-glsl]

NNEDI3 (vapoursynth, taken from `Upscaling` Wiki)

[image: nnedi3-vs]


haasn commented Nov 1, 2015

That super-xBR image looks sort of funny. Can you compare it to the version in #2266 please?

Also, the NNEDI3 version looks different too for some reason, but this might be due to settings differences on your end (?). I'll do some testing myself as well.


haasn commented Nov 1, 2015

For comparison with the NNEDI3 vapoursynth version: that one was generated with the default settings, which are 8x6 window and 128 neurons.


haasn commented Nov 1, 2015

Here is a version of the super-xbr test with my commit. It looks significantly different to me, but that might be because I'm processing based on luma instead of the RGB planes individually. (More reason to do it the proper way.)


haasn commented Nov 1, 2015

The updated commit is basically okay; I just want some more confirmation that the result matches what it actually should be, since there still seem to be some deviations here and there.

Best to test on a black-and-white image, to make sure it's not due to the RGB thing.


bjin commented Nov 1, 2015

The source was produced with `convert rose: -grayscale Rec709Luma rose.png`

vapoursynth (neurons=32, window=8x6)

[image: nnedi3-vs]

vapoursynth (neurons=32, window=8x6, no prescreener)

[image: nnedi3-vs-nopscrn]

GLSL (neurons=32, window=8x6)

[image: nnedi3-glsl]

Looks like the difference is due to vapoursynth's use of a prescreener in its default settings (it's an optimization that uses another NN to predict which pixels can be interpolated with bicubic instead of the main NN).


haasn commented Nov 2, 2015

Okay, looks good. (for NNEDI3 at least)


ghost commented Nov 5, 2015

Merged, thanks.

@ghost ghost closed this Nov 5, 2015