Add Super-xBR and NNEDI3 support #2427
This commit marks the image size variables as temporary, and renames them to prevent any potential confusion in the future.
I have two remarks.
They are mentioned in the manual, but I don't think they deserve a detailed explanation.
There is a hard limit on the number of passes, but with the help of
I'm still very suspicious of the prescaling stuff. Are multiple passes really needed? Why is alpha/chroma apparently handled separately in some places? How do we know that awful-to-handle features like chroma positioning, non-mod-2 cropping, and rotation still work correctly in all cases?
Oh, and some larger chunks could probably be moved into separate functions, and the prescale stuff could probably be in its own commit.
@wm4 Before addressing individual review comments (probably later tonight), I would like to describe how the prescaling works in general. The current
With prescaling, it works like this:
And
4x upscaling (two passes) is actually very helpful for upscaling DVD content.
Chroma processing happens before the main texture merging, and there are resizers involved as well, so we need to calculate the offset carefully. Alpha (and also the non-essential planes from texture0) is processed during the main texture merging, and bilinear is used.
Chroma positioning and non-mod-2 cropping are already handled by
Acknowledged, will fix that by next push.
@wm4 pushed new commits, addressed most of the comments, except that I didn't make prescaling into a single function or commit. There is already a separate function for prescaling named
I should add that I only tested the code with an nvidia card and driver (355.11), on Linux. Also, I have a small concern about my use of
You should try to add new functions, instead of letting existing functions grow to 100s of lines. (If they're already big, there's little justification to make them even bigger.)
Generally yes.
@wm4 pushed new commits. I moved the luma prescaling part into its own function, since it is kind of self-contained. But I had to leave the plane merging part in
OK, I do think this is a bit better. Since I have no idea what's going on anyway (the concept is simple, the interaction with all the code and possible configurations is not), I have nothing more to say, I guess. @bjin: do you still have any further plans for this code, or do you consider this final? @haasn: what is your opinion? Review would be appreciated too.
@wm4: I have no plan to make major changes to these commits. They should already be big enough just for introducing new features.
(Oh I see, a for loop is actually slightly awkward, because you want to compute and return the number of passes in the first place. Well, doesn't matter.)
Made some minor changes to the nnedi3 commit.
Updated the commit message for nnedi3 with details that are too technical to be in the manual.
ping @haasn
Do we have any comparison images of this version of NNEDI3 against a reference implementation (e.g. vapoursynth)?
@haasn I would like to describe how the current code prescales non-YUV video on the main thread. The background is that both Super-xBR and NNEDI3 are designed to handle luma only; they will probably somehow work on chroma or RGB planes too (and they actually do: for example, the original NNEDI3 filter also uses NNEDI3 to handle chroma, and madVR also does chroma upscaling with them). The original Super-xBR implementation just hardcodes a color matrix and uses the luma value (calculated from RGBA) to find the weights, which it then applies to all channels. This is an ugly hack which I don't want to follow, so I removed all of that to make superxbr work on luma only. So the problem comes: how do we handle RGB (and XYZ) video? It's a difficult choice, and a lot of tradeoffs are involved. We can of course convert RGB back to YUV, upscale Y with
So the decision I made is to:
The superxbr is significantly slower for non-YUV. But it's much faster on YUV, and more importantly, correct without ugly code. (Well, I don't expect any complaints about this either; superxbr is already very fast compared to NNEDI3. And I don't care much about its performance either.)
I would personally prefer to disable scaling non-YUV for super-xbr. A version that works directly (and correctly) on RGB exists, and in fact it would just require some abstraction over the underlying type (eg.
For NNEDI3 I'm undecided. You can keep the current loop around for NNEDI3 simply for lack of a better alternative. It might make sense to add an option for it, since scaling RGB is a huge performance hit, and some users might prefer not to take that hit on this type of content. (Although the content is rare enough to justify having those users create a non-NNEDI3 profile instead.)
The problem is that the swscale format filter probably does a pretty bad job of converting to yuv. It would absolutely only be a hack, and never something intended for “real” viewing. (And certainly not one I would recommend users try in a man page.)
Note that for scaling multiple channels there might be a way to reconcile efficiency with speed: perform the heavy “edge detection”-style calculations on the luminance and store the result in a separate texture, then only use this texture plus more lightweight blending code on each texture. That might be the best approach overall, as it's both correct and reasonably fast. You can even skip the separate texture when processing luma only (for YUV). But I would be fine with taking the algorithm as-is right now without this change, which would be a significant redesign, and then adding that in a later commit.
I didn't look what exactly the shader code does here, but generally processing multiple channels shouldn't add much to the performance costs, if at all. |
Yes, it's an option. What I considered is that it's just very cheap to enable it for RGB video. Just introduce an extra variable "prescaled_planes" and we are done.
Will remove that part. UPDATE: quoted wrong sentence
Add the Super-xBR filter for image doubling, and the prescaling framework to support it. The shader code was ported from the MPDN extensions project, with modifications to process luma only. This commit is largely inspired by code from #2266, with `gl_transform_trans()` authored by @haasn taken directly.
Escaping all question marks as well; they can be used to form trigraph sequences, which are effective even within string literals.
Implement NNEDI3, a neural network based deinterlacer. The shader is reimplemented in GLSL and now supports both 8x4 and 8x6 sampling windows. This allows the shader to be licensed under LGPL2.1 so that it can be used in mpv.

The current implementation supports uploading the NN weights (up to 51kb with the placebo setting) in two different ways: via uniform buffer object, or by hard-coding them into the shader source. UBO requires OpenGL 3.1, which only guarantees 16kb per block. But I find that 64kb seems to be a default setting for recent cards/drivers (which nnedi3 is targeting), so I think we're fine here (with the default nnedi3 setting, the size of the weights is 9kb). Hard-coding into the shader requires OpenGL 3.3, for the "intBitsToFloat()" built-in function. This is necessary to represent these weights precisely in GLSL. I tried several human-readable floating point number formats (with really high precision, as needed for single precision float), but for some reason they did not work nicely; bad pixels (with NaN values) could be produced with some weight sets.

We could also add support for uploading these weights with a texture, for compatibility reasons (e.g. upscaling a still image with a low-end graphics card). But as I tested, it's rather slow even with a 1D texture (we would probably have to use a 2D texture due to dimension size limitations). Since there is always a better choice for NNEDI3 upscaling of a still image (the vapoursynth plugin), it's not implemented in this commit. If this turns out to be a popular demand from users, it should be easy to add later.

For those who want to optimize the performance a bit further, the bottlenecks seem to be: 1. the overhead of uploading and accessing these weights (in particular, the shader code is regenerated for each frame, though that is on the CPU); 2. "dot()" performance in the main loop; 3. "exp()" performance in the main loop; there are various fast implementations with some bit tricks (probably with the help of the intBitsToFloat function).
The code is tested with an nvidia card and driver (355.11), on Linux. Closes #2230
pushed new commits, with all comments addressed (plus several other changes to DOCS/)
That super-xBR image looks sort of funny. Can you compare it to the version in #2266 please? Also, the NNEDI3 version looks different too for some reason, but this might be due to settings differences on your end (?). I'll do some testing myself as well.
For comparison with the NNEDI3 vapoursynth version: that one was generated with the default settings, which are an 8x6 window and 128 neurons.
The updated commit is basically okay; I just want some more confirmation that the result matches what it actually should be, since there still seem to be some deviations here and there. Best to test on a black and white image to make sure it's not due to the RGB thing.
source is produced with
Looks like the difference is due to vapoursynth's use of a prescreener in its default settings (it's an optimization that uses another NN to predict which pixels can be interpolated with bicubic instead of the main NN).
Okay, looks good. (For NNEDI3 at least.)
Merged, thanks.
An initial version ready for code review
@haasn