Simplifying downscaling logic again
This offers a small performance increase for all scaling factors while
providing the exact same quality at 2x (normal 4:2:0 use-case).

At 4x (when using a luma doubler) it is a bit worse but nothing
noticeable on real content. A nice side-effect is that the problem with
vanishing line-art seems to be a bit less severe now (I think it's due
to lower correlation, the older downscaling code was blurrier which
made the correlation between planes higher).

Again I may or may not revert this at some point, but I'm reasonably
happy with the change and it also reduces bloat a bit.
Artoriuz committed Jan 20, 2024
1 parent 471c65d commit 3fc1c1a
Showing 1 changed file with 2 additions and 37 deletions.
39 changes: 2 additions & 37 deletions CfL_Prediction.glsl
@@ -25,47 +25,12 @@
 //!BIND HOOKED
 //!SAVE LUMA_LOWRES
 //!WIDTH CHROMA.w
-//!HEIGHT LUMA.h
-//!WHEN CHROMA.w LUMA.w <
-//!DESC Chroma From Luma Prediction (Downscaling Luma 1st Step)
-
-vec4 hook() {
-    float factor = ceil(LUMA_size.x / HOOKED_size.x);
-    int start = int(ceil(-factor / 2.0 - 0.5));
-    int end = int(floor(factor / 2.0 - 0.5));
-
-    float output_luma = 0.0;
-    int wt = 0;
-    for (int dx = start; dx <= end; dx++) {
-        output_luma += LUMA_texOff(vec2(dx + 0.5, 0.0)).x;
-        wt++;
-    }
-    vec4 output_pix = vec4(output_luma / float(wt), 0.0, 0.0, 1.0);
-    return output_pix;
-}
-
-//!HOOK CHROMA
-//!BIND LUMA_LOWRES
-//!BIND HOOKED
-//!SAVE LUMA_LOWRES
-//!WIDTH CHROMA.w
 //!HEIGHT CHROMA.h
 //!WHEN CHROMA.w LUMA.w <
-//!DESC Chroma From Luma Prediction (Downscaling Luma 2nd Step)
+//!DESC Chroma From Luma Prediction (Downscaling Luma)
 
 vec4 hook() {
-    float factor = ceil(LUMA_LOWRES_size.y / HOOKED_size.y);
-    int start = int(ceil(-factor / 2.0 - 0.5));
-    int end = int(floor(factor / 2.0 - 0.5));
-
-    float output_luma = 0.0;
-    int wt = 0;
-    for (int dy = start; dy <= end; dy++) {
-        output_luma += LUMA_LOWRES_texOff(vec2(0.0, dy + 0.5)).x;
-        wt++;
-    }
-    vec4 output_pix = vec4(output_luma / float(wt), 0.0, 0.0, 1.0);
-    return output_pix;
+    return LUMA_texOff(0.0);
 }
 
 //!HOOK CHROMA
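The commit message's claim (same result at 2x, slightly different at 4x) can be sanity-checked with a small 1-D model. This is only a sketch in Python, not shader code. It assumes the bound LUMA texture is sampled with linear filtering and that `LUMA_texOff(0.0)` lands on the centre of the output pixel; the helper names (`sample_linear`, `downscale_box`, `downscale_single_tap`) are made up for illustration.

```python
import math


def box_offsets(factor):
    """Tap offsets used by the removed box filter, in source texels."""
    start = math.ceil(-factor / 2.0 - 0.5)
    end = math.floor(factor / 2.0 - 0.5)
    return [dx + 0.5 for dx in range(start, end + 1)]


def sample_linear(row, x):
    """Linear fetch at continuous coordinate x (texel i is centred at i + 0.5).

    Assumption: this stands in for a linearly filtered texture lookup with
    clamp-to-edge addressing.
    """
    t = x - 0.5
    i0 = math.floor(t)
    frac = t - i0
    lo = min(max(i0, 0), len(row) - 1)
    hi = min(max(i0 + 1, 0), len(row) - 1)
    return row[lo] * (1.0 - frac) + row[hi] * frac


def downscale_box(row, factor):
    """Old behaviour: average all taps returned by box_offsets()."""
    out = []
    for j in range(len(row) // factor):
        centre = (j + 0.5) * factor
        taps = [sample_linear(row, centre + off) for off in box_offsets(factor)]
        out.append(sum(taps) / len(taps))
    return out


def downscale_single_tap(row, factor):
    """New behaviour: one filtered fetch at the output pixel centre."""
    return [sample_linear(row, (j + 0.5) * factor)
            for j in range(len(row) // factor)]


row = [0.0, 10.0, 2.0, 8.0, 1.0, 3.0, 9.0, 7.0]
print(downscale_box(row, 2), downscale_single_tap(row, 2))
print(downscale_box(row, 4), downscale_single_tap(row, 4))
```

In this model the two functions agree at 2x because the single fetch sits exactly between two texels. At 4x the single fetch still only blends the two central texels, while the box filter averaged all four.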

45 comments on commit 3fc1c1a

@Jules-A commented on 3fc1c1a Jan 21, 2024

You should give this a try: https://pastebin.com/raw/G338Rzfh
It's not perfect and not as fast as just doing return LUMA_texOff(0.0); but it's ~3.5x faster than the previous downscaling logic.
Testing on a 1440p test image with default mpv settings and no other shaders:
PSNR: 43.6134
DSSIM: 0.1275
MAE: 118.464 (0.00180765)

[Image: mpv-shot0002]

With return LUMA_texOff(0.0);
PSNR: 43.3833
DSSIM: 0.127592
MAE: 120.589 (0.00184007)

[Image: mpv-shot0001]

EDIT: I'm seeing the same thing when testing with luma 2x and 1080p -> output res x2 (2.6667x luma).
For some reason it also scores higher in metrics than the old downscaling logic, but I haven't properly compared them visually.
EDIT2: Oh, you'll want to linearize/delinearize if you aren't using any luma scalers, as the edges will be a bit furry otherwise. If you want to truly get rid of the furriness, you'll need to loop over every pixel along x instead of every 2nd. That doubles the processing cost but should still be around 1.8x faster than the previous downscaling logic (depends on what you are really after).
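A note on reading the metric lines in this thread: the value in parentheses after each MAE looks like the absolute error divided by 65535 (a 16-bit range). That matches the quoted figures to within rounding, but the tool that produced them is not stated, so the scale below is an assumption. The PSNR helper is the textbook formula and is not claimed to reproduce the PSNR values above.

```python
import math

QUANTUM_MAX = 65535.0  # assumption: metrics were computed on a 16-bit scale


def normalised_mae(mae_abs):
    """Map an absolute MAE onto the 0..1 range."""
    return mae_abs / QUANTUM_MAX


def psnr_from_mse(mse, peak=1.0):
    """Textbook PSNR in dB for a given mean squared error."""
    return 10.0 * math.log10(peak * peak / mse)


# Figures quoted in the comment above.
print(normalised_mae(118.464))  # pastebin downscaler
print(normalised_mae(120.589))  # return LUMA_texOff(0.0);
```

On these numbers the two downscalers differ by under 2% in MAE, so visual comparison matters at least as much as the scores.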

@deus0ww commented on 3fc1c1a Jan 21, 2024

@Jules-A Isn't that hermite? How did you get it to be 3.5x faster?

This is basically what I'm using:

//!HOOK CHROMA
//!BIND CHROMA
//!BIND LUMA
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT LUMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC CfL Downscaling Yx Hermite

#define axis 0

vec2  scale = LUMA_size / CHROMA_size;
ivec2 start = ivec2(ceil(-scale - 0.5));
ivec2 end   = ivec2(floor(scale - 0.5));
ivec2 axle  = ivec2(0);

vec4 hook() {
    float d;
    float w;
    float wsum = 0.0;
    float ysum = 0.0;
    axle[axis] = 1;
    for(int i = start[axis]; i <= end[axis]; i++) {
        d = i + 0.5;
        w = smoothstep(0.0, 1.0, 1 - abs(d) / scale[axis]);
        wsum += w;
        ysum += w == 0.0 ? 0.0 : w * LUMA_texOff(axle * vec2(d)).x;
    }
    return vec4(ysum / wsum, 0.0, 0.0, 1.0);
}

//!HOOK CHROMA
//!BIND CHROMA
//!BIND LUMA_LOWRES
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT CHROMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC CfL Downscaling Yy Hermite

#define axis 1

vec2  scale = LUMA_LOWRES_size / CHROMA_size;
ivec2 start = ivec2(ceil(-scale - 0.5));
ivec2 end   = ivec2(floor(scale - 0.5));
ivec2 axle  = ivec2(0);

vec4 hook() {
    float d;
    float w;
    float wsum = 0.0;
    float ysum = 0.0;
    axle[axis] = 1;
    for(int i = start[axis]; i <= end[axis]; i++) {
        d = i + 0.5;
        w = smoothstep(0.0, 1.0, 1 - abs(d) / scale[axis]);
        wsum += w;
        ysum += w == 0.0 ? 0.0 : w * LUMA_LOWRES_texOff(axle * vec2(d)).x;
    }
    return vec4(ysum / wsum, 0.0, 0.0, 1.0);
}
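To see which taps and weights the Hermite passes above produce, here is a minimal Python model of one axis. It mirrors the `start`/`end` bounds and the `smoothstep` weight from the shader; `hermite_taps` is a hypothetical helper name, and the weights are normalised the same way the shader divides `ysum` by `wsum`.

```python
import math


def smoothstep(edge0, edge1, x):
    """GLSL-style smoothstep: clamp, then 3t^2 - 2t^3."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def hermite_taps(scale):
    """Return (offset, normalised weight) pairs for one axis."""
    start = math.ceil(-scale - 0.5)
    end = math.floor(scale - 0.5)
    taps = []
    for i in range(start, end + 1):
        d = i + 0.5
        w = smoothstep(0.0, 1.0, 1.0 - abs(d) / scale)
        taps.append((d, w))
    total = sum(w for _, w in taps)
    return [(d, w / total) for d, w in taps]


for d, w in hermite_taps(2.0):
    print(d, w)
```

At a 2x factor this gives four taps at -1.5, -0.5, 0.5 and 1.5, with most of the weight on the inner pair.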

@Jules-A commented on 3fc1c1a Jan 21, 2024

> Isn't that hermite? How did you get it to be 3.5x faster?

@deus0ww
It's only sampling every 2nd pixel along x. Yours is doing x and y for each pixel along the respective lines.
Honestly, I'm not sure the tradeoff is really worth it unless you absolutely need the speed. I think doing this is a better tradeoff: https://pastebin.com/raw/mWds8rJM (at least while the main upscaling is still so heavy in comparison).

I wanted to do hermite interpolation between the samples but I'm still not quite sure how to do that yet.
This is my attempt to get it working: https://pastebin.com/raw/c5Aw8MdP
I'm sure it's not correct but surprisingly it seems pretty close to my first version linearized.

@deus0ww

@Jules-A Performance wise, 2-pass hermite is only a bit slower than box for me. I'm using it because it's less blocky in gradients (light red to dark red, for example). I don't have objective scores but, at the moment, hermite downscale + fsr upscale, looks best to me.

@Jules-A commented on 3fc1c1a Jan 22, 2024

> @Jules-A Performance wise, 2-pass hermite is only a bit slower than box for me. I'm using it because it's less blocky in gradients (light red to dark red, for example). I don't have objective scores but, at the moment, hermite downscale + fsr upscale, looks best to me.

@deus0ww
Ahahha, talking about reds, switching away from the previous downscaling logic to a single pass (like in this commit or the ones I've been posting) almost completely fixes #5. I didn't even think the downscaler could have been at fault. It still occurs in your Hermite version, so I guess maybe it has something to do with sampling Y.
As for how your version scores, it scores a little lower than just using hermite sampling for X. However, it's a LOT sharper, I think due to FSR. I didn't test just your version of the downscaler on blurry content, so it should pull ahead there. I'm really not sure about your "less blocky" though; to me I see the opposite vs Cfl with the old DS logic.

@deus0ww

@Jules-A The blockiness is from fsr upscale, not hermite downscale, I think. I suspect the difference we're seeing is because I'm scaling at higher factor, i.e. 540p/720p 4:2:0 ==> 2160p, with luma pre-scaled.

@Jules-A commented on 3fc1c1a Jan 22, 2024

> @Jules-A The blockiness is from fsr upscale, not hermite downscale, I think. I suspect the difference we're seeing is because I'm scaling at higher factor, i.e. 540p/720p 4:2:0 ==> 2160p, with luma pre-scaled.

@deus0ww
I was sort of able to reproduce it when scaling 1080p to 12k as a test; I wasn't really able to notice it much before that though.
Anyway, I ended up settling on https://pastebin.com/raw/W9s4HRVF for now, for 2.5x speed over the old DS code. The ugly conditions are just a speed hack that I can't notice any difference with. I know it seems weird with such small numbers, but it adds up, so simply 0ing them becomes noticeable. Linearizing is a bit questionable, likely down to personal opinion, but I slightly preferred it when blind testing.

@deus0ww

@Jules-A
Try this one. Two-pass hermite with an extra sample + optimized (about 20-25% faster).

//!HOOK CHROMA
//!BIND CHROMA
//!BIND LUMA
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT LUMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC CfL Downscaling Yx Hermite
#define axis 0
#define weight hermite

float box(const float d)      { return float(abs(d) <= 0.5); }
float hermite(const float d)  { return smoothstep(0.0, 1.0, 1 - abs(d)); }

vec2  scale = LUMA_size / CHROMA_size;
const ivec2 axle = ivec2(axis == 0, axis == 1);

vec4 hook() {
    float d, w;
    float wsum = weight(0);
    float ysum = LUMA_tex(LUMA_pos).x;
    for(int i = 0; i < scale[axis]; i++) {
        d = i + 0.5;
        w = weight(d / scale[axis]);
        if (w == 0.0) { continue; }
        wsum += w * 2.0;
        ysum += w * (LUMA_texOff(axle * vec2( d)).x +
                     LUMA_texOff(axle * vec2(-d)).x);
    }
    return vec4(ysum / wsum, 0.0, 0.0, 1.0);
}

//!HOOK CHROMA
//!BIND CHROMA
//!BIND LUMA_LOWRES
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT CHROMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC CfL Downscaling Yy Hermite
#define axis 1
#define weight hermite

float box(const float d)      { return float(abs(d) <= 0.5); }
float hermite(const float d)  { return smoothstep(0.0, 1.0, 1 - abs(d)); }

vec2  scale = LUMA_LOWRES_size / CHROMA_size;
const ivec2 axle = ivec2(axis == 0, axis == 1);

vec4 hook() {
    float d, w;
    float wsum = weight(0);
    float ysum = LUMA_LOWRES_tex(LUMA_LOWRES_pos).x;
    for(int i = 0; i < scale[axis]; i++) {
        d = i + 0.5;
        w = weight(d / scale[axis]);
        if (w == 0.0) { continue; }
        wsum += w * 2.0;
        ysum += w * (LUMA_LOWRES_texOff(axle * vec2( d)).x +
                     LUMA_LOWRES_texOff(axle * vec2(-d)).x);
    }
    return vec4(ysum / wsum, 0.0, 0.0, 1.0);
}

@Jules-A commented on 3fc1c1a Jan 24, 2024

> @Jules-A Try this one. Two-pass hermite with an extra sample + optimized (about 20-25% faster).

@deus0ww I only tested with base Cfl code, not FSR upscaling but as far as I can tell, it's worse (vs my above solution) in every single way (metrics are lower too), it's not even sharper as I was expecting, not even at extremely high res. It's also extremely heavy (>3x slower than my above solution) and still artifacts on reds.

@deus0ww

@Jules-A Last one. Same performance but with quadratic weights.

//!HOOK CHROMA
//!BIND CHROMA
//!BIND LUMA
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT LUMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC CfL Downscaling Yx Quadratic
#define axis 0
#define weight quadratic

float box(const float d)      { return float(abs(d) <= 0.5); }
float triangle(const float d) { return max(1.0 - abs(d), 0.0); }
float hermite(const float d)  { return smoothstep(0.0, 1.0, 1 - abs(d)); }
float quadratic(const float d) {
    float x = 1.5 * abs(d);
    if (x < 0.5)
        return(0.75 - x * x);
    if (x < 1.5)
        return(0.5 * (x - 1.5) * (x - 1.5));
    return(0.0);
}

vec2  scale = LUMA_size / CHROMA_size;
const ivec2 axle = ivec2(axis == 0, axis == 1);

vec4 hook() {
    float d, w, wsum = 0.0, ysum = 0.0; // initialize both accumulators
    for(int i = 0; i < scale[axis]; i++) {
        d = i + 0.5;
        w = weight(d / scale[axis]);
        if (w == 0.0) { continue; }
        wsum += w * 2.0;
        ysum += w * (LUMA_texOff(axle * vec2( d)).x +
                     LUMA_texOff(axle * vec2(-d)).x);
    }
    return vec4(ysum / wsum, 0.0, 0.0, 1.0);
}

//!HOOK CHROMA
//!BIND CHROMA
//!BIND LUMA_LOWRES
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT CHROMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC CfL Downscaling Yy Quadratic
#define axis 1
#define weight quadratic

float box(const float d)      { return float(abs(d) <= 0.5); }
float triangle(const float d) { return max(1.0 - abs(d), 0.0); }
float hermite(const float d)  { return smoothstep(0.0, 1.0, 1 - abs(d)); }
float quadratic(const float d) {
    float x = 1.5 * abs(d);
    if (x < 0.5)
        return(0.75 - x * x);
    if (x < 1.5)
        return(0.5 * (x - 1.5) * (x - 1.5));
    return(0.0);
}

vec2  scale = LUMA_LOWRES_size / CHROMA_size;
const ivec2 axle = ivec2(axis == 0, axis == 1);

vec4 hook() {
    float d, w, wsum = 0.0, ysum = 0.0; // initialize both accumulators
    for(int i = 0; i < scale[axis]; i++) {
        d = i + 0.5;
        w = weight(d / scale[axis]);
        if (w == 0.0) { continue; }
        wsum += w * 2.0;
        ysum += w * (LUMA_LOWRES_texOff(axle * vec2( d)).x +
                     LUMA_LOWRES_texOff(axle * vec2(-d)).x);
    }
    return vec4(ysum / wsum, 0.0, 0.0, 1.0);
}
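The `quadratic` weight above has the shape of the standard quadratic B-spline, rescaled by 1.5 so that its support ends at |d| = 1. A short Python sketch (helper names `bspline2` and `pair_weights` are illustrative, not from the shader) shows the weights the loop produces at a 2x factor and checks the B-spline's partition-of-unity property.

```python
def bspline2(t):
    """Quadratic B-spline with support [-1.5, 1.5]."""
    t = abs(t)
    if t < 0.5:
        return 0.75 - t * t
    if t < 1.5:
        return 0.5 * (t - 1.5) * (t - 1.5)
    return 0.0


def quadratic(d):
    """Same rescaling as the shader: x = 1.5 * |d|."""
    return bspline2(1.5 * d)


def pair_weights(scale):
    """Mirror of the shader loop: one weight per +d / -d tap pair."""
    out = []
    i = 0
    while i < scale:
        d = i + 0.5
        w = quadratic(d / scale)
        if w != 0.0:  # the shader skips zero-weight taps
            out.append((d, w))
        i += 1
    return out


print(pair_weights(2.0))
# Shifted copies of the unscaled spline sum to 1 at any position.
print(bspline2(0.2 - 1.0) + bspline2(0.2) + bspline2(0.2 + 1.0))
```

Each weight in `pair_weights` is applied to both the +d and -d samples, which is why the shader adds `w * 2.0` to `wsum`.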

@Jules-A commented on 3fc1c1a Jan 24, 2024

> @Jules-A Last one. Same performance but with quadratic weights.

@deus0ww Looks better imo, metrics didn't really change much.
I've given up on sampling Y though, it just seems to be causing excessive thinning and artifacting reds.
It turns out the downscaler I was using fell off a bit at higher scaling and for whatever reason sampling less often seems to look better and obviously perform much better (at every 3 it's >6x faster than old DS code) so I switched to using this super ugly mess: https://pastebin.com/raw/bS8hCUD9

@deus0ww commented on 3fc1c1a Jan 25, 2024

@Jules-A Instead of sparsely sampling, how about adding a pass to downscale by 2x with LUMA_texOff(0.0) first?

@Jules-A commented on 3fc1c1a Jan 25, 2024

> @Jules-A Instead of sparsely sampling, how about adding a pass to downscale by 2x with LUMA_texOff(0.0) first?

@deus0ww
While I originally thought that was a good idea for speed, it turns out my testing was flawed: it will kill details on non-upscaled content, but I was only testing with Ravu first so I could test for speed. I didn't think the results would vary so much without it. For higher scaling factors it could be an option, but it'd have to be conditional, and setting it up like that is hard considering I can't seem to get my head around the Reverse Polish Notation that mpv uses....

It does score higher in metrics (on the new test image) at higher scaling (2x Luma) though (just scaling to Chroma * 2 with LUMA_texOff(0) and sampling every 1x):

MAE: 647.52 (0.00988051), PSNR: 33.2872, DSSIM: 0.138371

[Image: mpv-shot0001]

vs

MAE: 651.323 (0.00993855), PSNR: 33.2374, DSSIM: 0.138551

[Image: mpv-shot0002]

However, I'm not overly keen on relying on the metrics, as the killing of shadows/borders is still visible. It does add a nice smoothing effect to the image though, which can't really be ignored.

EDIT: Looks like I did it: https://pastebin.com/raw/hKJj4x1j
Not quite what you suggested, but doing a LUMA_texOff is less damaging as a 2nd pass, though doing 1x sampling is very costly at very high resolutions (5x was slower than the old DS code but 2x is a decent bit faster).

EDIT2: In the end I settled on https://pastebin.com/raw/TQ6ZU9bC as the gains were worth it imo. I had to reduce sampling to every 2nd at very high scaling for performance reasons (and because it just doesn't end up any better sampling every one at that point).

@Jules-A commented on 3fc1c1a Jan 25, 2024

Okay, so testing vs master with the exact same settings/test image as Artoriuz's blog gives these results (I couldn't seem to replicate his):

---With no luma upscaling/extra shaders---

Mine:

MAE: 476.586 (0.00727224), PSNR: 37.3049, DSSIM: 0.135971

mpv-shot0002

Master:
MAE: 485.375 (0.00740634), PSNR: 36.9746, DSSIM: 0.136233

mpv-shot0001

---With Ravu-lite (so LUMA_texOff(0) pass is activated)---

Mine:

MAE: 646.465 (0.00986442), PSNR: 32.9257, DSSIM: 0.138081

mpv-shot0002

Master:

MAE: 655.511 (0.0100025), PSNR: 32.8141, DSSIM: 0.138452

mpv-shot0001

Differences are just as large with the old test image but that's enough testing for today...

NOTE: with large scaling factors you will get much better metric results (and it does look better for the most part) with the first pass set to Chroma * 2; however, you will see more thinning/disappearing shadows and discoloured reds, which bothered me (hence just doing luma / 2).
I also figured out how to do the LUMA_texOff(0) pass first without it always activating, and that seems to be producing some better results at very high scaling, but I want to watch something now lol.

@deus0ww commented on 3fc1c1a Jan 25, 2024

@Jules-A The problem, from my testing, is that LUMA_texOff(0) is terrible when scaling not at 2^n factor, while a "standard" sampling filter is terrible with fractional downscale.

I don't even care about the ultimate picture quality... I just want to be able to resize the player window without strange artifacts at certain scales.

By the way, LUMA_texOff(0) + ortho quadratic (unoptimized to support fractional scaling) was slower than just optimized ortho quadratic.

Edit:
My scaler had an issue with non-even integer downscale. This is fixed. It's now clean of artifacts (or has a similar level of artifacts anyway...) at all scale factors (previously, 1440p->4k and 720p->1080p were not good). I'll probably stick with ortho quadratic (2nd-order B-spline approximation of a Gaussian) because of this. Performance-wise, it's a bit slower than the original box filter before the texOff commit.
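One possible way to picture the scale-factor sensitivity described above is to look at which source texels a single centred fetch touches. The Python sketch below is a 1-D model only, assuming linear filtering on the bound texture; `linear_tap_weights` is a made-up helper, and the model is not claimed to explain every artifact reported in this thread.

```python
import math


def linear_tap_weights(j, factor):
    """Source texel indices and weights hit by one linearly filtered fetch
    at the centre of output pixel j (texel i is centred at i + 0.5)."""
    x = (j + 0.5) * factor
    t = x - 0.5
    i0 = math.floor(t)
    frac = t - i0
    return {i0: 1.0 - frac, i0 + 1: frac}


for factor in (2.0, 1.5, 3.0):
    print(factor, [linear_tap_weights(j, factor) for j in range(3)])
```

In this model a 2x factor always gives an even 0.5/0.5 blend, a 1.5x factor alternates between 0.75/0.25 and 0.25/0.75 from one output pixel to the next, and a 3x factor reads a single texel and ignores its neighbours.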

@Jules-A commented on 3fc1c1a Jan 26, 2024

> I don't even care about the ultimate picture quality... I just want to be able to resize the player window without strange artifacts at certain scales.
>
> Edit: My scaler had an issue with non-even integer downscale. This is fixed. It's now clean of artifacts (or has a similar level of artifacts anyway...) at all scale factors (previously, 1440p->4k and 720p->1080p were not good). I'll probably stick with ortho quadratic (2nd-order B-spline approximation of a Gaussian) because of this. Performance-wise, it's a bit slower than the original box filter before the texOff commit.

@deus0ww
Did you see it with my code? I don't resize the window often, but I often play at fractional factors and don't really see any artifacts that aren't also seen at perfect scaling, other than a bit of aliasing (more like furriness), which linearizing helped with. However, it's not really needed when LUMA_texOff(0) is done in the last pass.
As for your code, sorry, I totally forgot to mention that you weren't sampling the last pixel (though I doubt that would have caused much of an issue). It wasn't too noticeable but it did show up in metrics.

@deus0ww

> @deus0ww As for your code, sorry but I totally forgot to mention that you weren't sampling the last pixel (though I doubt that would have caused much of an issue), it wasn't too noticeable but it did show up in metrics.

How am I missing the last pixel?

The artifact I'm seeing is like moire pattern, not any furriness.

@Jules-A commented on 3fc1c1a Jan 26, 2024

> How am I missing the last pixel?

You were scaling to < axis instead of <= but looks like you fixed that.

> The artifact I'm seeing is like moire pattern, not any furriness.

Weird, I'm not seeing anything like that...

@deus0ww
I just did some tests with/without linearization again and it seems to look pretty in blind tests, even with doing a LUMA_texOff(0) pass but metrics don't like it because it does thin edges a bit.

I also tested replacing my code that re-uses old data with just skipping sampling at 0.0 like you are doing and for whatever reason, simply skipping at 0.0 compared to what I'm doing is causing edges to get thinned a bit and metrics are slightly worse :/

@deus0ww commented on 3fc1c1a Jan 26, 2024

I'm using the rtings.com chroma subsampling test image, which is obviously useless for tweaking quality but very useful for testing the sampling code and weights.

4K 4:4:4:
TextTest_2160_rgb

@Jules-A commented on 3fc1c1a Jan 31, 2024

@deus0ww
I finally found a solution, but it's a bloody weird mess that I found by trying pretty much everything... Turns out hermite was causing more issues than I originally thought, making anime with non-solid lines super hairy even with the default cfl upscaler. It was also causing blocking (actually more like banding) artifacts on street lights etc.
Mixing with a jinc window and some wrong implementation of fsr (I don't know why, but it works...) seems to fix it and also reduces the amount of other artifacts like thinning/discoloured reds etc.
Unfortunately it loses a large amount of sharpness and smoothness (which I changed the correlation to 0.88 to help with), but using FSR for upscaling just has too many issues. The most common ones are the blocking/banding/saturation artifacts, but at large scaling factors the blocking becomes massive (I think the term is macroblocking). You can minimize the effect of it by not letting the negative weights go any lower than -7.15 (with your fsr version), but you start to lose some fine details, so I still can't decide if the sharpness/smoothness is worth it.
Obviously, calculating 3 separate weights makes it a lot more expensive, though it ranges from a bit faster than cfl's previous ds code at smaller factors to slightly slower at larger factors.

I only tested 720p -> 4k and between so I didn't bother with other cases but I did briefly test 480p.

Downscale code I came up with:
//!HOOK CHROMA
//!BIND LUMA
//!BIND HOOKED
//!SAVE LUMA_LOWRES
//!WIDTH LUMA.w 2 /
//!HEIGHT LUMA.h 2 /
//!WHEN LUMA.w CHROMA.w >
//!DESC Chroma From Luma Prediction (Downscaling Luma 1)

#define M_PI 3.1415927 // pi
#define M_PI_4 0.7853982 // pi/4
#define M_2_PI 0.6366198 // 2/pi
#define M_SQRT2 1.4142136 // sqrt(2)
#define EPS 1e-6

//Credit Garamond13 for jinc code
float bessel_J1(float x)
{
  if (x < 2.2931157)
      return x / 2.0 - x * x * x / 16.0 + x * x * x * x * x / 384.0 - x * x * x * x * x * x * x / 18432.0;
  else
      return sqrt(M_2_PI / x) * (1.0 + 0.1875 / (x * x) - 0.1933594 / (x * x * x * x)) * cos(x - 3.0 * M_PI_4 + 0.375 / x - 0.1640625 / (x * x * x));
}

#define jinc(x) ((x < EPS) ? 1.0 : (2.0 * bessel_J1(M_PI * x) / (M_PI * x)))
#define fsr(x) (25.0 / 16.0 * pow(2.0 / 5.0 * x - 1.0, 2.0) - (25.0 / 16.0 - 1.0)) * pow(1.0 / 4.0 * x - 1.0, 2.0)
#define hermite(x,y) smoothstep(0.0, 1.0, 1.0 - ((x) / (y + 0.5)))

vec4 hook() {
  float factor = ceil(LUMA_size.x / HOOKED_size.x);
  ivec2 posx = ivec2(int(ceil(-factor / 2.0 - 0.5)), int(floor(factor / 2.0 - 0.5)));
  float output_luma = 0.0, wt = 0.0, w, d, off = 0.0; // accumulators must start at zero
  int age = 1; //Pre-set age so first sample is fresh
  
  for (int dx = posx.x; dx <= posx.y; dx++) {
      d = dx + 0.5;
      wt += w = hermite(d, posx.y) * jinc(d/factor*1.2196699) * fsr(min((d/2)/factor, 4.0));
      if (w == 0.0) {
          age++;
          continue;
      }
      if (age < 1) {
          if (w < 0.0005) {
              age++;
          } else {
              off = LUMA_texOff(vec2(d, 0.0)).x;
              age = 0;
          }
      } else {
          off = LUMA_texOff(vec2(d, 0.0)).x;
          age = 0;
      }
      output_luma += w * off;
  }
  return vec4(output_luma / wt, 0.0, 0.0, 1.0);
}

//!HOOK CHROMA
//!BIND LUMA_LOWRES
//!BIND HOOKED
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT CHROMA.h
//!WHEN LUMA.w CHROMA.w >
//!DESC Chroma From Luma Prediction (Downscaling Luma 2)

vec4 hook() {
  return LUMA_LOWRES_texOff(0.0);
}
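As a side check on the `bessel_J1` and `jinc` helpers in the first pass, here is a direct Python transcription of the two-branch approximation. It compares against two well-known reference points: J1(1) ≈ 0.4400506 and the first zero of J1 near 3.8317, which corresponds to the jinc zero near 1.2196699 that appears as a constant in the shader. The reading that the constant maps `d == factor` onto that first zero is an interpretation, not something the comment states.

```python
import math

M_PI_4 = 0.7853982  # pi/4, same constant as the shader
M_2_PI = 0.6366198  # 2/pi
EPS = 1e-6


def bessel_j1(x):
    """Taylor series for small x, asymptotic expansion otherwise."""
    if x < 2.2931157:
        return x / 2.0 - x**3 / 16.0 + x**5 / 384.0 - x**7 / 18432.0
    return (math.sqrt(M_2_PI / x)
            * (1.0 + 0.1875 / x**2 - 0.1933594 / x**4)
            * math.cos(x - 3.0 * M_PI_4 + 0.375 / x - 0.1640625 / x**3))


def jinc(x):
    """2 * J1(pi * x) / (pi * x), with jinc(0) defined as 1."""
    if x < EPS:
        return 1.0
    return 2.0 * bessel_j1(math.pi * x) / (math.pi * x)


print(bessel_j1(1.0))   # compare with J1(1) ~= 0.4400506
print(jinc(1.2196699))  # should be close to zero
```

Because the macro tests `x < EPS` without taking an absolute value, negative offsets in the shader loop take the `1.0` branch; the Python version keeps that behaviour rather than correcting it.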

and here's a quick comparison with luma scaling to 2.5x output (ravu zoom ar3) and my usual MPV settings:

Mine:

MAE: 633.524 (0.00966696), PSNR: 34.0036, DSSIM: 0.138046

mpv-shot0001

Yours:

MAE: 645.129 (0.00984404), PSNR: 33.7475, DSSIM: 0.138418

mpv-shot0002

Notice the Christmas tree in the top right? Yours is taking a huge chunk out of the left branches.

EDIT: The difference with no luma scaling is quite a bit larger.

@deus0ww commented on 3fc1c1a Feb 1, 2024

I have stopped trying to affect the artifacts by tinkering with the downscaler. At this point, I'm happy with ortho quadratic, which is providing a clean, low-artifact downscale at all scaling factors. If you're still testing, test 1.5x luma scaling, too (like 720p -> 1080p, or 1440p -> 4k). That's where texOff(0) and the previous box scaler get really bad.

As for the upscaler... tinkering in progress. My observation so far: softer weights = higher fuzziness/furriness, sharper weights = more blocking. How difficult would it be to port all of FSR to chroma, I wonder....?

@Jules-A commented on 3fc1c1a Feb 1, 2024

> I have stopped trying to affect the artifacts by tinkering with the downscaler. At this point, I'm happy with ortho quadratic, which is providing a clean, low-artifact downscale at all scaling factors. If you're still testing, test 1.5x luma scaling, too (like 720p -> 1080p, or 1440p -> 4k). That's where texOff(0) and the previous box scaler get really bad.
>
> As for the upscaler... tinkering in progress. My observation so far: softer weights = higher fuzziness/furriness, sharper weights = more blocking. How difficult would it be to port all of FSR to chroma, I wonder....?

I tried 1440p -> 4k and it was fine; I never tried 720 -> 1080 though since I have a 1440p monitor. I just tested 720 -> 1080p and it still seems to edge out your code, but that was just testing with the cfl changes also.
FSR already has an RGB version that you can easily just hook to chroma and clamp. It isn't terrible, but ravu-zoom-ar3 (rgb) does better using the same method. Maybe porting them properly can show some improvements, but I'm really not sure what else needs to be done.

I don't get how you can say you're seeing no artifacts with your solution? I'm seeing way more with it than with just texOff(0), though the previous ds suffered from those same issues. There's tonnes of excessive thinning and discoloration of reds.

@deus0ww commented on 3fc1c1a Feb 1, 2024

Where's FSR RGB? I'll try ravu, too, but I'm expecting it to be too slow.

@Jules-A commented on 3fc1c1a Feb 1, 2024

> Where's FSR RGB? I'll try ravu, too, but I'm expecting it to be too slow.

Ravu isn't really that slow compared to CFL (pre-DS change). Not sure where I got it, if it was an old version from the main author (https://gist.githubusercontent.com/agyild/82219c545228d70c5604f865ce0b0ce5/raw/4ef91348ab4ade0ef74c6c487df27cf31bdc69ae/FSR.glsl) or somewhere else.
I was using the version on the left [image], sourced from somewhere else, so it may not have been performing as well as it could.

@deus0ww commented on 3fc1c1a Feb 1, 2024

> I don't get how you can say you're seeing no artifacts with your solution? I'm seeing way more more with than with just texOff(0), though previous ds suffered from those same issues. There's tonnes of excessive thinning and discoloration of reds.

I'm thinking that the clean downscale is exposing the artifacts in the upscale, which is where I think it should be fixed.

@Jules-A commented on 3fc1c1a Feb 1, 2024

> I'm thinking that the clean downscale is exposing the artifacts in the upscale.

Nah, I think it has something to do with it varying too much from the original luma. I tried many downscaler variants on just luma and they seemed fine, but when I tried to use them for downscaling for cfl they all bombed. I don't think it's the upscaler in cfl since I also tried many different ones there and they showed the same issues.

EDIT: Whoops, I was only able to get EASU working for chroma, I think I had issues with RCAS.

@deus0ww commented on 3fc1c1a Feb 1, 2024

Try this for upscale weight. It's not as sharp but zero blockiness for me.

float quadratic(const vec2 d) {
    float x = length(d);
    if (x < 0.5)
        return(0.75 - x * x);
    if (x < 1.5)
        return(0.5 * (x - 1.5) * (x - 1.5));
    return(0.0);
}

Edit: How did you get ravu to work with chroma?

@Jules-A commented on 3fc1c1a Feb 1, 2024

> Edit: How did you get ravu to work with chroma?

I was scaling it to output before and using it along with cfl (which was scaling to luma from output) before I fixed the cfl's downscaler, if that's all you wanted you can just change MAIN to CHROMA with this shader: https://raw.githubusercontent.com/bjin/mpv-prescalers/master/ravu-zoom-ar-r3-rgb.hook and maybe edit/remove the when clause if you aren't meeting the condition. If you want to scale it to LUMA you can just change the width/height to LUMA. You may also want to do a final 0.0,1.0 clamp on the output but it doesn't seem any different from just the default AR code.

Metrics are at 2x Luma scaling:
EASU - MAE: 661.52 (0.0100942), PSNR: 33.0653, DSSIM: 0.140013
RAVU - MAE: 648.929 (0.00990202), PSNR: 33.3853, DSSIM: 0.139405
Lanczos - MAE: 675.909 (0.0103137), PSNR: 32.9115, DSSIM: 0.140761
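
The figure in parentheses after each MAE appears to be the same error normalised to the 0..1 range by dividing by 65535 (the 16-bit maximum). That is inferred from the numbers themselves, not stated anywhere in the thread:

```latex
\frac{661.52}{65535} \approx 0.010094, \qquad
\frac{648.929}{65535} \approx 0.009902, \qquad
\frac{675.909}{65535} \approx 0.010314
```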

Ravu does much better scaling to Output though and FSR does better on the other test image.

@deus0ww deus0ww commented on 3fc1c1a Feb 1, 2024

ravu-zoom-ar-r3-rgb.hook derives luma from rgb.... I would be surprised if just hooking it to CHROMA works.

@Jules-A Jules-A commented on 3fc1c1a Feb 1, 2024

> ravu-zoom-ar-r3-rgb.hook derives luma from rgb.... I would be surprised if just hooking it to CHROMA works.

Well, it clearly works:

[Test images: EASU (mpv-shot0001), RAVU (mpv-shot0002), LANCZOS (mpv-shot0003)]

but likely not as well as a proper chroma upscaler would, though I think it does better than the old chroma ravu scalers, since they had so many offset issues and are years old.

@Jules-A Jules-A commented on 3fc1c1a Feb 1, 2024

> Try this for upscale weight. It's not as sharp but zero blockiness for me.

It's bloody dreadful: it kills a massive amount of detail and blurs a heap of content. You'd be far better off capping your FSR to a minimum of -4 weights (which ends up with less blocking than cfl's default without killing too much detail).

Default CFL upscaler - MAE: 618.029 (0.00943052)
Quadratic - MAE: 761.112 (0.0116138)

@deus0ww deus0ww commented on 3fc1c1a Feb 1, 2024

@Jules-A Jules-A commented on 3fc1c1a Feb 1, 2024

Proper RAVU Zoom AR chroma:

It's only slightly better and about 10% heavier (I was honestly expecting it to be faster, only tested r3)

CfL w/ Ravu spatial: https://github.com/deus0ww/mpv-conf/blob/master/shaders/bilateral/CfL_Prediction_Ravu.glsl

It doesn't work very well at all. It can look good in some scenarios, but it clips so much off the edges that stuff just looks broken.

@deus0ww deus0ww commented on 3fc1c1a Feb 2, 2024

> but likely not as well as properly making it a chroma upscaler though I think it does better than the old chroma ravu scalers since they had so many offset issues.

The offset issues were from before mpv had //!OFFSET ALIGN, not a problem with ravu chroma itself.

> ... it isn't terrible but ravu-zoom-ar3 (rgb) does better doing the same method....
> Does not work very well at all, it can look good in some scenarios but it clips way too much off the edges that stuff just looks broken.

I only made it because of your suggestion but now I'm liking it more and more...

@Jules-A Jules-A commented on 3fc1c1a Feb 2, 2024

> I only made it because of your suggestion but now I'm liking it more and more...

I wasn't using it to scale all the way to luma, though. I was scaling to output (1440p) and using cfl to scale to luma, which was above that. It didn't suffer from the same issue, though that was r3, not r2 like you are using, and was just the repurposed rgb version, so I'm not sure if that was the reason.
Anyway, this thread is about the downscaling, not the upscaler.
In that regard, I found a really bad scenario for the one I'm using: the Junji Ito Collection S1E13 end screen, which is all red with black shadowed text, with aliasing and ringing. The default ds actually does better here... Your current one is far worse. Cfl with ravu chroma as spatial is slightly better, but Ravu-chroma does massively better, and catmull_rom with pixel-clipper on chroma only seems to do the best here...

[Test images: Mine (mpv-shot0001), Cfl-default (mpv-shot0002), Your current (mpv-shot0002), Ravu chroma as spatial (mpv-shot0003), Ravu-r3-ar chroma (mpv-shot0001), Catmull+PC (mpv-shot0001)]

Not sure if it can even be fixed by changing the downscaler, but considering LUMA_texOff(0) does slightly better, I'm not sure if it's just the extra sharpness hurting it or something else.

EDIT: I managed to find one part of what was making it look worse. Since I was using your version as a base for speed reasons, I was also using your way of getting the pp value, which actually gives a different output from the one cfl was using. After reverting, I found edges were smoother.

@deus0ww deus0ww commented on 3fc1c1a Feb 2, 2024

I used R2 because R3 was too slow.

Try my downscaler again but with the weight changed to box. This should be the same as the old default but with non-integer scaling fixed.
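
For readers following along, one way a box downscale can handle non-integer factors is to weight each luma texel by how much of it falls inside the output pixel's footprint. The sketch below is only an illustration of that idea, not the downscaler being discussed (its code isn't shown in this thread). It ignores chroma siting, and the hook header is modelled on the one in the commit.

```glsl
//!HOOK CHROMA
//!BIND LUMA
//!BIND HOOKED
//!SAVE LUMA_LOWRES
//!WIDTH CHROMA.w
//!HEIGHT CHROMA.h
//!WHEN CHROMA.w LUMA.w <
//!DESC Area-weighted box downscale of luma (illustrative sketch)

vec4 hook() {
    vec2 scale  = LUMA_size / HOOKED_size;   // luma texels per output pixel
    vec2 center = LUMA_pos * LUMA_size;      // footprint centre, in luma texels
    vec2 lo = center - 0.5 * scale;          // footprint edges
    vec2 hi = center + 0.5 * scale;

    // Texel i spans [i, i + 1]; visit every texel the footprint touches.
    ivec2 first = ivec2(floor(lo));
    ivec2 last  = ivec2(ceil(hi)) - 1;

    float sum = 0.0;
    float wt  = 0.0;
    for (int y = first.y; y <= last.y; y++) {
        // Vertical overlap between texel row y and the footprint.
        float wy = max(min(float(y) + 1.0, hi.y) - max(float(y), lo.y), 0.0);
        for (int x = first.x; x <= last.x; x++) {
            // Horizontal overlap between texel column x and the footprint.
            float wx = max(min(float(x) + 1.0, hi.x) - max(float(x), lo.x), 0.0);
            float w = wx * wy;
            sum += w * LUMA_tex((vec2(float(x), float(y)) + 0.5) * LUMA_pt).x;
            wt  += w;
        }
    }
    return vec4(sum / wt, 0.0, 0.0, 1.0);
}
```

At an exact 2x factor with aligned planes, every footprint covers four whole texels with equal weight, so this reduces to a plain 2x2 average.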

I don't see how my 'pp' value can be different.

@Jules-A Jules-A commented on 3fc1c1a Feb 2, 2024

> I don't see how my 'pp' value can be different.

Honestly, I have no idea; maybe precision or mpv's shader caching. Either way, I don't understand why it needs to be done in 3 variables even when using fract.
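
As a point of reference, GLSL defines `fract(x)` as `x - floor(x)`, so the two ways of getting the sub-texel offset should agree. One plausible reason to keep separate variables is that the integer base texel is still needed to address the taps. The debug pass below is a hypothetical sketch (names and header are assumptions, not taken from either shader) that outputs the difference between the two forms.

```glsl
//!HOOK CHROMA
//!BIND HOOKED
//!DESC Compare two ways of computing the sub-texel offset (debug sketch)

vec4 hook() {
    // Position in texel units, relative to texel centres.
    vec2 p = HOOKED_pos * HOOKED_size - vec2(0.5);

    // Multi-step form: keeps the base texel `fp` for addressing taps later.
    vec2 fp = floor(p);
    vec2 pp = p - fp;

    // One-step form: same offset, but the base texel is not kept.
    vec2 pp_fract = fract(p);

    // Amplified absolute difference; expected to be zero everywhere.
    return vec4(abs(pp - pp_fract) * 1000.0, 0.0, 1.0);
}
```

If this pass shows anything other than a uniform zero plane, the difference would more likely come from how `p` itself is computed than from `fract` versus `floor`.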

> Try my downscaler again but with the weight changed to box.

Omg, I literally spent 5 mins trying to decide which was better (vs quadratic) and I'd have to say they're equally terrible. It does beat the old cfl's ds code, though, when running it on your version (which is worse than master's upscaling).

I did try some other things to improve it, and sampling y (in the same pass) definitely helps, but it's obviously too slow in my current config and does slightly worse in test images (I didn't test other content much, but it seems like a subjective improvement). Increasing AR, changing to FSR upscaling (with capping negative weights), and decreasing mix_coeff all improve results in that scenario. Damn, chroma is so bloody tricky....

@deus0ww deus0ww commented on 3fc1c1a Feb 5, 2024

@Jules-A Jules-A commented on 3fc1c1a Feb 5, 2024

I'll have to compare properly when I get more time. Btw, I ended up dropping mix coeff to 0.81 and slightly increasing AR to 0.83, but that lost some sharpness, so I ended up removing the jinc window. Sampling Y directly from luma was just a bit too costly (even when dropping to half res), so it wasn't really worth it. This mostly fixes that tough Junji Ito sample but does bring back some blocking, though honestly that's not really a major issue until a 10x factor, which I don't do with this shader anyway.

@Jules-A Jules-A commented on 3fc1c1a Feb 6, 2024

Compare these two
@deus0ww

CfL_Prediction_Ravu.glsl is quite a bit better than the 2nd one. The shader is still a little bit ringy and is quite costly, though cheaper than r3-zoom on its own. It still struggles terribly on the Junji Ito sample, but I think pretty much anything CFL based does at this point.

Here's a sample:

Junji.Ito.Collection.-.S1E13.-.Tomie.Part.2.-.14-00.10.23.181-00.10.35.057.mp4

@deus0ww deus0ww commented on 3fc1c1a Feb 6, 2024

Try adjusting the antiring param higher. I'm out of ideas to try. At this point I'll probably be sticking with box downscale + ravu2 upscale.

@Jules-A Jules-A commented on 3fc1c1a Feb 6, 2024

> Try adjusting the antiring param higher. I'm out of ideas to try. At this point I'll probably be sticking with box downscale + ravu2 upscale.

It does look slightly better with the downscaler replaced by my own, imo (and by metrics), but honestly I don't think it's worth the cost. Maybe if you want the ravu look, you can do what I was doing before: scale with Ravu part of the way and apply it to chroma (maybe just 2x with LITE, since it should be better than zoom for linear scales).
Something like this:
ravu-zoom-ar-r3chromaxCFL.glsl.txt

Rename it (remove .txt; pastebin doesn't support such big files). It looks very good in quick tests, but I didn't test speed.

@deus0ww deus0ww commented on 3fc1c1a Feb 6, 2024

Try ravu2 one more time.... sorry, I forgot to commit the antiring code changes.

Edit: You may want to extract the ravu-zoom-ar-r2 chroma pass from cfl+ravu2 and test that separately. Unlike the normal one, which uses luma to upscale chroma, mine upscales both chroma planes independently without using luma at all (that's why it's slower).

@Artoriuz (Owner, Author)

> It still struggles terribly on the Junji Ito sample but I think pretty much anything CFL based does at this point.

I really shouldn't be replying to this, but this is what the luma plane looks like in the video you provided:
[image: luma]

And this is what the respective chroma planes look like with constant luma:
[image: chroma]

As is very easy to see, the chroma planes are relatively clean but the luma one isn't. It's full of weird bright pixels inside the character's shadows, which is why CfL is making chroma correspondingly brighter there. This is just another case of you seeing issues because the source you're using has issues.

@Jules-A Jules-A commented on 3fc1c1a Feb 8, 2024

The source definitely has issues (though that's not too surprising since it's a CR source): it's ringy and aliased, and the shadows seem kind of weird and uneven. Maybe trying to improve on that sample is a waste of time, but I still feel that at least trying to make the worst-case scenarios look a bit better is important, even if it wouldn't logically be correct. Basically, I just don't think it should look subjectively worse than most of the inbuilt scalers (well, all the ones I tried). Maybe it should be the job of the luma scaler, but Ravu doesn't fully fix it and neither does ArtCNN_C4F32, so I'm really not sure.

EDIT: I'm not saying to aim to get it looking perfect or anything, just enough so it doesn't immediately stand out as subjectively worse.
