
UHD mode causing encoding failure. #4

Open
WubTheGame opened this issue Sep 3, 2022 · 1 comment

@WubTheGame

Simply put, having UHD mode active causes two failed attempts at encoding (I think; it might be decoding) the first image, followed by a third attempt that succeeds. Unfortunately, this means the first working frame is named as the third in the series (counting from 0, of course). In my experience this option also just reduces stability and consistency in general. Perhaps delete an unsuccessful file (or write it to a working location and move it only if/when it's saved properly), and/or make it clearer that frames aren't being skipped, just failing to process?
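Something like a write-to-temp-then-rename would cover that. A minimal sketch of what I mean (the function and the naming scheme are made up for illustration, not this repo's actual code):

```cpp
#include <cstdio>
#include <filesystem>
#include <functional>
#include <string>

namespace fs = std::filesystem;

// Write frame `index` via `encode` (any callable that writes a file and
// reports success). On failure the temp file is removed, so no misnumbered
// output is left behind and the caller can retry with the SAME index.
bool save_frame(int index, const fs::path& out_dir,
                const std::function<bool(const fs::path&)>& encode) {
    fs::path final_path = out_dir / (std::to_string(index) + ".png");
    fs::path tmp_path = final_path;
    tmp_path += ".tmp";

    if (!encode(tmp_path)) {
        std::error_code ec;
        fs::remove(tmp_path, ec);          // clean up the failed attempt
        return false;
    }
    fs::rename(tmp_path, final_path);      // move into place only on success
    return true;
}
```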

Also, what's this mode supposed to actually do anyway? Just wondering. And what's TTA?
Thanks in advance, and great work.

@NeedsMoar

TL;DR - It's probably never worth using TTA. UHD is broken.

TTA is some kind of undocumented filtering that's normally done (on the GPU) by the following steps (a rough sketch in code follows the list):

  1. Creating 8 extra buffers, each holding a copy of the normalized (0.0-1.0f) pixel values "spread out" so that every pixel lands in the one of its original neighboring locations that corresponds to that buffer's offset. At least I think that's what they were going for; giving variables meaningful names is hard. Aside from eating 8x the memory, the copy might be fairly inefficient depending on how warps work on Vulkan. It doesn't attempt any kind of linear copy, which you'd think it could do, since the data is just shifted around slightly. Keep in mind it's possible I read it wrong and they're doing 3x supersampling, but it doesn't look like it, and the lack of any attempt to fix tile boundaries would still make it bad.
  2. Then the model is run 8 times instead of once for each tile of the image. (It doesn't appear to be run on the copy that's in the original location.)
  3. The pixel at the original location is replaced with the average of the corresponding pixels in the 8 buffers the model was run on, but the original value at that spot is never used in the calculation, nor is a version of it run through the model.
  4. In practice this leads to an overall image that looks either oversharpened (if the model wasn't sensitive to the slight pixel shifts and produced nearly the same result at every offset) or really blurry. I've never had it produce good results; I tried it on the cartoon models and it degraded every one considerably. And bonus items:
  5. Since these models split images into tiles, and the TTA pre/post-processing runs on the tiles, each tile has a boundary of pixels arbitrarily darkened by the black border that came from padding the image.
  6. The pixel that was actually there is never taken into account by the model (at its original location) or in the final image, and the whole point of running computationally expensive ML instead of a faster approach is to get exactly that part right. Unfortunately, even with a stable model, just moving a value within a matrix can be enough to change the result of a matmul, especially at fp16.
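Here's a minimal CPU-side sketch of what I'm describing, assuming it really is 8 one-pixel-offset copies rather than the usual flip/rotate augmentation; `run_model` is a stand-in (identity here, so the sketch runs), not the repo's actual API, and tiling is omitted:

```cpp
#include <cstddef>
#include <vector>

struct Image {
    int w = 0, h = 0;
    std::vector<float> px;                       // single channel, normalized 0.0f-1.0f
    float at(int x, int y) const { return px[(std::size_t)y * w + x]; }
};

// Stand-in for the network forward pass (identity here, so the sketch runs).
Image run_model(const Image& in) { return in; }

Image tta_filter(const Image& src) {
    // the 8 one-pixel offsets surrounding a pixel
    static const int dx[8] = { -1, 0, 1, -1, 1, -1, 0, 1 };
    static const int dy[8] = { -1, -1, -1, 0, 0, 1, 1, 1 };

    // 1. build 8 shifted copies (8x the memory, as noted above)
    std::vector<Image> results;
    for (int k = 0; k < 8; k++) {
        Image shifted{ src.w, src.h, std::vector<float>(src.px.size(), 0.0f) };
        for (int y = 0; y < src.h; y++)
            for (int x = 0; x < src.w; x++) {
                int nx = x + dx[k], ny = y + dy[k];
                if (nx >= 0 && nx < src.w && ny >= 0 && ny < src.h)
                    shifted.px[(std::size_t)ny * src.w + nx] = src.at(x, y);
            }
        // 2. run the model once per offset (8 runs per tile, never on the original)
        results.push_back(run_model(shifted));
    }

    // 3. average the 8 shifted-and-processed values back at each location;
    //    the un-shifted original pixel never enters the average
    Image out{ src.w, src.h, std::vector<float>(src.px.size(), 0.0f) };
    for (int y = 0; y < src.h; y++)
        for (int x = 0; x < src.w; x++) {
            float sum = 0.0f;
            int n = 0;
            for (int k = 0; k < 8; k++) {
                int sx = x + dx[k], sy = y + dy[k];
                if (sx >= 0 && sx < src.w && sy >= 0 && sy < src.h) {
                    sum += results[k].at(sx, sy);
                    n++;
                }
            }
            out.px[(std::size_t)y * src.w + x] = n ? sum / n : src.at(x, y);
        }
    return out;
}
```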

Some variants have a CPU implementation as well, and I'm not in the mood to fully decipher it; the pointer math would be far easier to read in assembly language. Those have 3 nested for loops: color channel, then row, then column. The inner two loops each set up their own set of 4 float pointers, which are then used to construct a final value at some other location, because sure, why not store the colors in planar format like nobody has done since the '80s and specialized graphics file formats. This particular repo decides to apply +1, -1, -1, +1 to the first 4 values it uses to calculate the pixel, while realsr does +1, +1, -1, -1. I'm not sure whether those are in any order related to their positions, because of the pointer mess, so I won't attempt to name whatever convolution filter that is, but if they're in rough order, one produces an artifact-inducing blur while the other is a very slight unsharp-mask type thing that seems to thin lines.
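If I had to guess at the shape of that CPU path, it would be something like this; the tap positions, the blend weight, and the border handling are all guesses from the sign patterns above, not the repo's actual code:

```cpp
#include <cstddef>

// Hypothetical reconstruction: planar layout (all of one channel, then the
// next), three nested loops (channel, row, column), and a 4-tap +1,-1,-1,+1
// combination of neighbors blended into the center sample. realsr's reported
// +1,+1,-1,-1 pattern would give a very different filter.
void tta_merge_cpu(const float* in, float* out, int channels, int w, int h) {
    for (int c = 0; c < channels; c++) {
        const float* plane = in + (std::size_t)c * w * h;
        float* dst = out + (std::size_t)c * w * h;
        for (int y = 1; y < h - 1; y++) {
            // row pointers standing in for the original's pointer soup
            const float* above = plane + (std::size_t)(y - 1) * w;
            const float* row   = plane + (std::size_t)y * w;
            const float* below = plane + (std::size_t)(y + 1) * w;
            for (int x = 1; x < w - 1; x++) {
                // this repo: +1, -1, -1, +1 on the first four taps
                float v = above[x - 1] - above[x + 1] - below[x - 1] + below[x + 1];
                dst[(std::size_t)y * w + x] = row[x] + 0.25f * v;
            }
        }
        // (border row/column handling skipped for brevity)
    }
}
```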

I don't know why UHD is doing that for you. I looked at the source for that, too: the command-line processor sets a variable that's never used by anything again after parsing, so the flag does nothing... which would be the expected behavior of using it. :D
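The pattern looks roughly like this (flag and variable names are mine, for illustration, not the repo's):

```cpp
#include <cstring>

int main(int argc, char** argv) {
    bool uhd_mode = false;
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "-u") == 0)   // hypothetical flag name
            uhd_mode = true;                   // set during parsing...
    }
    (void)uhd_mode;  // ...and never read again, so the option has no effect

    // processing proceeds identically whether or not the flag was given
    return 0;
}
```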
