Thanks for the shout out <3 Also I was wondering if you could help me out with something #1
Figured you'd appreciate being linked to; it didn't really feel right naming my repo "advanced noise" when it isn't really, well, advanced.
Fair, though I don't actually have a background in any of this AI stuff, so I'd still double-check anything I say with GPT4 if I were you lol.

Anyway, to address your question: I think the main problem with upscaling latents is that not all features map 1:1 properly. Especially around edges/borders/fine details it seems to be non-linear, at least to some degree. This part makes sense, right? How else would it fit a 3x512x512 image into a 4x64x64 latent.

Another way to think of it is like this: if you grab a brush in Photoshop and set it to 4px wide, it will be the same width at any resolution, both 512x512 and 1024x1024. Now, if you upscale your 512x512 drawing to 1024x1024 and start drawing with your 4px brush, your lines will no longer line up with the old ones. The VAE expects all "lines" to look the same, regardless of resolution.

Here's an example. This image is perfectly split in the middle and the resolution is divisible by 8, meaning even when it gets upscaled, it should retain that split in the middle. It works for black and white (mostly), but add any colors and suddenly the weird behavior appears again. I think this might be because blue is negative in some of the latent channels, meaning that to make the image "more" blue you have to make the actual numbers in the latent space smaller. But then again, no clue, I haven't messed with this enough to know, and the SDXL latent space is completely different to begin with.
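To make the shapes concrete, here's a minimal PyTorch sketch (not from the original comment; the tensors are random stand-ins for a real image and its VAE encode):

```python
import torch
import torch.nn.functional as F

# SD v1 VAE: a 3x512x512 image becomes a 4x64x64 latent (8x spatial compression).
image = torch.randn(1, 3, 512, 512)   # stand-in for a real RGB image tensor
latent = torch.randn(1, 4, 64, 64)    # stand-in for vae.encode(image)

# Naive latent upscale: interpolates the 4 channels like pixels, which is
# exactly the operation that smears edges/fine details as described above.
latent_up = F.interpolate(latent, scale_factor=2.0, mode="bicubic")
print(latent_up.shape)  # torch.Size([1, 4, 128, 128])
```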
I know even less about image manipulation algorithms, but can't you directly run lanczos on your 4D tensor if you're not turning it into an image? I doubt lanczos would mess up the details inside the latent space any less than bicubic does, but who knows, might still be worth a try. Unless you mean fixing the issue with the details getting muddled, in which case I have zero idea. I guess you could get some content-aware algo going, but that seems... painful.
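A hedged sketch of what "lanczos directly on the latent" could look like, resizing each of the 4 channels as a 32-bit float image with PIL (a hypothetical helper, untested for actual generation quality):

```python
import numpy as np
import torch
from PIL import Image

def lanczos_latent(latent: torch.Tensor, scale: float) -> torch.Tensor:
    # Resize every latent channel independently with PIL's Lanczos filter,
    # treating each channel as a single-band float ("F" mode) image.
    b, c, h, w = latent.shape
    nh, nw = int(h * scale), int(w * scale)
    out = torch.empty(b, c, nh, nw)
    for i in range(b):
        for j in range(c):
            chan = Image.fromarray(latent[i, j].cpu().numpy().astype(np.float32), mode="F")
            chan = chan.resize((nw, nh), Image.LANCZOS)
            out[i, j] = torch.from_numpy(np.asarray(chan))
    return out
```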
Now, the problem here is that any time you force it back into an image, the "leftover noise" gets lost (I think). If I understand it correctly, your repo solves that by generating the same noise at a different resolution, which is pretty genius. Your problem is the actual image part, which you'd have to somehow beat into the right shape to go into the second sampler. My idea/solution for something like this would be to train a neural network for it, probably something like a modified ESRGAN that works on latents instead of images, trained from scratch. This would (probably) preserve enough of the noise + the details to not ruin your image, and would also avoid the VAE encode/decode step. Not sure how viable this is, and it's not like I have an A100 to train on (all I have is 2xP40s and a crappy 10GB 3080 lol), but I can give it a shot if you want. Anyway, I hope my rambling helps you at least somewhat. If it doesn't, just ask me again; I'm horrible with explanations but happy to help lmao
Actually, wait, it doesn't even need ESRGAN. I can literally get a working upscaler with a 2MB model lol.

```python
import torch.nn as nn

# Small fixed-2x latent upscaler: 4 latent channels in, 4 out.
module_list = [
    nn.Conv2d(4, 64, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Upsample(scale_factor=2.0),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv2d(32, 4, kernel_size=5, padding=2),
]
model = nn.Sequential(*module_list)
```
That might explain why I had blue dots on my images during my attempts at creating usable noise. I was only able to solve that issue by "compressing towards zero" the values within the latent. Example:
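(The example image from the original comment isn't reproduced here. Below is a hypothetical sketch of one way to read "compressing towards zero": shrinking every latent value toward 0 by a fixed factor.)

```python
import torch

def compress_towards_zero(latent: torch.Tensor, strength: float = 0.85) -> torch.Tensor:
    # One plausible interpretation: scale every value toward 0 by a fixed
    # factor, taming the outliers that decode into colored dots.
    # Purely illustrative; the original method may differ.
    return latent * strength
```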
Me neither, as you might have guessed lol!
You helped me understand the problem better indeed, and I thank you for your detailed answer! :)
WAIT WHAT?! How do you do that? I don't yet know how to do such things.
I just changed the training code from my latent interposer to take v1 latents on both sides, then "designed" and trained a small neural net (the code part) that scales it up by a fixed amount. I'll clean up the code a bit and cook up some models overnight. Should have your HQ latent upscaler by tomorrow ;D (Only real problem is that it can only do fixed ratios. Would x1.25, x1.5 and x2.0 scaling be enough, or should I do more?)
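For readers, a condensed sketch of that training setup (a shortened variant of the conv net from earlier in the thread; the synthetic batches are stand-ins, and the real interposer training code differs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shortened version of the small conv net above, fixed at 2x.
model = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.Upsample(scale_factor=2.0),
    nn.Conv2d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv2d(64, 4, kernel_size=5, padding=2),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Synthetic stand-in batches; real training would pair low-res and high-res
# latents encoded from the same images.
dataloader = [(torch.randn(8, 4, 64, 64), torch.randn(8, 4, 128, 128)) for _ in range(4)]

for lq, hq in dataloader:
    loss = F.mse_loss(model(lq), hq)  # plain latent-space MSE, no visual loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```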
That would be so awesome! Thank you!
x4 max if you can do it without too much trouble, but these ratios seem pretty good already! And it's always possible to loop back to get bigger sizes. My perlin latent noise generator is quite limited regarding ratios anyway.
4x is pushing it; my training code/dataset is probably just awful. Though you do make a good point: chaining two of the 2x ones should work as a stopgap for the madman who wants to directly 4x his latents lol.
Yeah no, then don't bother, I would rather loop through anyway!
Greetings. I finished the models/repo. It's available here: https://github.com/city96/SD-Latent-Upscaler I have models for x1.25, x1.5 and x2.0, for both XL and v1.5. I tested chaining two of the 2x ones for x4 and it works just fine. LMK if it works for your use case.
I've been frenetically refreshing the page since yesterday lol, thank you so much!
No problem. These models are still relatively undertrained, but from my testing they seem to work OK. Could've trained them longer, but I just set them to a fixed epoch count then left for work, hoping they'd be finished by the time I got back. Thankfully they were lol. SDXL behaves a bit weird, but then again, it always does.
Well, it is still a better way to upscale latents than anything that has been made so far, so congrats!
I am currently trying to get results by combining it with the perlin-based noise generator, but I wonder if I am pushing it too far by multiplying the layers 6 times. While the overall pattern still matches, the results I am getting so far are blurry. My generator is surely hard to set up correctly. Earlier, without upscaling, I was able to get correct results with 3 layers (I mean the "noise_iteration" value) after adding an option that makes the mean value of each produced perlin layer 0, instead of subtracting the mean at the end. But trying to understand what "kind of mess" should be passed to the refiner after an upscale is finicky.
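For reference, a simplified sketch of that "zero-mean per layer" option (not the actual generator; this stand-in sums upsampled Gaussian noise per octave instead of true Perlin gradients, just to show where the centering happens):

```python
import torch
import torch.nn.functional as F

def fractal_noise(h, w, octaves=3, persistence=0.5, zero_mean_per_layer=True):
    total = torch.zeros(1, 1, h, w)
    amp = 1.0
    for o in range(octaves):
        # Coarse-to-fine octaves: low-res noise upsampled to the target size.
        res = (max(1, h >> (octaves - 1 - o)), max(1, w >> (octaves - 1 - o)))
        layer = F.interpolate(torch.randn(1, 1, *res), size=(h, w), mode="bilinear")
        if zero_mean_per_layer:
            layer = layer - layer.mean()  # center each layer at 0 before summing
        total = total + amp * layer
        amp *= persistence
    if not zero_mean_per_layer:
        total = total - total.mean()      # old behavior: subtract the mean at the end
    return total
```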
If you send me a sample workflow I can check it out. I'm currently trying to further finetune the upscaler. It looks like I'll be able to fix the odd hue shift stuff at least. |
I think your main problem is that your first advanced K-sampler doesn't return the leftover noise. I don't think you can avoid doing that, even if you inject your own perlin noise. (The step count might also be messed up; I just converted those back to widgets for testing.) Not sure if it's possible to properly scale the actual leftover noise from the sampler; my crappy model certainly struggles to do it. You can re-inject your perlin noise, as long as you don't scale it:
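Roughly, the two-pass setup under discussion looks like this (widget names match the stock KSamplerAdvanced node; the step counts are illustrative, not taken from the original workflow):

```python
# Rough shape of the two-pass graph being debugged here:
#
#   KSamplerAdvanced #1: steps=30, start_at_step=0, end_at_step=20,
#                        return_with_leftover_noise=enable
#        |
#   latent upscaler (fixed 2x), optionally + unscaled perlin noise
#        |
#   KSamplerAdvanced #2: steps=30, start_at_step=20, end_at_step=30,
#                        add_noise=disable  (the leftover noise rides along in the latent)
```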
How the hell does this even work lmao. I would've expected it to destroy the image completely. Quick question: is your perlin noise generator specific to SDXL? I was testing on v1 and noticed it was outputting garbage most of the time. Also, I got this semi-coherent example by upscaling the noise separately and slightly overlapping the two samplers, like mentioned in the "BlenderNeko/ComfyUI_Noise" readme/example, though this is on v1 with the BNK noisy latent image node. Anyway, I'm out of time for today.
To avoid fooling myself, I decided to inject regular noise at the higher scale after the upscale, for an honest comparison. So here are the results with regular noise, as if we had never tried anything different: it's just as smooth as usual SD1.5; even though the model is good, the details seem smudged. Now a fractal-based batch, using your latent upscaler! :D I think we can say that it works.
@city96 The update also handles leftover noise quite well. Very clean, just the hue got changed a bit (but that's not a big deal for post processing). |
@ntdviet Sorry, I saw you started the proper discussion but didn't have the time to post it there as well. I'm glad it works; it seems a lot more flexible now, but I didn't really test it with the leftover noise stuff. It's kinda amazing that it works, especially on XL (though the v1 model still seems to outperform the XL one, I think).
When using LatentUpscaler, a ratio of 1.25-2 is sufficient. (Translated with www.DeepL.com)
Nx1080 is an odd resolution. We start with 512, 768 or 1024 for SDXL, so if [...]
If the scaling is something like 1.2 / 1.4, then a 1080 image is perfectly fine. |
@suede299 The problem with scale factors like 1.2 and 1.4 is that they don't align with my training dataset properly. My max resolution is [...]
You could try out NNLatentUpscale, which can scale by arbitrary amounts. It's probably better than my solution anyway, since it used visual loss during training instead of just estimating loss across the latents.
That was a pleasant surprise! :D
I used GPT4 this afternoon to help me figure out upscaling latents through lanczos.
You can find it here.
Now, of course, the results I am getting are as blurry as you can imagine, so it's about as viable as any other method.
You can find a workflow here.
I was visiting your repository, wondering if there would be a way to somehow rearrange the latent into an image, so as to be able to do a proper lanczos upscale and then "put it back" into a latent. Since you know a lot more about the subject than me, I figured I would just ask :)
There wouldn't even be a need to modify the data to make it RGB, just to be able to use lanczos on it with the provided functions within my node and reshape the result back to the correct latent format.
This, combined with a fractal noise generator, would allow upscaling mid-generation and unlock new levels of creativity. For now my only option is to decode, upscale the image with lanczos (I made two small nodes using PIL that you can find in my linked repository if you want), re-encode with the VAE, and finish the work with the refiner (a sketch of this detour follows below). While this is not a completely bad method, it is also definitely not fast. Given that my knowledge of how the VAE works is pretty low, I also suspect that doing this mid-generation might reduce quality or complexity. It is also pretty limited overall.
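A hedged sketch of that decode -> lanczos -> re-encode detour (the `vae` object and `upscale_via_pixels` name are stand-ins, not the actual PIL nodes from the linked repository):

```python
import numpy as np
import torch
from PIL import Image

def upscale_via_pixels(vae, latent: torch.Tensor, scale: float = 2.0) -> torch.Tensor:
    # `vae` is assumed to expose decode()/encode() taking and returning
    # 1x3xHxW float tensors in [0, 1]; the real ComfyUI nodes differ.
    img = vae.decode(latent)                                  # latent -> pixels
    arr = (img[0].permute(1, 2, 0).clamp(0, 1).cpu().numpy() * 255).astype(np.uint8)
    pil = Image.fromarray(arr)
    pil = pil.resize((int(pil.width * scale), int(pil.height * scale)), Image.LANCZOS)
    out = torch.from_numpy(np.asarray(pil).copy()).float().div(255).permute(2, 0, 1)[None]
    return vae.encode(out)                                    # back to latent space
```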
Here is a Perlin Merlin Rabbit who used the wrong spell: