Enhancement Suggestions: Mask for Stable Diffusion, Dynamic Resizing Based on VRAM, and Clarification on Diffuser Integration #79
Comments
Thank you for your suggestion.

Mask for Stable Diffusion: This function is easy to implement based on blended latent diffusion, inserted after Rerender_A_Video/src/ddim_v_hacked.py Lines 322 to 323 in fcb7431
, e.g. with the sketch below.
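A minimal sketch of that blending step, assuming `mask` is a new 0/1 tensor at latent resolution and that `img`, `x0`, `ts`, and `self.model` are the variables already in scope at that point of the sampler:

```python
# Sketch of blended latent diffusion (an assumption, not the repo's official code):
# keep the unmasked region from the source latent `x0`, re-noised to timestep `ts`.
img_orig = self.model.q_sample(x0, ts)
img = mask * img + (1. - mask) * img_orig
```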
You can apply the blending on specific steps or on all steps to find a better result. The difficulty lies in how to obtain the masks in a video.

Dynamic Resizing Based on VRAM: Thank you for your suggestions! Maybe you can submit a pull request?

Clarification on Diffuser Integration: ebsynth will not be integrated into diffusers. |
I went through your recommendations for implementing inpainting using a mask, and I have a couple of questions:
|
Yes, they are the same.
The mask should be a tensor with values of 1 and 0, corresponding to your example 2.
The size of the mask should be [1, 1, 72, 64] if img is of size [1, 4, 72, 64]. You need to modify Rerender_A_Video/src/ddim_v_hacked.py Line 158 in fcb7431
and Rerender_A_Video/src/ddim_v_hacked.py Line 239 in fcb7431
to add a new input parameter named M, load M when you load the video frames (which looks like frame = cv2.imread(imgs[i]) in the code), and feed M into ddim_v_sampler.sample(...), as in the sketch below.
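A minimal sketch of the loading side, where `mask_paths` is a hypothetical list of per-frame mask files and `device` is assumed to be defined; only the parameter name M follows the suggestion above, the rest is illustrative:

```python
# Hypothetical sketch: read a mask next to each frame and feed it to the sampler.
frame = cv2.imread(imgs[i])
M = cv2.imread(mask_paths[i], cv2.IMREAD_GRAYSCALE)
M = cv2.resize(M, (64, 72), interpolation=cv2.INTER_NEAREST)  # (W, H) of the latent
M = torch.from_numpy(M / 255.0).round().float()[None, None].to(device)  # [1, 1, 72, 64]

# existing arguments elided; the new mask parameter is passed through:
samples, intermediates = ddim_v_sampler.sample(..., M=M)
```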
|
Thank you very much. I have the same need. Could this functionality be added to the project, requiring only a file input parameter? As for the mask itself, we can export it as a video or an image sequence from professional software. |
I have created this pull request; if the project owner approves the changes, the project can then run on devices with small VRAM. |
The pull request is under review by @SingleZombie. We are under deadline pressure from other work, so it will take time. I think @lymanzhao's need here is to add masking functionality. However, I'm busy with another project and have no time to add the masking functionality at the moment. |
I examined the results of applying a mask in the project and noticed that while the intended region within the mask was altered significantly, the area outside the mask also underwent slight changes. How can we address this to ensure only the masked region is affected?
I have these params in config:
My code in Rerender_A_Video/src/ddim_v_hacked.py at line 323:
I read the mask as 0 and 1:
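An illustrative version of such a 0/1 read (the file name is assumed):

```python
# Illustrative only: threshold an 8-bit grayscale mask image to 0/1.
mask = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE)
mask = torch.from_numpy((mask > 127).astype('float32'))[None, None]
```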
|
Currently I am running a 1280x720 video with default settings and it is using the 24 GB of my RTX 3090 Ti plus 12 GB of shared VRAM. Is this expected? |
@wladradchenko Did you add your code inside Rerender_A_Video/src/ddim_v_hacked.py Line 309 in fcb7431
, within the if block of Rerender_A_Video/src/ddim_v_hacked.py Line 315 in fcb7431
? If so, the original x0 will only be applied during the mask_period = (0.5, 0.8) steps. To ensure consistency, you can set a hyperparameter like inpainting_mask_period=(0, 1) that keeps your added code applied until the final step (the beginning step 0 can be tuned). That means at Line 324, outside the if block, you can add something like the sketch below:
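```python
# Hypothetical sketch for line 324, outside the existing if block. The names
# `inpainting_mask`, `x0_inpaint`, and `inpainting_mask_period` are assumed new
# inputs; `i` / `total_steps` stand for the sampler's step counters.
progress = i / total_steps
if inpainting_mask is not None and \
        inpainting_mask_period[0] <= progress <= inpainting_mask_period[1]:
    img_orig = self.model.q_sample(x0_inpaint, ts)                   # re-noise source
    img = inpainting_mask * img + (1. - inpainting_mask) * img_orig  # blend every step
```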
And in the original paper of blended latent diffusion, the decoder is further optimized to fit the input image in the unmasked region, which is too time-consuming for video processing. |
@williamyang1991 Thank you, it works fine now. As for the suggested dynamic sizing by VRAM, @FurkanGozukara: with 24 GB of VRAM, a resolution limit of 1280x1280 works fine (for that video it comes out as 1280x720) if you set the corresponding option. |
So 24 GB is working for you at 1280x1280? How is this possible? For me it uses 24 GB plus 12 GB shared on Windows at 1280x720. What is your pip freeze? |
I use CUDA 11.8, xformers, and torch 2.0.0 built for CUDA 11.8, and I have cuDNN. I don't use pip freeze because I think it is bad practice: I pin versions only for the libraries that need freezing, and I don't include sub-dependencies. Besides, since I have experimented with different approaches, my pip freeze would not be accurate. I would also like to note that we are talking about VRAM here, not RAM. If you have multiple GPUs in a device, you can select your GPU via the device setting, for example (illustrative; the GPU index is arbitrary):
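```python
# Illustrative: select the second GPU explicitly instead of the default cuda:0.
import torch
device = torch.device('cuda:1')
model = model.to(device)  # move the model (and its inputs) to the chosen GPU
```
|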
Hello,
Today I found out about your project and have been trying it out with the example config files, and I have some feedback and suggestions for possible improvements.
Mask for Stable Diffusion:
I'm curious if there's a way to use masks to achieve more stable diffusion when inpainting a target object. It would be a beneficial feature to have, especially for more complex scenarios.
Dynamic Resizing Based on VRAM:
The README mentions a requirement of 24GB VRAM, which might not be feasible for all users.
Suggestion: Offer a way to control the resolution/size dynamically based on the user's available GPU VRAM. This way, users with lower VRAM can still utilize the project without running into memory issues.
As a reference, I've implemented an example function in the project that dynamically resizes images based on the available VRAM. Here's a snippet for reference:
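A minimal sketch of the idea (the thresholds and helper names are illustrative, not necessarily the exact snippet):

```python
import torch

def vram_resolution_limit(device_index=0):
    """Choose a maximum side length from the GPU's total VRAM (illustrative thresholds)."""
    if not torch.cuda.is_available():
        return 512
    vram_gb = torch.cuda.get_device_properties(device_index).total_memory / 1024 ** 3
    if vram_gb >= 24:
        return 1280
    if vram_gb >= 12:
        return 768
    return 512

def resize_to_limit(width, height, limit):
    """Scale (width, height) so the longer side fits `limit`, rounding to multiples of 8."""
    scale = min(1.0, limit / max(width, height))
    return int(width * scale) // 8 * 8, int(height * scale) // 8 * 8
```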
and after that I apply the resize in video_util.py.
Note: I've tested this on an RTX 3090 with 8GB VRAM, with torch==2.0.1, CUDA 11.8, and xformers==0.0.21, and it seems to work as intended.
Clarification on Diffuser Integration:
The project mentions integration with diffusers. Does this mean that ebsynth will be built into the virtual environment library? And if it will be inside diffusers, will you use
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline, UniPCMultistepScheduler
without git cloning the ControlNet repo? Some clarity on this would be appreciated. Thanks for the hard work on this project, and I look forward to future updates!