⛹️‍♀️:basketball: Stable-Diffusion-Playground :soccer:⛹️

License: MIT

An application that generates images or videos using Stable Diffusion models.

Description 📜

What is the term "diffusion"?

From Wikipedia, "Diffusion is the net movement of anything (for example, atoms, ions, molecules, energy) generally from a region of higher concentration to a region of lower concentration."

Analogously, diffusion models add noise to an image step by step in the forward pass, which effectively diffuses the pixels. In the backward pass, the noisy image is denoised over the same number of steps. Because generation is a gradual, sequential process, mode collapse (a known problem with GANs) is less likely to occur.
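The forward (noising) pass described above can be sketched in a few lines. This is a minimal illustration and not the code used by this repo: it assumes the linear noise schedule and closed-form noising step commonly used in DDPM-style models, and the function names and the toy four-pixel "image" are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly from beta_start to beta_end."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: running product of (1 - beta), the signal variance left at step t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump to step t: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)  # seeded so repeated runs give the same noise
alpha_bars = cumulative_signal(linear_beta_schedule(1000))
image = [0.5, -0.25, 1.0, 0.0]  # toy "image": four pixel values scaled to [-1, 1]
slightly_noisy = add_noise(image, 10, alpha_bars, rng)      # early step: mostly signal
almost_pure_noise = add_noise(image, 999, alpha_bars, rng)  # last step: mostly noise
```

The backward pass is where the trained network comes in: it predicts the noise at each step so that it can be removed, which this sketch does not attempt.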

Most diffusion models use a UNet architecture, which preserves the dimensionality of the image. Diffusion models usually apply diffusion in pixel space, whereas Stable Diffusion models apply it in latent space, hence the term "latent diffusion model" (LDM). An encoder converts the image from pixel space to latent space, and a decoder converts it back. This approach is more memory efficient than previous methods and still produces highly detailed images.
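To make the memory argument concrete, here is a small arithmetic sketch. It assumes the values used by Stable Diffusion v1 checkpoints (an 8x spatial downsampling factor and 4 latent channels); these numbers are not stated in this README, and the helper names are hypothetical.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for an image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3):
    """How many times fewer values the latent holds than the RGB image."""
    channels, latent_h, latent_w = latent_shape(height, width)
    return (height * width * image_channels) / (channels * latent_h * latent_w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, the denoising network works on 48 times fewer values than it would in pixel space for a 512x512 image.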

Read through the paper for more details. Big-ups to the researchers/creators for the work and for open-sourcing it.

General Requirements 🧙‍♂️

  • At least 6GB of VRAM is required to generate a single 512x512 image.
  • For better image generation, use a descriptive and detailed prompt.

Code Requirements 🧙‍♀️

Use Python 3.8.13. Set up a conda environment, clone the repo, and run the commands below:

pip install -r requirements.txt
python setup.py
mkdir models
mkdir pretrained
cd animation_mode
python setup.py
cd ..

How to run 🏃‍♂️

Command line arguments:

| Argument | Required | Default | Choices | Description |
| --mode / -m | True | - | "txt2img", "img2img", "inpaint", "dream", "animate" | Mode of application. |
| --local / -l | False | False | True / False | If provided, use local model files; otherwise download from Hugging Face. |
| --device / -d | False | "cpu" | "cpu", "gpu" | Run on target device. |
| --num / -n | False | 1 | integer | Number of images to generate. |
| --save / -s | False | False | True / False | If provided, save generated images. |
| --limit / -limit | False | True | True / False | If provided, limit memory usage. |
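For readers who want to see how such an interface is typically wired up, the following is a hedged `argparse` sketch that mirrors the table above. It is not the actual contents of run.py, which may differ. In particular, the boolean options are modeled here as plain on/off flags, so the default of True listed for --limit is not reproduced; that choice is an assumption.

```python
import argparse

MODES = ["txt2img", "img2img", "inpaint", "dream", "animate"]


def build_parser():
    """Build a parser that mirrors the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Stable-Diffusion-Playground CLI (sketch)")
    parser.add_argument("--mode", "-m", required=True, choices=MODES,
                        help="Mode of application.")
    parser.add_argument("--local", "-l", action="store_true",
                        help="Use local model files instead of downloading them.")
    parser.add_argument("--device", "-d", default="cpu", choices=["cpu", "gpu"],
                        help="Run on target device.")
    parser.add_argument("--num", "-n", type=int, default=1,
                        help="Number of images to generate.")
    parser.add_argument("--save", "-s", action="store_true",
                        help="Save generated images.")
    # Assumption: treated as an opt-in flag, unlike the default listed in the table.
    parser.add_argument("--limit", "-limit", action="store_true",
                        help="Limit memory usage.")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["--mode", "dream", "--device", "gpu", "--num", "780", "--save"])
print(args.mode, args.device, args.num, args.save, args.local)
```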

The application can be run in five different modes:

  • Text to Image (txt2img)
  • Image to Image (img2img)
  • Inpaint (inpaint)
  • Dream (dream)
  • Animate (animate) - sub-modes: 2D, 3D, Video Input

Mode: Text to Image

python run.py --mode txt2img --device gpu --save

Mode: Image to Image

python run.py --mode img2img --device gpu --save

Mode: Inpaint

python run.py --mode inpaint --device gpu --save

Mode: Dream

python run.py --mode dream --device gpu --save --num <number of frames>

Mode: Animate

python run.py --mode animate --device gpu --save

Note:

  • For each mode, run the command and follow the CLI prompts to provide the Hugging Face user access token, the prompt, and the size (height, width) of the image.
  • Generated images or videos are saved to the $PWD/images dir. For animate mode, the video is saved to the $PWD/out_video dir.
  • Generating a single 512x512 image takes ~12 seconds on an NVIDIA GeForce RTX 3060 with 6GB VRAM.
  • Dream mode generates --num image frames from the input prompt and combines them into a video.
  • Image to Image mode generates a new image from an initial image and the input prompt. Inpaint mode regenerates the masked part of the image from the initial image, a mask image, and the input prompt. The strength entered in the CLI sets how much the result may deviate from the initial image. It lies in the range [0, 1], where 0 means no change and 1 means a complete change from the original image.
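One common way for img2img pipelines to interpret strength is as the fraction of the denoising schedule that is actually run on the noised initial image. The sketch below illustrates that idea only. Whether this app maps strength in exactly this way is an assumption, and the function name and the 50-step schedule are hypothetical.

```python
def denoising_steps_for_strength(num_inference_steps, strength):
    """Number of denoising steps actually executed for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in the range [0, 1]")
    # strength 0 runs no steps (image unchanged); strength 1 runs the full schedule.
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.0, 0.5, 0.8, 1.0):
    print(strength, denoising_steps_for_strength(50, strength))
```

With a strength of 0.8 and a 50-step schedule, 40 steps are run, so most of the initial image is replaced while its rough composition can still come through.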

Hugging Face Access Token:

  • Create an account on huggingface.co. Go to Settings -> Access Tokens and create an access token with read permission.

How to use Animate mode 🖌️

This implementation is an optimized version of DeforumStableDiffusionLocal and Deforum_Stable_Diffusion.ipynb. Thanks to their authors for the work.

Animate mode is quite different from the other modes of the app. Animate mode can generate "2D" or "3D" videos from input prompts. Also, it can perform Video-to-Video conversion of a "Video Input" based on input prompts.

To use this mode, follow the steps below:

Requirements

Clone the repo and run the following commands:

pip install -r requirements.txt
python setup.py
mkdir models
mkdir pretrained
cd animation_mode
python setup.py
cd ..

Next, manually download the models,

Animate mode uses the configuration specified in ./animation_mode/config.py. Set the video generation options in this file. Refer to animation_mode/README.md for details on how the parameters in config.py are used.

Run command

python run.py --mode animate --save

Generated video will be saved to ./out_video dir.

Results 📊

Text to Image

python run.py --mode txt2img --device gpu --num 1 --limit --save

Image to Image

python run.py --mode img2img --device gpu --num 1 --limit --save

CLI inputs:

Enter Hugging face user access token: <user access token>

Loading model...

Model loaded successfully

Enter initial image path: flower.png

Enter prompt: beautiful red flower, vibrant, realistic, smooth, bokeh, highly detailed, 4k

Enter strength in [0, 1] range: 0.8

Running Image to Image generation...

Inpaint

python run.py --mode inpaint --device gpu --num 1 --limit --save

CLI inputs:

Enter Hugging face user access token: <user access token>

Loading model...

Model loaded successfully

Enter initial image path: rose.png

Enter mask image path: mask_rose.png

Enter prompt: beautiful blue butterfly on a rose, glossy, detailed, sharp, 4k

Enter strength in [0, 1] range: 0.8

Running Inpaint...
| Initial image | Mask | Inpainted image |

Dream

python run.py --mode dream --device gpu --num 780 --limit --save

CLI inputs:

Enter Hugging face user access token: <user access token>

Loading model...

Model loaded successfully

Enter prompt: highly detailed bowl of lucrative ramen, stephen bliss, unreal engine, fantasy art by greg rutkowski, loish, rhads and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, alphonse mucha, global illumination, detailed and intricate environment

Enter height and width of image: 512 512

Dreaming...
ramen.mp4
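A rough estimate of how long such a run takes can be worked out from the ~12 seconds per 512x512 image quoted in the notes for an RTX 3060. The sketch below assumes every dream frame costs about as much as a single image and ignores model loading and video encoding; the helper names are hypothetical.

```python
def estimate_seconds(num_frames, seconds_per_frame=12.0):
    """Total generation time if every frame takes seconds_per_frame."""
    return num_frames * seconds_per_frame


def format_duration(total_seconds):
    """Render a duration in seconds as 'Xh Ym'."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes = remainder // 60
    return f"{hours}h {minutes}m"


total = estimate_seconds(780)
print(total, format_duration(total))  # 9360.0 2h 36m
```

Under these assumptions, the 780-frame command above would take roughly two and a half hours on that GPU.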

Animate

| 2D | 3D |
| TODO | boat_in_storm |

References 📄

Happy Learning! 😄