v0.2.0: Stable Diffusion early access, K-LMS sampling

@anton-l released this 16 Aug 15:52 · 4042 commits to main since this release

Stable Diffusion

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. The model uses a frozen CLIP ViT-L/14 text encoder to condition generation on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See the model card for more information.

The Stable Diffusion weights are currently only available to universities, academics, research institutions and independent researchers. Please request access by applying via this form.

from torch import autocast
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)
pipe = pipe.to("cuda")  # move the pipeline to GPU so that `autocast("cuda")` takes effect

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]  # image here is in PIL format
    
image.save(f"astronaut_rides_horse.png")

K-LMS sampling

The new LMSDiscreteScheduler is a port of k-lms from k-diffusion by Katherine Crowson.
The scheduler can be easily swapped into existing pipelines like so:

from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-3-diffusers"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
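
For completeness, a minimal sketch of generating an image with the swapped-in scheduler, assuming a CUDA device and the same hub access as above (the output filename is just an example):

from torch import autocast

pipe = pipe.to("cuda")  # move the K-LMS pipeline to GPU

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]  # PIL image

image.save("astronaut_rides_horse_klms.png")  # example filename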

Integration test with the Stable-Diffusion text-to-image script

#182 and #186 make sure that the DDIM and PNDM/PLMS schedulers yield 1-to-1 the same results as the original Stable Diffusion implementation.
Try it out yourself:

In Stable-Diffusion:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms

or

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code

In diffusers:

from diffusers import StableDiffusionPipeline, DDIMScheduler
from time import time
from PIL import Image
from einops import rearrange
import numpy as np
import os
import torch
from torch import autocast
from torchvision.utils import make_grid

torch.manual_seed(42)

prompt = "a photograph of an astronaut riding a horse"
#prompt = "a photograph of the eiffel tower on the moon"
#prompt = "an oil painting of a futuristic forest gives"

# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler)  # make sure you're logged in with `huggingface-cli login`

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`
pipe = pipe.to("cuda")  # move the pipeline to GPU so that `autocast("cuda")` takes effect

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # with output_type="np" the images come back as numpy arrays
        all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_columns)  # nrow = number of images per row

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))

image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")

Improvements and bugfixes

Full Changelog: 0.1.3...v0.2.0