Official PyTorch implementation of our NeurIPS 2022 paper
Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models
Chen Henry Wu, Saman Motamed, Shaunak Srivastava, Fernando De la Torre
Carnegie Mellon University
NeurIPS 2022
[Paper link] | [Website] | [Poster]
Generative models (e.g., GANs and diffusion models) learn the underlying data distribution in an unsupervised manner. However, many applications of interest require sampling from a specific region of the generative model's output space or evenly over a range of characteristics. To allow efficient sampling in these scenarios, we propose Generative Visual Prompt (PromptGen), a framework to achieve distributional control over pre-trained generative models by incorporating knowledge of arbitrary off-the-shelf models. PromptGen defines control as an energy-based model (EBM) and samples images in a feed-forward manner by approximating the EBM with invertible neural networks, avoiding optimization at inference. We demonstrate how PromptGen can control several generative models (e.g., StyleGAN2, diffusion autoencoder, StyleNeRF, NVAE) using various off-the-shelf models:
- With the CLIP model, PromptGen can sample images guided by the text.
- With image classifiers, PromptGen can de-bias generative models across a set of attributes.
- With inverse graphics models, PromptGen can sample images of the same identity in different poses.
- Finally, PromptGen reveals that the CLIP model shows "reporting bias" when used as control, and PromptGen can further de-bias this controlled distribution in an iterative manner.
PromptGen requires no training data, and the only supervision comes from off-the-shelf models that help define the control. It samples images in a feed-forward manner, which is highly efficient, and it also stands alone at inference, meaning that we can discard the off-the-shelf models after training. PromptGen not only offers generality for algorithmic design and modularity for control composition, but also enables iterative controls when some controls are contingent on others.
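To make the energy-based view of control more concrete, here is a toy sketch that is not taken from this repository. It assumes a one-dimensional standard normal latent prior and a hypothetical quadratic energy, and it reweights prior samples by exp(-E) using self-normalized importance sampling. PromptGen instead trains an invertible network to approximate the controlled distribution, so no reweighting is needed at inference. The names `energy` and `reweight` are illustrative only.

```python
import math
import random


def energy(z, target=1.5):
    """Hypothetical control energy: low when z is close to `target`."""
    return 0.5 * (z - target) ** 2


def reweight(samples, energy_fn):
    """Self-normalized importance weights proportional to exp(-E(z))."""
    energies = [energy_fn(z) for z in samples]
    e_min = min(energies)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(e - e_min)) for e in energies]
    total = sum(unnorm)
    return [w / total for w in unnorm]


rng = random.Random(0)
prior_samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
weights = reweight(prior_samples, energy)

prior_mean = sum(prior_samples) / len(prior_samples)
controlled_mean = sum(w * z for w, z in zip(weights, prior_samples))
print(f"prior mean: {prior_mean:.2f}, controlled mean: {controlled_mean:.2f}")
```

The controlled mean moves away from the prior mean toward the low-energy region, which is the effect a control is meant to have on the sampling distribution.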
- Create an environment by running
conda env create -f environment.yml
conda activate generative_prompt
pip install git+https://github.com/openai/CLIP.git
- Install `torch` and `torchvision` based on your CUDA version. For instance, the following installation works for me.
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
- Install PyTorch3D. Installing this library can be painful, but you can skip it if you are not using StyleNeRF or the pose experiments.
- Set up wandb for logging (registration is required). You should modify the `setup_wandb` function in `main.py` to accommodate your wandb credentials.
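After the steps above, a quick check like the following can confirm which packages are importable. This is a minimal sketch, not part of the repository. The import names (`torch`, `torchvision`, `clip`, `wandb`, `pytorch3d`) are assumed from the installation steps, and `missing_packages` is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names assumed from the installation steps; pytorch3d is only needed
# for StyleNeRF and the pose experiments.
required = ["torch", "torchvision", "clip", "wandb"]
optional = ["pytorch3d"]

print("missing required:", missing_packages(required))
print("missing optional:", missing_packages(optional))
```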
We provide a unified interface for various pre-trained generative models. Checkpoints for generative models used in this paper are provided below.
- StyleGAN2
cd ckpts/
wget https://www.dropbox.com/s/iy0dkqnkx7uh2aq/ffhq.pt
wget https://www.dropbox.com/s/lmjdijm8cfmu8h1/metfaces.pt
wget https://www.dropbox.com/s/z1vts069w683py5/afhqcat.pt
wget https://www.dropbox.com/s/a0hvdun57nvafab/stylegan2-church-config-f.pt
wget https://www.dropbox.com/s/x1d19u8zd6yegx9/stylegan2-car-config-f.pt
wget https://www.dropbox.com/s/hli2x42ekdaz2br/landscape.pt
- Diffusion Autoencoder
cd ckpts/
wget https://www.dropbox.com/s/ej0jj8g7crvtb5e/diffae_ffhq256.ckpt
wget https://www.dropbox.com/s/w5y89y57r9nd1jt/diffae_ffhq256_latent.pkl
wget https://www.dropbox.com/s/rsbpxaswnfzsyl1/diffae_ffhq128.ckpt
wget https://www.dropbox.com/s/v1dvsj6oklpz652/diffae_ffhq128_latent.pkl
- StyleNeRF
cd ckpts/
wget https://www.dropbox.com/s/n80cr7isveh5yfu/StyleNeRF_ffhq_1024.pkl
- BigGAN
# BigGAN will be downloaded automatically
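Before launching experiments, it may help to verify that the downloads landed in `ckpts/`. The sketch below is not part of the repository. The file names are copied from the `wget` commands above, and `missing_checkpoints` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the wget commands above.
GENERATIVE_CKPTS = [
    "ffhq.pt",
    "metfaces.pt",
    "afhqcat.pt",
    "stylegan2-church-config-f.pt",
    "stylegan2-car-config-f.pt",
    "landscape.pt",
    "diffae_ffhq256.ckpt",
    "diffae_ffhq256_latent.pkl",
    "diffae_ffhq128.ckpt",
    "diffae_ffhq128_latent.pkl",
    "StyleNeRF_ffhq_1024.pkl",
]


def missing_checkpoints(ckpt_dir, names=GENERATIVE_CKPTS):
    """List the expected checkpoint files that are absent from `ckpt_dir`."""
    root = Path(ckpt_dir)
    return [n for n in names if not (root / n).is_file()]


# Run from the repository root; only the models you plan to use are needed.
print("missing:", missing_checkpoints("ckpts"))
```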
PromptGen allows us to use arbitrary off-the-shelf models to control pre-trained generative models. The off-the-shelf models used in this paper are provided below.
- CLIP
# CLIP will be downloaded automatically
- ArcFace IR-SE 50 model, provided by the Colab demo in this repo
cd ckpts/
wget https://www.dropbox.com/s/qg7co4azsv5sacm/model_ir_se50.pth
- DECA model, provided by this repo. You should first download the FLAME model (registration is required), choose FLAME 2020 and unzip it, copy `generic_model.pkl` into `model/lib/decalib/data/`, and then run the following command
wget https://www.dropbox.com/s/972j1vgfd19b6gx/deca_model.tar -O model/lib/decalib/data/deca_model.tar
- FairFace classifier, provided by this repo
cd ckpts/
wget https://www.dropbox.com/s/v1rp0uubk30esdh/res34_fair_align_multi_7_20190809.pt
- CelebA classifier, trained by ourselves
cd ckpts/
wget https://www.dropbox.com/s/yzc8ydaa4ggj1zs/celeba.zip
unzip celeba.zip
For the moment constraint experiments, one needs to train the $\hat{\beta}$ model; pre-trained ones can be downloaded as follows.
- FFHQ (1024) $\hat{\beta}$ model
cd ckpts/
wget https://www.dropbox.com/s/htdfv5w1xzsnajj/ffhq_debias.bin
- MetFaces (1024) $\hat{\beta}$ model
cd ckpts/
wget https://www.dropbox.com/s/j2z9lha15mb2hfj/metfaces_debias.bin
Generally, each block stands for an experiment, with the following exceptions:
- Each set notation `{A,B,C}` stands for several independent experiments. You should always replace `{A,B,C}` with one of `A`, `B`, and `C`.
- In some cases, evaluation and plotting are separated from training. These cases are usually marked by `After convergence`.
- For de-biasing with the moment constraint, the `(optional)` means that you can use the pre-trained $\hat{\beta}$ model following the instruction above.
- For iterative control, all blocks should be run sequentially.
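The set notation can be expanded mechanically. The following sketch is not part of the repository, and `expand_run_names` is a hypothetical helper. It lists every concrete run name that a pattern stands for, so each one can be exported as `RUN_NAME` in turn.

```python
import itertools
import re


def expand_run_names(pattern):
    """Expand every {A,B,C} group in `pattern` into separate run names."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # Even indices hold literal text, odd indices hold comma-separated choices.
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


for name in expand_run_names("{race,age}_debias_{1,2}_ffhq"):
    print(name)
```

A pattern with two groups yields the Cartesian product of the choices, and a name without braces is returned unchanged.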
Each command can be run on a single NVIDIA RTX A4000 GPU.
Model checkpoints and image samples will be saved under `--output_dir`.
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=clip_{a_baby,an_asian_man,a_girl,a_boy}_ffhq
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
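The command above derives several paths from `RUN_NAME` and `SEED`. The sketch below mirrors those shell expansions in Python so the resulting locations are easy to see. It is not part of the repository, and `run_paths` is a hypothetical helper.

```python
def run_paths(run_name, seed):
    """Mirror the $RUN_NAME and $RUN_NAME$SEED expansions used in the command."""
    tag = f"{run_name}{seed}"
    return {
        "cfg": f"experiments/{run_name}.cfg",  # --cfg
        "run_name": tag,                       # --run_name
        "output_dir": f"output/{tag}",         # --output_dir
        "log_file": f"{tag}.log",              # redirected stdout and stderr
    }


paths = run_paths("clip_a_baby_ffhq", 42)
for key, value in paths.items():
    print(f"{key}: {value}")
```

Note that the seed is appended directly to the run name, so the output directory for this run is `output/clip_a_baby_ffhq42`.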
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=clip_a_baby_ffhq256_diffae
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1433 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=clip_{autumn,winter}_scene_landscape
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 2 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
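The FFHQ and landscape commands above use different per-device batch sizes and gradient accumulation steps. Assuming these flags follow the usual Hugging Face `Trainer` semantics (their names match, but this README does not state it), the number of samples per optimizer step can be estimated as follows. `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_processes=1):
    """Samples contributing to one optimizer step (assumed Trainer semantics)."""
    return per_device_batch * grad_accum_steps * num_processes


# Values copied from the FFHQ and landscape commands above (--nproc_per_node 1).
ffhq = effective_batch_size(per_device_batch=2, grad_accum_steps=4)
landscape = effective_batch_size(per_device_batch=4, grad_accum_steps=2)
print(f"FFHQ: {ffhq}, landscape: {landscape}")
```

Under this assumption, both settings give the same effective batch size, so the two flags can be traded against each other to fit GPU memory.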
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=clip_a_baby_ffhq1024_stylenerf
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
BigGAN ImageNet (256) {"a photo of a glow and light dog", "an ink wash of a church near forest under moonlight", "a painting of a melancholy robot"}
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME={clip_photo_of_a_glow_and_light_dog_biggan,clip_ink_wash_of_a_church_near_forest_under_moonlight_biggan,clip_painting_of_a_melancholy_robot_biggan}
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 200 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 200 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 15 --adafactor false --learning_rate 1e-3 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=inverse_graphics_pose_ffhq
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 100 --metric_for_best_model PoseEnergy --greater_is_better false --save_strategy steps --save_steps 100 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
After convergence, evaluate and plot by running
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=inverse_graphics_pose_ffhq_test_fid
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 100 --metric_for_best_model PoseEnergy --greater_is_better false --save_strategy steps --save_steps 100 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 0 --adafactor true --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true --resume_from_checkpoint output/inverse_graphics_pose_ffhq42 > $RUN_NAME$SEED.log 2>&1 &
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=class_{male,female,female_glasses,female_noglasses,male_glasses,male_noglasses,female_young,female_old,male_young,male_old,noeyeglasses_young,noeyeglasses_old,eyeglasses_young,eyeglasses_old,blond_hair,noblond_hair,eyeglasses,noeyeglasses}
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model ClassEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --do_predict --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 8 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
This step can be skipped if you download the pre-trained $\hat{\beta}$ model.
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=debias_ebm_ffhq
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 100 --metric_for_best_model get_debias_ebm/neg_weighted_loss --greater_is_better true --save_strategy steps --save_steps 100 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 500 --adafactor false --learning_rate 5e-2 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1024 --per_device_eval_batch_size 16 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
After convergence, move the trained checkpoint to `ckpts/` by running
cd output/debias_ebm_ffhq42
scp -r pytorch_model.bin ../../ckpts/ffhq_debias.bin
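The same step can be done from Python, for example inside a post-training script. This is a minimal sketch that is not part of the repository, and `export_checkpoint` is a hypothetical helper. Like the `scp` command above, it copies the file and leaves the original in place.

```python
import shutil
from pathlib import Path


def export_checkpoint(output_dir, dest, filename="pytorch_model.bin"):
    """Copy the trained weights from a run's output directory to `dest`."""
    src = Path(output_dir) / filename
    if not src.is_file():
        raise FileNotFoundError(f"no checkpoint found at {src}")
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest


# Example, run from the repository root after training has converged:
# export_checkpoint("output/debias_ebm_ffhq42", "ckpts/ffhq_debias.bin")
```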
This step can be skipped if you download the pre-trained $\hat{\beta}$ model.
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=debias_ebm_metfaces
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 100 --metric_for_best_model get_debias_ebm/neg_weighted_loss --greater_is_better true --save_strategy steps --save_steps 100 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 500 --adafactor false --learning_rate 5e-2 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1024 --per_device_eval_batch_size 16 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
After convergence, move the trained checkpoint to `ckpts/` by running
cd output/debias_ebm_metfaces42
scp -r pytorch_model.bin ../../ckpts/metfaces_debias.bin
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME={race,age}_debias_{1,2}_ffhq
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 200 --metric_for_best_model ffhq_debias/race_kl --greater_is_better false --save_strategy steps --save_steps 200 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 32 --num_train_epochs 500 --adafactor false --learning_rate 1e-3 --do_train --do_eval --do_predict --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME={race,age,gender}_debias_{1,2}_metfaces
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 200 --metric_for_best_model metfaces_debias/age_kl --greater_is_better false --save_strategy steps --save_steps 200 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 32 --num_train_epochs 500 --adafactor false --learning_rate 1e-3 --do_train --do_eval --do_predict --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
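The de-biasing commands select the best model by metrics named `ffhq_debias/race_kl` and `metfaces_debias/age_kl`. The exact definition lives in the code. As a rough illustration only, and assuming the metric is a KL divergence between the predicted attribute distribution of generated samples and a uniform target, such a quantity could be computed like this. `kl_to_uniform` is a hypothetical helper, and the direction of the divergence is an assumption.

```python
import math
from collections import Counter


def kl_to_uniform(labels, classes):
    """KL(p || uniform), where p is the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    k = len(classes)
    kl = 0.0
    for c in classes:
        p = counts[c] / n
        if p > 0:  # classes with zero mass contribute nothing to the sum
            kl += p * math.log(p * k)
    return kl


balanced = ["a", "b", "c", "a", "b", "c"]
skewed = ["a", "a", "a", "a", "b", "c"]
print(kl_to_uniform(balanced, ["a", "b", "c"]))
print(kl_to_uniform(skewed, ["a", "b", "c"]))
```

A perfectly balanced set of predictions gives zero, and the value grows as the distribution becomes more skewed, which is consistent with the commands using `--greater_is_better false` for these metrics.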
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=clip_a_person_without_makeup
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 10 --adafactor false --learning_rate 1e-3 --do_train --do_eval --do_predict --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
After convergence, compute gender distribution by running
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=no_debias_person_without_makeup
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --do_predict --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
(Preparation for iteration 2) Train $\hat{\beta}$ model for StyleGAN2 FFHQ (1024) "a photo of a person without makeup"
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=debias_ebm_person_without_makeup
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 100 --metric_for_best_model get_debias_ebm/neg_weighted_loss --greater_is_better true --save_strategy steps --save_steps 100 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 1 --num_train_epochs 500 --adafactor false --learning_rate 5e-2 --do_train --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1024 --per_device_eval_batch_size 16 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
After convergence, move the trained checkpoint to `ckpts/` by running
cd output/debias_ebm_person_without_makeup42
scp -r pytorch_model.bin ../../ckpts/person_without_makeup_gender_debias.bin
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=gender_debias_2_person_without_makeup
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 200 --metric_for_best_model ffhq_debias/gender_kl --greater_is_better false --save_strategy steps --save_steps 200 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 32 --num_train_epochs 500 --adafactor false --learning_rate 1e-3 --do_train --do_eval --do_predict --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
If you find this repository helpful, please cite it as
@inproceedings{promptgen2022,
title={Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models},
author={Chen Henry Wu and Saman Motamed and Shaunak Srivastava and Fernando De la Torre},
booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
year={2022},
url={https://openreview.net/forum?id=Gsbnnc--bnw}
}
We use the X11 License. This license is identical to the MIT License, but with an extra sentence that prohibits using the copyright holders' names (Carnegie Mellon University in our case) for advertising or promotional purposes without written permission.
Issues are welcome if you have any questions about the code. If you would like to discuss the method, please contact Chen Henry Wu.