Project | OpenReview | arXiv | Talk | Slides
PyTorch implementation of our method for high-resolution (e.g., 1024x1024) and cross-domain video synthesis.
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian1, Jian Ren2, Menglei Chai2, Kyle Olszewski2, Xi Peng3, Dimitris N. Metaxas1, Sergey Tulyakov2
1Rutgers University, 2Snap Inc., 3University of Delaware
In ICLR 2021, Spotlight.
UCF-101: image generator, video data, motion generator
FaceForensics: image generator, video data, motion generator
Sky-Timelapse: image generator, video data, motion generator
(FFHQ, VoxCeleb): FFHQ image generator, VoxCeleb, motion generator
(AFHQ, VoxCeleb): AFHQ image generator, VoxCeleb, motion generator
(Anime, VoxCeleb): Anime image generator, VoxCeleb, motion generator
(FFHQ-1024, VoxCeleb): FFHQ-1024 image generator, VoxCeleb, motion generator
(LSUN-Church, TLVDB): LSUN-Church image generator, TLVDB
The calculated PCA statistics are saved here.
Organise the video dataset as follows:
Video dataset
|-- video1
|   |-- img_0000.png
|   |-- img_0001.png
|   |-- img_0002.png
|   |-- ...
|-- video2
|   |-- img_0000.png
|   |-- img_0001.png
|   |-- img_0002.png
|   |-- ...
|-- video3
|   |-- img_0000.png
|   |-- img_0001.png
|   |-- img_0002.png
|   |-- ...
|-- ...
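If your clips are stored as video files, a short script can produce this layout. The snippet below is a minimal sketch, assuming OpenCV (opencv-python) is installed and the raw clips live in raw_videos/ as .mp4 files; both directory names are placeholders, not paths used by this repo.
# Minimal sketch: split each raw clip into its own folder of PNG frames.
# raw_videos/ and video_dataset/ are placeholder paths.
import os
import cv2  # pip install opencv-python

src_dir = 'raw_videos'
dst_dir = 'video_dataset'

for fname in sorted(os.listdir(src_dir)):
    if not fname.endswith('.mp4'):
        continue
    name, _ = os.path.splitext(fname)
    out_dir = os.path.join(dst_dir, name)
    os.makedirs(out_dir, exist_ok=True)

    cap = cv2.VideoCapture(os.path.join(src_dir, fname))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, 'img_%04d.png' % idx), frame)
        idx += 1
    cap.release()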
Collect the PCA components from a pre-trained image generator.
python get_stats_pca.py --batchSize 4000 \
--save_pca_path pca_stats/ucf_101 \
--pca_iterations 250 \
--latent_dimension 512 \
--img_g_weights /path/to/ucf_101_image_generator \
--style_gan_size 256 \
--gpu 0
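Conceptually, this step draws many latent codes z, maps them through the generator's mapping network into the StyleGAN2 w space, and fits PCA to the resulting w codes; the saved basis is later used to constrain the motion residuals during training. The snippet below is an illustrative sketch only, not the repo's script: the stand-in mapping network, the single-batch fit, and the output file names are assumptions (get_stats_pca.py loads the real mapping network from --img_g_weights and iterates over --pca_iterations batches).
# Illustrative sketch of PCA over StyleGAN2 w codes.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

latent_dim = 512   # --latent_dimension
n_samples = 4000   # --batchSize

# Stand-in for the pre-trained StyleGAN2 mapping network; the real script
# loads its weights from --img_g_weights instead.
mapping_network = nn.Sequential(
    nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2),
    nn.Linear(latent_dim, latent_dim))

with torch.no_grad():
    z = torch.randn(n_samples, latent_dim)  # Gaussian latent codes
    w = mapping_network(z)                  # mapped w codes

pca = PCA(n_components=latent_dim)
pca.fit(w.numpy())

# Hypothetical outputs: PCA basis, per-component scale, and mean of w.
np.save('components.npy', pca.components_)
np.save('stdev.npy', np.sqrt(pca.explained_variance_))
np.save('mean.npy', pca.mean_)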
Train the model
python -W ignore train.py --name ucf_101 \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--dataroot /path/to/ucf_101 \
--checkpoints_dir checkpoints/ucf_101 \
--img_g_weights /path/to/ucf_101_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 100
Inference
python -W ignore evaluate.py \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ucf_101_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch the_epoch_for_testing \
--results results/ucf_101 \
--num_test_videos 10

Here the_epoch_for_testing is the saved epoch you want to evaluate; it must be >= 0.
Collect the PCA components from a pre-trained image generator.
sh script/faceforensics/run_get_stats_pca.sh
Train the model
sh script/faceforensics/run_train.sh
Inference
sh script/faceforensics/run_evaluate.sh
Collect the PCA components from a pre-trained image generator.
sh script/sky_timelapse/run_get_stats_pca.sh
Train the model
sh script/sky_timelapse/run_train.sh
Inference
sh script/sky_timelapse/run_evaluate.sh
Collect the PCA components from a pre-trained image generator.
python get_stats_pca.py --batchSize 4000 \
--save_pca_path pca_stats/ffhq_256 \
--pca_iterations 250 \
--latent_dimension 512 \
--img_g_weights /path/to/ffhq_image_generator \
--style_gan_size 256 \
--gpu 0
Train the model
python -W ignore train.py --name ffhq_256-voxel \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--dataroot /path/to/voxel_dataset \
--checkpoints_dir checkpoints \
--img_g_weights /path/to/ffhq_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 25 \
--cross_domain
Inference
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch the_epoch_for_testing \
--results results/ffhq_256 \
--num_test_videos 10

As before, the_epoch_for_testing must be >= 0.
Collect the PCA components from a pre-trained image generator.
sh script/ffhq-vox/run_get_stats_pca_1024.sh
Train the model
sh script/ffhq-vox/run_train_1024.sh
Inference
sh script/ffhq-vox/run_evaluate_1024.sh
Collect the PCA components from a pre-trained image generator.
sh script/afhq-vox/run_get_stats_pca.sh
Train the model
sh script/afhq-vox/run_train.sh
Inference
sh script/afhq-vox/run_evaluate.sh
Collect the PCA components from a pre-trained image generator.
sh script/anime-vox/run_get_stats_pca.sh
Train the model
sh script/anime-vox/run_train.sh
Inference
sh script/anime-vox/run_evaluate.sh
Collect the PCA components from a pre-trained image generator.
sh script/lsun_church-tlvdb/run_get_stats_pca.sh
Train the model
sh script/lsun_church-tlvdb/run_train.sh
Inference
sh script/lsun_church-tlvdb/run_evaluate.sh
If you wish to resume interrupted training or fine-tune a pre-trained model, run the following (using UCF-101 as an example):
python -W ignore train.py --name ucf_101 \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--dataroot /path/to/ucf_101 \
--checkpoints_dir checkpoints \
--img_g_weights /path/to/ucf_101_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 100 \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0
--w_residual
Step size of the motion residual. Default: 0.2; we recommend a value <= 0.5.
--n_pca
Number of PCA basis vectors used in the motion-residual calculation. Default: 384 (out of the 512-dimensional StyleGAN2 w space); we recommend >= 256.
--q_len
Size of the queue that stores logits for the contrastive loss. Default: 4,096.
--video_frame_size
Spatial size of video frames used for training; all synthesized video clips are down-sampled to this size before being fed to the video discriminator. Default: 128. A larger size may lead to better motion modeling.
--cross_domain
Activate cross-domain video synthesis. Default: False.
--w_match
Weight of the feature-matching loss. Default: 1.0. A larger value improves content matching. An example combining several of these flags follows below.
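For instance, the flags below could be appended to the UCF-101 training command to take a smaller residual step, use a larger PCA basis, and feed larger frames to the video discriminator; the values are illustrative, not recommended settings.
--w_residual 0.1 \
--n_pca 512 \
--video_frame_size 256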
At inference time, you can generate longer sequences by LSTM unrolling with --n_frames_G:
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0 \
--n_frames_G 32
At inference time, you can also generate longer sequences by interpolation with --interpolation:
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0 \
--interpolation
If you use this code for your work, please cite our paper:
@inproceedings{tian2021a,
  title={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},
  author={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. Metaxas and Sergey Tulyakov},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=6puCSjH3hwA}
}
This code borrows from the StyleGAN2 Image Generator, the BigGAN Discriminator, and the PatchGAN Discriminator.