
Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model


PKU-YuanGroup/WF-VAE


If you like our project, please give us a star ⭐ on GitHub to get the latest updates.


💡 I also have other projects that may interest you ✨.

Open-Sora-Plan

📰 News

  • [2024.11.27] 🔥🔥🔥 We have published our report, which provides comprehensive training details and includes additional experiments.
  • [2024.11.25] 🔥🔥🔥 We have released our 16-channel WF-VAE-L model along with the training code. You are welcome to download it from Hugging Face.

😮 Highlights

WF-VAE uses a multi-level wavelet transform to construct an efficient energy pathway, enabling low-frequency information from video data to flow into the latent representation. This method achieves competitive reconstruction performance while markedly reducing computational cost.
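
To make the idea of a multi-level wavelet transform concrete, here is a minimal sketch in pure Python. It is not the WF-VAE implementation: it assumes a 1D orthonormal Haar wavelet (chosen only for simplicity) and a toy 16-sample signal, and the function names `haar_step`, `haar_multilevel`, and `energy` are hypothetical. It shows that for a smooth signal, most of the energy ends up in the small low-frequency band, which is the kind of information the highlight above describes as flowing into the latent representation.

```python
import math

def haar_step(signal):
    """One level of the orthonormal 1D Haar transform: returns (low, high) bands."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def haar_multilevel(signal, levels):
    """Re-apply haar_step to the low band; returns (low, [high_1, ..., high_n])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    low, highs = list(signal), []
    for _ in range(levels):
        low, high = haar_step(low)
        highs.append(high)
    return low, highs

def energy(values):
    """Sum of squared coefficients."""
    return sum(v * v for v in values)

# Toy signal (an assumption): one pixel's intensity over 16 frames, varying smoothly.
signal = [math.sin(2 * math.pi * t / 16) + 2.0 for t in range(16)]
low, highs = haar_multilevel(signal, levels=2)

total = energy(signal)
low_share = energy(low) / total
print(f"low band: {len(low)} of {len(signal)} coefficients, {low_share:.1%} of the energy")
```

Because the transform is orthonormal, the band energies sum to the energy of the input, so the share held by the low band is a meaningful measure of how much of the signal survives the downsampling.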

💡 Simpler Architecture, Faster Encoding

  • This architecture substantially improves speed and reduces training costs in large-scale video generation models and data processing workflows.

🔥 Competitive Reconstruction Performance with SOTA VAEs

  • Our experiments demonstrate competitive performance of our model against SOTA VAEs.

🚀 Main Results

Reconstruction

(Side-by-side reconstruction comparisons: WF-VAE vs. CogVideoX)

Efficiency

We ran efficiency tests on 33-frame videos using float32 precision on an H100 GPU. All models ran without block-wise inference strategies. Our model performs comparably to state-of-the-art VAEs while significantly reducing encoding cost.

🛠️ Requirements and Installation

git clone https://github.com/PKU-YuanGroup/WF-VAE
cd WF-VAE
conda create -n wfvae python=3.10 -y
conda activate wfvae
pip install -r requirements.txt

🤖 Reconstructing Video or Image

To reconstruct a video or an image, execute the following commands:

Video Reconstruction

CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_video.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --video_path "Video Path" \
    --rec_path rec.mp4 \
    --device cuda \
    --sample_rate 1 \
    --num_frames 65 \
    --height 512 \
    --width 512 \
    --fps 30 \
    --enable_tiling

Image Reconstruction

CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_image.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --image_path assets/gt_5544.jpg \
    --rec_path rec.jpg \
    --device cuda \
    --short_size 512 

For further guidance, refer to the example scripts: examples/rec_single_video.sh and examples/rec_single_image.sh.
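
When reconstructing many videos, it can be convenient to assemble the command from Python. The sketch below only builds the argument list for the video reconstruction command shown above, using exactly the flags and default values from that command. The helper name `build_recon_video_cmd` and its parameter names are hypothetical and not part of the repository.

```python
import shlex

def build_recon_video_cmd(vae_path, video_path, rec_path="rec.mp4",
                          num_frames=65, height=512, width=512,
                          fps=30, sample_rate=1, device="cuda",
                          enable_tiling=True):
    """Assemble the argument list for scripts/recon_single_video.py."""
    cmd = [
        "python", "scripts/recon_single_video.py",
        "--model_name", "WFVAE",
        "--from_pretrained", vae_path,
        "--video_path", video_path,
        "--rec_path", rec_path,
        "--device", device,
        "--sample_rate", str(sample_rate),
        "--num_frames", str(num_frames),
        "--height", str(height),
        "--width", str(width),
        "--fps", str(fps),
    ]
    # --enable_tiling is a switch, so it is appended without a value.
    if enable_tiling:
        cmd.append("--enable_tiling")
    return cmd

# Placeholders copied from the command above; replace them with real paths.
cmd = build_recon_video_cmd("Your VAE", "Video Path")
print(shlex.join(cmd))
```

To actually run the command, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` set in the environment as needed.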

🗝️ Training & Validating

Training and validation instructions are in TRAIN_AND_VALIDATE.md.

👍 Acknowledgement

✏️ Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work 📝.

@misc{li2024wfvaeenhancingvideovae,
      title={WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model}, 
      author={Zongjian Li and Bin Lin and Yang Ye and Liuhan Chen and Xinhua Cheng and Shenghai Yuan and Li Yuan},
      year={2024},
      eprint={2411.17459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17459}, 
}

🔒 License

This project is released under the Apache 2.0 license as found in the LICENSE file.
