[NeurIPS 2024] NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

NeurIPS 2024

NaRCan, a video editing framework, integrates a hybrid deformation field network with diffusion priors to address the challenge of keeping the canonical image a natural image.


Overview

Installation

# clone this repo
git clone https://github.com/koi953215/NaRCan.git
cd NaRCan

# create environment
conda create -n narcan python=3.10
conda activate narcan
pip install -r requirements.txt

Preprocessing (LoRA Fine-tuning)

Next, fine-tune the diffusion model that serves as the diffusion prior in our pipeline, using a technique similar to RealFill. (See the RealFill repository for more details about its environment and operations.)

First, switch to the RealFill folder.

cd realfill

And initialize an 🤗Accelerate environment with:

accelerate config

Or, for a default Accelerate configuration without answering questions about your environment:

accelerate config default

Or, if your environment doesn't support an interactive shell (e.g., a notebook):

from accelerate.utils import write_basic_config
write_basic_config()

Uniformly sample 5 to 10 frames from your dataset (scene) and place them in the ref folder. Next, put a single frame in the target folder and name it target.png (any frame works; in practice, select the middle frame of your scene).

Note: please organize your dataset using the following folder structure.

data
└─── <your-scene-name>
    ├─── ref
    │    └─── [any number of images]
    └─── target
         ├─── target.png
         └─── mask.png
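
The sampling step above can be scripted. The following is a minimal sketch, not part of the repository: the function names `uniform_indices` and `prepare_realfill_scene` are hypothetical, and it assumes your frames are PNG files whose sorted file names follow the temporal order. It copies evenly spaced frames into ref and the middle frame to target/target.png; it does not create mask.png.

```python
import shutil
from pathlib import Path


def uniform_indices(num_frames: int, k: int) -> list[int]:
    """Return k indices spread evenly over range(num_frames), endpoints included."""
    if num_frames <= 0 or k <= 0:
        return []
    if k >= num_frames:
        return list(range(num_frames))
    if k == 1:
        return [num_frames // 2]
    return [(i * (num_frames - 1)) // (k - 1) for i in range(k)]


def prepare_realfill_scene(frames_dir: str, scene_dir: str, k: int = 8) -> list[Path]:
    """Copy k evenly spaced frames into ref/ and the middle frame to target/target.png."""
    # Assumption: frames are PNG files and sort into temporal order by name.
    frames = sorted(Path(frames_dir).glob("*.png"))
    if not frames:
        raise FileNotFoundError(f"no PNG frames found in {frames_dir}")

    ref_dir = Path(scene_dir) / "ref"
    target_dir = Path(scene_dir) / "target"
    ref_dir.mkdir(parents=True, exist_ok=True)
    target_dir.mkdir(parents=True, exist_ok=True)

    picked = [frames[i] for i in uniform_indices(len(frames), k)]
    for frame in picked:
        shutil.copy(frame, ref_dir / frame.name)

    # The middle frame of the scene becomes the target image.
    shutil.copy(frames[len(frames) // 2], target_dir / "target.png")
    return picked
```

For example, `prepare_realfill_scene("my_frames", "data/<your-scene-name>", k=8)` would fill the ref and target folders shown in the tree above, where `my_frames` is a placeholder for wherever your extracted frames live.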

Open scripts/train.sh and make the following modifications:

export TRAIN_DIR="../data/<your-scene-name>"
export OUTPUT_DIR="../pth_file/<your-scene-name>-model"

After completing the above steps, we can begin fine-tuning our model. (Fine-tuning requires a large amount of GPU memory. If your GPU has limited memory, please refer to the RealFill GitHub, which provides detailed instructions on how to train on a low-memory GPU.)

bash scripts/train.sh

Optionally, run the following command to check whether the fine-tuning was successful. The images it generates should be very similar to your target.png.

bash scripts/test.sh

Train a new model

Now please return to the main NaRCan directory and organize your dataset using the following folder structure.

data
└─── <your-scene-name>
    └─── <your-scene-name>_all
         └─── [your video frames]
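
Before running the preprocessing step, it can help to confirm that the frames sit where the layout above expects them. This is a small sketch under that assumption only; `list_scene_frames` is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path


def list_scene_frames(data_root: str, scene: str) -> list[Path]:
    """Return the sorted frame files under <data_root>/<scene>/<scene>_all."""
    frames_dir = Path(data_root) / scene / f"{scene}_all"
    if not frames_dir.is_dir():
        raise FileNotFoundError(f"expected a frames folder at {frames_dir}")

    frames = sorted(p for p in frames_dir.iterdir() if p.is_file())
    if not frames:
        raise ValueError(f"{frames_dir} contains no frame files")
    return frames
```

Calling `list_scene_frames("data", "<your-scene-name>")` from the main NaRCan directory either returns the frame paths or raises an error naming the missing folder.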

Run the following command to preprocess the data:

python create_separation.py -n <your-scene-name>

Then start training the model:

python models/homography.py -n <your-scene-name>
python train.py -n <your-scene-name> -dp <diffusion-path>
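
If you process several scenes, the preprocessing and training commands above can be chained from one script. The sketch below is an illustration only: `training_commands` and `run_training` are hypothetical names, the commands are run in the order they are listed above, and the value passed as the diffusion path is whatever you would pass to `-dp` yourself. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def training_commands(scene: str, diffusion_path: str) -> list[list[str]]:
    """Build the preprocessing and training commands, in the order listed above."""
    py = sys.executable  # reuse the interpreter of the active environment
    return [
        [py, "create_separation.py", "-n", scene],
        [py, "models/homography.py", "-n", scene],
        [py, "train.py", "-n", scene, "-dp", diffusion_path],
    ]


def run_training(scene: str, diffusion_path: str, dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for cmd in training_commands(scene, diffusion_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain as soon as one step fails.
            subprocess.run(cmd, check=True)
```

Run it from the main NaRCan directory, for example `run_training("<your-scene-name>", "<diffusion-path>", dry_run=False)`.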

Test reconstruction

To view the reconstruction results, run the following command:

python test.py -n <your-scene-name>

Test video translation

If you are not using Separated NaRCan (meaning you only have one canonical image), please skip the grid trick steps.

The canonical images are stored in output/<your-scene-name>/separate_<n>/original_canonical. If there are multiple canonical images, we need the grid trick to keep the edited canonical images consistent with each other after style transfer.

First, combine the multiple canonical images into a single grid:

python make_grid.py -n <your-scene-name>

After obtaining the merge_canonical.png through the above steps, use your preferred text prompts to transfer it using ControlNet.

Once you have the transferred canonical image, place it in output/<your-scene-name>/separate_<n>/edited_canonical (note that the file must still be named merge_canonical.png).

Finally, execute the following commands:

python split_grid.py -n <your-scene-name>
python test_canonical.py -n <your-scene-name>
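
To make the grid trick concrete, here is a minimal NumPy sketch of the idea: tile several same-sized images into one grid so they can be edited together, then cut the edited grid back into tiles. This is not the code in make_grid.py or split_grid.py; the function names, the row-major layout, and the assumption that all canonical images share one shape are ours.

```python
import numpy as np


def make_grid(images: list[np.ndarray], cols: int) -> np.ndarray:
    """Tile equally sized H x W x C images row by row into a single grid image."""
    h, w, c = images[0].shape
    rows = -(-len(images) // cols)  # ceiling division
    grid = np.zeros((rows * h, cols * w, c), dtype=images[0].dtype)
    for i, img in enumerate(images):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid


def split_grid(grid: np.ndarray, rows: int, cols: int, count: int) -> list[np.ndarray]:
    """Cut the first `count` tiles back out of a rows x cols grid, in row-major order."""
    h = grid.shape[0] // rows
    w = grid.shape[1] // cols
    tiles = []
    for i in range(count):
        r, col = divmod(i, cols)
        tiles.append(grid[r * h:(r + 1) * h, col * w:(col + 1) * w].copy())
    return tiles
```

Because the editing model sees all canonical images in one pass, the style it applies tends to be shared across tiles, which is the consistency the grid trick is after.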

Citation

Please cite us if our work is useful for your research.

@article{chen2024narcan,
  title={NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing},
  author={Chen, Ting-Hsuan and Chan, Jiewen and Shiu, Hau-Shiang and Yen, Shih-Han and Yeh, Chang-Han and Liu, Yu-Lun},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}

Acknowledgement

This research was funded by the National Science and Technology Council, Taiwan, under Grant NSTC 112-2222-E-A49-004-MY2. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.
