NeurIPS 2024
NaRCan, a video editing framework, integrates a hybrid deformation field network with diffusion priors to address the challenge of maintaining the canonical image as a natural image.
# clone this repo
git clone https://github.com/koi953215/NaRCan.git
cd NaRCan
# create environment
conda create -n narcan python=3.10
conda activate narcan
pip install -r requirements.txt
Next, we use a technique similar to RealFill to fine-tune the diffusion model, which serves as the diffusion prior in our pipeline. (See the RealFill repo for more details about the RealFill environment and operations.)
First, switch to the RealFill folder.
cd realfill
Then initialize an 🤗 Accelerate environment with:
accelerate config
Or, for a default Accelerate configuration without answering questions about your environment:
accelerate config default
Or, if your environment doesn't support an interactive shell (e.g., a notebook):
from accelerate.utils import write_basic_config
write_basic_config()
Uniformly sample 5~10 frames from your dataset (scene) and place them in the ref folder. Next, put any single frame in the target folder and name it target.png (in practice, select the middle frame of your scene).
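The sampling step above can be sketched in Python. This is a minimal illustration, not part of the NaRCan or RealFill code: the helper names `uniform_indices` and `prepare_realfill_scene`, the default of 8 reference frames, and the assumption that frames are PNG files whose sorted names follow temporal order are all hypothetical. The sketch does not create mask.png.

```python
import shutil
from pathlib import Path


def uniform_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return up to `num_samples` evenly spaced indices into range(num_frames)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    num_samples = min(num_samples, num_frames)
    if num_samples == 1:
        return [num_frames // 2]
    # Integer arithmetic keeps the first and last frame and avoids duplicates.
    return [i * (num_frames - 1) // (num_samples - 1) for i in range(num_samples)]


def prepare_realfill_scene(frames_dir: str, scene_dir: str, num_refs: int = 8) -> None:
    """Fill <scene_dir>/ref with sampled frames and copy the middle frame to target/target.png."""
    # Assumption: frames are PNG files and sorting by name gives temporal order.
    frames = sorted(Path(frames_dir).glob("*.png"))
    if not frames:
        raise FileNotFoundError(f"no PNG frames found in {frames_dir}")

    ref_dir = Path(scene_dir) / "ref"
    target_dir = Path(scene_dir) / "target"
    ref_dir.mkdir(parents=True, exist_ok=True)
    target_dir.mkdir(parents=True, exist_ok=True)

    for idx in uniform_indices(len(frames), num_refs):
        shutil.copy(frames[idx], ref_dir / frames[idx].name)

    # The middle frame of the scene becomes the target image.
    shutil.copy(frames[len(frames) // 2], target_dir / "target.png")
```

For example, `prepare_realfill_scene("my_frames", "data/my-scene")` would populate data/my-scene/ref and data/my-scene/target, where both paths are placeholders for your own folders.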
Note: please organize your dataset using the following folder structure.
data
└─── <your-scene-name>
    ├─── ref
    │    └─── [any number of images]
    └─── target
         ├─── target.png
         └─── mask.png
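Before fine-tuning, it may help to confirm that a scene folder matches the structure above. The following is a small sketch under stated assumptions: `check_scene_layout` is a hypothetical helper (not part of this repo), and the set of accepted image extensions is a guess.

```python
from pathlib import Path

# Assumption: these are the image formats used for reference frames.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def check_scene_layout(scene_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the expected files are present."""
    scene = Path(scene_dir)
    problems = []

    # The ref folder must exist and contain at least one image.
    ref_dir = scene / "ref"
    if not ref_dir.is_dir():
        problems.append(f"missing folder: {ref_dir}")
    elif not any(p.suffix.lower() in IMAGE_SUFFIXES for p in ref_dir.iterdir()):
        problems.append(f"no reference images in: {ref_dir}")

    # The target folder must contain both target.png and mask.png.
    for name in ("target.png", "mask.png"):
        path = scene / "target" / name
        if not path.is_file():
            problems.append(f"missing file: {path}")

    return problems
```

Calling `check_scene_layout("data/my-scene")` (with a placeholder scene name) returns human-readable messages for anything missing, so the check can run before starting a long fine-tuning job.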
Open scripts/train.sh and make the following modifications.
export TRAIN_DIR="../data/<your-scene-name>"
export OUTPUT_DIR="../pth_file/<your-scene-name>-model"
After completing the above steps, we can begin fine-tuning our model. (Fine-tuning requires a large amount of GPU memory. If your GPU has limited memory, please refer to the RealFill GitHub repository, which provides detailed instructions on how to train on a low-memory GPU.)
bash scripts/train.sh
Optionally, run the following command to check whether fine-tuning succeeded. The images generated by this command should be very similar to your target.png.
bash scripts/test.sh
Now please return to the main NaRCan directory and organize your dataset using the following folder structure.
data
└─── <your-scene-name>
    └─── <your-scene-name>_all
         └─── [your video frames]
The following command performs data preprocessing:
python create_separation.py -n <your-scene-name>
Start training the model:
python models/homography.py -n <your-scene-name>
python train.py -n <your-scene-name> -dp <diffusion-path>
If you want to view the reconstruction results, please use the following command:
python test.py -n <your-scene-name>
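The preprocessing, training, and reconstruction commands above can be chained from a small Python driver. This is a sketch, not a script shipped with the repo: `narcan_commands` and `run_pipeline` are hypothetical helpers, and passing the fine-tuned RealFill output folder as `<diffusion-path>` is an assumption. Only the script names and the `-n` / `-dp` flags are taken from the commands listed in this README.

```python
import subprocess
import sys
from pathlib import Path


def narcan_commands(scene: str, diffusion_path: str) -> list[list[str]]:
    """Build the commands from this README, in the order they are listed."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "create_separation.py", "-n", scene],
        [py, "models/homography.py", "-n", scene],
        [py, "train.py", "-n", scene, "-dp", diffusion_path],
        [py, "test.py", "-n", scene],
    ]


def run_pipeline(scene: str, diffusion_path: str, repo_root: str = ".") -> None:
    """Run every step from the NaRCan directory and stop at the first failure."""
    # The frames are expected in data/<scene>/<scene>_all, as in the tree above.
    frames_dir = Path(repo_root) / "data" / scene / f"{scene}_all"
    if not frames_dir.is_dir():
        raise FileNotFoundError(f"expected video frames in {frames_dir}")

    for cmd in narcan_commands(scene, diffusion_path):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(cmd, cwd=repo_root, check=True)
```

For instance, `run_pipeline("my-scene", "pth_file/my-scene-model")` would run the four steps in order, where both arguments are placeholders for your own scene name and diffusion model path.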
If you are not using Separated NaRCan (meaning you only have one canonical image), please skip the grid trick steps.
The canonical image will be stored in output/<your-scene-name>/separate_n/original_canonical. At this point, if there are multiple canonical images, we need to use the grid trick technique to ensure our edited canonical images maintain sufficient consistency after style transfer.
First, we need to combine multiple canonical images into a single grid:
python make_grid.py -n <your-scene-name>
After obtaining merge_canonical.png through the above steps, use your preferred text prompts to apply style transfer to it using ControlNet.
Once you have the transferred canonical image, place it in output/<your-scene-name>/separate_<n>/edited_canonical. (Please note that the file name must remain merge_canonical.png.)
Finally, please execute the following commands:
python split_grid.py -n <your-scene-name>
python test_canonical.py -n <your-scene-name>
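To illustrate the idea behind the grid trick, the sketch below computes where each canonical image would sit inside a single merged image, so that all tiles are edited together and can be cut apart again afterwards. The actual layout used by make_grid.py and split_grid.py is not described here, so the near-square arrangement and the helper name `grid_boxes` are assumptions made for illustration.

```python
import math


def grid_boxes(num_tiles: int, tile_w: int, tile_h: int):
    """Return the merged image size and one (left, upper, right, lower) box per tile."""
    if num_tiles < 1:
        raise ValueError("need at least one canonical image")

    # Assumption: tiles are laid out row by row in a near-square grid.
    cols = math.ceil(math.sqrt(num_tiles))
    rows = math.ceil(num_tiles / cols)

    boxes = []
    for i in range(num_tiles):
        row, col = divmod(i, cols)
        left, upper = col * tile_w, row * tile_h
        boxes.append((left, upper, left + tile_w, upper + tile_h))

    return (cols * tile_w, rows * tile_h), boxes
```

With an imaging library such as Pillow, the same boxes could serve both directions: pasting each canonical image into its box builds the grid, and cropping the edited grid with each box recovers the individual edited canonical images. In this sketch the edited grid must keep the merged resolution; otherwise the boxes would need to be rescaled first.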
Please cite us if our work is useful for your research.
@article{chen2024narcan,
title={NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing},
author={Chen, Ting-Hsuan and Chan, Jiewen and Shiu, Hau-Shiang and Yen, Shih-Han and Yeh, Chang-Han and Liu, Yu-Lun},
journal={Advances in Neural Information Processing Systems},
year={2024}
}
This research was funded by the National Science and Technology Council, Taiwan, under Grant NSTC 112-2222-E-A49-004-MY2. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.