- [12/2024] Extension paper has been accepted by IJCV.
- [02/2024] Dataset link has been updated to Hugging Face.
- [09/2023] arXiv extension paper released.
- [07/2022] Pretrained models are uploaded.
- [07/2022] Project page and dataset are released.
- [07/2022] Code is released.
This is the official implementation of Detecting and Recovering Sequential DeepFake Manipulation. We introduce a novel research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake), which focuses on detecting the sequences of multi-step facial manipulations. To facilitate the study of Seq-DeepFake, we provide a large-scale Sequential DeepFake Dataset and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer).
The framework of the proposed method:
git clone https://github.com/rshao/SeqDeepFake.git
cd SeqDeepFake
We recommend using Anaconda to manage the Python environment:
conda create -n seqdeepfake python=3.6
conda activate seqdeepfake
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit==10.1.243
conda install pandas
conda install tqdm
conda install pillow
pip install tensorboard==2.4.1
We contribute the first large-scale Sequential DeepFake Dataset, Seq-Deepfake, including ~85k sequentially manipulated face images, each annotated with its ground-truth manipulation sequence.
The images are generated with the following two facial manipulation methods, which yield 28 and 26 types of manipulation sequences (including original), respectively. The lengths of all manipulation sequences range from 1 to 5.
- Sequential facial components manipulation (based on CelebAMask-HQ and StyleMapGAN)
- Sequential facial attributes manipulation (based on FFHQ and Talk-To-Edit)
Here are some sample images and statistics:
Each image in the dataset is annotated with a list of length 5, indicating the ground-truth manipulation sequence. The labels in the sequence are defined as follows:
For Sequential facial components manipulation:
0: 'NA', 1: 'nose', 2: 'eye', 3: 'eyebrow', 4: 'lip', 5: 'hair'
Note: 'NA' means no manipulation is taken in this step.
For Sequential facial attributes manipulation:
0: 'NA', 1: 'Bangs', 2: 'Eyeglasses', 3: 'Beard', 4: 'Smiling', 5: 'Young'
Note: 'NA' means no manipulation is taken in this step.
Note that label 0 serves as the placeholder for sequential manipulations shorter than 5 steps. For example, the annotation for manipulation sequence nose-eye-lip would be: [1, 2, 4, 0, 0]. Original images are annotated with [0, 0, 0, 0, 0].
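The padding rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the helper names `encode_sequence` and `decode_sequence` are hypothetical, and only the label tables and the length-5 convention come from the text above.

```python
# Label tables copied from the definitions above.
COMPONENT_LABELS = {0: "NA", 1: "nose", 2: "eye", 3: "eyebrow", 4: "lip", 5: "hair"}
ATTRIBUTE_LABELS = {0: "NA", 1: "Bangs", 2: "Eyeglasses", 3: "Beard", 4: "Smiling", 5: "Young"}
MAX_STEPS = 5  # every annotation is a list of length 5


def encode_sequence(name, labels, max_steps=MAX_STEPS):
    """Hypothetical helper: turn 'nose-eye-lip' into [1, 2, 4, 0, 0]."""
    name_to_id = {text: idx for idx, text in labels.items()}
    # 'original' denotes an unmanipulated image, i.e. zero steps.
    steps = [] if name == "original" else name.split("-")
    if len(steps) > max_steps:
        raise ValueError(f"sequence has more than {max_steps} steps: {name}")
    ids = [name_to_id[step] for step in steps]
    # Pad with label 0 ('NA') up to the fixed length.
    return ids + [0] * (max_steps - len(ids))


def decode_sequence(ids, labels):
    """Hypothetical helper: inverse of encode_sequence."""
    steps = [labels[i] for i in ids if i != 0]
    return "-".join(steps) if steps else "original"


print(encode_sequence("nose-eye-lip", COMPONENT_LABELS))
print(decode_sequence([1, 2, 4, 0, 0], COMPONENT_LABELS))
```

The same two helpers work for the attribute labels by passing `ATTRIBUTE_LABELS` instead.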
You can download the Seq-Deepfake dataset through this link: [Dataset]
After unzipping all sub-files, the structure of the dataset should be as follows:
./
├── facial_attributes
│   ├── annotations
│   │   ├── train.csv
│   │   ├── test.csv
│   │   └── val.csv
│   └── images
│       ├── train
│       │   ├── Bangs-Eyeglasses-Smiling-Young
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   ...
│       │   ├── Young-Smiling-Eyeglasses
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   └── original
│       │       ├── xxxxxx.jpg
│       │       ...
│       │       └── xxxxxx.jpg
│       ├── test
│       │   % the same structure as in train
│       └── val
│           % the same structure as in train
└── facial_components
    ├── annotations
    │   ├── train.csv
    │   ├── test.csv
    │   └── val.csv
    └── images
        ├── train
        │   ├── eyebrow-eye-hair-nose-lip
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   ...
        │   ├── nose-eyebrow-lip-eye-hair
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   └── original
        │       ├── xxxxxx.jpg
        │       ...
        │       └── xxxxxx.jpg
        ├── test
        │   % the same structure as in train
        └── val
            % the same structure as in train
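As a quick sanity check after unzipping, the layout above can be walked with the standard library. The following is a minimal sketch, assuming images are `.jpg` files stored one folder per manipulation sequence as shown; the function name `count_images_per_sequence` is hypothetical and not part of the repository.

```python
from collections import Counter
from pathlib import Path


def count_images_per_sequence(data_dir, dataset_name, split="train"):
    """Count .jpg files in each manipulation-sequence folder of one split.

    dataset_name is 'facial_components' or 'facial_attributes';
    split is 'train', 'test' or 'val'.
    """
    split_dir = Path(data_dir) / dataset_name / "images" / split
    counts = Counter()
    for seq_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        # The folder name is the manipulation sequence, e.g. 'nose-eye-lip',
        # or 'original' for unmanipulated images.
        counts[seq_dir.name] = sum(1 for _ in seq_dir.glob("*.jpg"))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_images.py /path/to/dataset facial_components
    if len(sys.argv) == 3:
        for name, n in count_images_per_sequence(sys.argv[1], sys.argv[2]).items():
            print(f"{name}: {n}")
```

An empty or missing sequence folder usually indicates that one of the sub-files was not unzipped.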
Modify train.sh and run:
sh train.sh
Please refer to the following instructions about some arguments:
Args | Description |
---|---|
CONFIG | Path of the network and optimization configuration file. |
DATA_DIR | Directory to the downloaded dataset. |
DATASET_NAME | Name of the selected manipulation type. Choose from 'facial_components' and 'facial_attributes'. |
RESULTS_DIR | Directory to save logs and checkpoints. |
You can change the network and optimization configurations by adding new configuration files under the directory ./configs/.
We also provide a Slurm script that supports multi-GPU training:
sh train_slurm.sh
where PARTITION and NODE should be modified according to your own environment. The number of GPUs to be used can be set through the NUM_GPU argument.
Modify test.sh and run:
sh test.sh
For the arguments in test.sh, please refer to the training instructions above, plus the following ones:
Args | Description |
---|---|
TEST_TYPE | The evaluation metrics to use. Choose from 'fixed' and 'adaptive'. |
LOG_NAME | Should be set according to the log_name of your trained checkpoint to be tested. |
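This README does not define the two evaluation types, so the sketch below is only one plausible reading and not the repository's evaluation code. It assumes that 'fixed' compares all 5 positions of the padded sequences, and that 'adaptive' compares only up to the longer of the predicted and ground-truth sequences. The function names and the handling of all-zero sequences are assumptions; please refer to the paper for the authoritative definitions.

```python
def sequence_length(seq):
    """Number of steps up to and including the last non-zero label."""
    return max((i + 1 for i, label in enumerate(seq) if label != 0), default=0)


def fixed_accuracy(preds, targets):
    """Assumed 'fixed' metric: compare every position of the padded sequences."""
    correct = total = 0
    for pred, target in zip(preds, targets):
        for p, t in zip(pred, target):
            correct += int(p == t)
            total += 1
    return correct / total


def adaptive_accuracy(preds, targets):
    """Assumed 'adaptive' metric: compare only the first n positions, where n
    is the longer of the predicted and ground-truth sequence lengths."""
    correct = total = 0
    for pred, target in zip(preds, targets):
        # Assumption: compare at least one position so that an original image
        # predicted as original still counts as one correct step.
        n = max(1, sequence_length(pred), sequence_length(target))
        for p, t in zip(pred[:n], target[:n]):
            correct += int(p == t)
            total += 1
    return correct / total


preds = [[1, 2, 4, 0, 0]]
targets = [[1, 2, 3, 0, 0]]
print(fixed_accuracy(preds, targets))     # 4 of 5 positions match
print(adaptive_accuracy(preds, targets))  # 2 of 3 positions match
```

Under this reading the adaptive score is the stricter one, because matching padding positions no longer count as correct predictions.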
We also provide a Slurm script for testing:
sh test_slurm.sh
Here we list the performance of three SOTA deepfake detection methods and our method. Please refer to our paper for more details.
Method | Reference | Fixed-Acc | Adaptive-Acc |
---|---|---|---|
DRN | Wang et al. | 66.06 | 45.79 |
MA | Zhao et al. | 71.31 | 52.94 |
Two-Stream | Luo et al. | 71.92 | 53.89 |
SeqFakeFormer | Shao et al. | 72.65 | 55.30 |
Method | Reference | Fixed-Acc | Adaptive-Acc |
---|---|---|---|
DRN | Wang et al. | 64.42 | 43.20 |
MA | Zhao et al. | 67.58 | 47.48 |
Two-Stream | Luo et al. | 66.77 | 46.38 |
SeqFakeFormer | Shao et al. | 68.86 | 49.63 |
We also provide the pretrained models that generate our results in the benchmark table:
Model | Description |
---|---|
pretrained-r50-c | Trained on facial_components with resnet50 backbone. |
pretrained-r50-a | Trained on facial_attributes with resnet50 backbone. |
To try the pretrained checkpoints, please:
- Download from the links in the table, unzip the files and put them under the ./results folder with the following structure:
results
└── resnet50
    ├── facial_attributes
    │   └── pretrained-r50-a
    │       └── snapshots
    │           ├── best_model_adaptive.pt
    │           └── best_model_fixed.pt
    └── facial_components
        └── pretrained-r50-c
            └── snapshots
                ├── best_model_adaptive.pt
                └── best_model_fixed.pt
- In test.sh, modify DATA_DIR to the root of your Seq-DeepFake dataset. Modify LOG_NAME and DATASET_NAME to 'pretrained-r50-c', 'facial_components' or 'pretrained-r50-a', 'facial_attributes', respectively.
- Run test.sh.
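Before running the test script, it can help to confirm that the checkpoints ended up where the layout above expects them. This is a minimal sketch based only on that folder structure; the helper names `checkpoint_paths` and `missing_checkpoints` are hypothetical and not part of the repository.

```python
from pathlib import Path


def checkpoint_paths(results_dir, dataset_name, log_name, backbone="resnet50"):
    """Build the two expected checkpoint paths for one pretrained model."""
    snapshots = Path(results_dir) / backbone / dataset_name / log_name / "snapshots"
    return {
        "fixed": snapshots / "best_model_fixed.pt",
        "adaptive": snapshots / "best_model_adaptive.pt",
    }


def missing_checkpoints(results_dir, dataset_name, log_name, backbone="resnet50"):
    """Return the expected checkpoint files that do not exist on disk."""
    paths = checkpoint_paths(results_dir, dataset_name, log_name, backbone)
    return [str(p) for p in paths.values() if not p.is_file()]


if __name__ == "__main__":
    for dataset_name, log_name in [
        ("facial_components", "pretrained-r50-c"),
        ("facial_attributes", "pretrained-r50-a"),
    ]:
        missing = missing_checkpoints("./results", dataset_name, log_name)
        print(log_name, "OK" if not missing else f"missing: {missing}")
```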
If you find this work useful for your research, please kindly cite our paper:
@inproceedings{shao2022seqdeepfake,
title={Detecting and Recovering Sequential DeepFake Manipulation},
author={Shao, Rui and Wu, Tianxing and Liu, Ziwei},
booktitle={European Conference on Computer Vision (ECCV)},
year={2022}
}