This is the official GitHub repository for our team's (ADMIS) contribution to the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) at CVPR 2024.
- A summary paper will be published in the proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
We use a latent diffusion model (LDM) based on IDiff-Face to synthesize faces. The LDM is conditioned on identity embeddings, used as contexts, which are extracted from faces by a pretrained ElasticFace recognition model.
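For intuition, here is a minimal sketch of how such context embeddings can be extracted with a generic PyTorch recognition backbone; `extract_contexts` and its signature are illustrative placeholders, not this repository's actual API:

```python
# Hypothetical sketch: extracting identity embeddings ("contexts") from aligned
# face crops. `backbone` stands in for the pretrained ElasticFace model.
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_contexts(backbone: torch.nn.Module, faces: torch.Tensor) -> torch.Tensor:
    """faces: (N, 3, 112, 112) aligned crops; returns (N, 512) L2-normalized embeddings."""
    backbone.eval()
    embeddings = backbone(faces)           # forward pass through the recognition model
    return F.normalize(embeddings, dim=1)  # unit-normalize, as usual for ArcFace-style embeddings
```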
We use the CASIA-WebFace dataset to train an IDiff-Face diffusion model. The download link for our pretrained diffusion model weights is:
We use a dataset of 10K identities × 50 images in SubTasks 1.1/2.1 and 30K identities × 50 images in SubTasks 1.2/2.2. We provide the pre-generated synthetic 10K-identity dataset at:
We train the recognition model based on TFace. The pretrained IR-50 model trained on our synthetic dataset can be accessed at:
The evaluation results of our pretrained face recognition models on widely used benchmarks:
| Backbone | Head | Dataset | IDs | LFW | CFP-FP | CPLFW | AgeDB | CALFW | Average |
|---|---|---|---|---|---|---|---|---|---|
| IR-50 | ArcFace | CASIA-WebFace | 10.5K | 99.43 | 97.40 | 90.23 | 94.80 | 93.55 | 95.08 |
| IR-50 | ArcFace | IDiff-Face | 10K | 97.10 | 82.00 | 76.65 | 78.40 | 86.32 | 84.09 |
| IR-50 | ArcFace | DCFace | 10K | 98.60 | 88.21 | 83.33 | 88.18 | 91.38 | 89.94 |
| IR-50 | ArcFace | Syn_10k (ours) | 10K | 99.17 | 92.79 | 87.67 | 89.42 | 91.43 | 92.09 |
| IR-50 | ArcFace | Syn_30k (ours) | 30K | 99.52 | 94.66 | 89.75 | 91.78 | 93.13 | 93.77 |
Our method consists of three main stages: identity-conditioned LDM training, context-enhanced sampling, and recognition model training. Identity-conditioned LDM training and context-enhanced sampling are implemented on top of the IDiff-Face repository, with some modifications to its dataset and sampling code. Recognition model training is based on the TFace repository, with only minor modifications to the `transform` method to incorporate some additional cropping augmentations (see the illustrative sketch below).
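As a rough illustration of the kind of change involved, here is a hedged sketch of a standard face-recognition training transform with an extra cropping augmentation spliced in via torchvision; the exact parameters and ordering in our modified TFace `transform` may differ:

```python
# Illustrative sketch only, not the repository's exact code.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(112, scale=(0.85, 1.0)),  # hypothetical extra cropping enhancement
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```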
- Install environment: please refer to 'How to use the code' to set up the environment.
- Download the data and pretrained models required for LDM training: the training embeddings used as contexts and their corresponding images have to be downloaded from the link and placed under `dataset/CASIA`. The pretrained autoencoder for the latent diffusion training is obtained from the pretrained ffhq256 LDM from Rombach et al.; please follow their license distribution. For training, make sure the directory tree is as follows:

  ```
  CVPR24_FRCSyn_ADMIS
  ├── dataset
  │   ├── CASIA
  │   │   ├── elasticface_embeddings    # context file and image index file
  │   │   ├── CASIA_namelist.txt        # for training
  │   │   └── images                    # decompressed CASIA-WebFace images
  │   ...
  ├── generative_model_training
  │   ├── ckpt
  │   │   ├── autoencoder
  │   │   │   ├── first_stage_decoder_state_dict.pt    # for training
  │   │   │   └── first_stage_encoder_state_dict.pt    # for training
  │   │   ...
  │   ...
  ...
  ```
- Start training: ensure that the `dataset: CASIA_file` option is set and that the paths in the corresponding subconfiguration `generative_model_training/configs/dataset/CASIA_file.yaml` point to the training images and pre-extracted embeddings. Model training can then be initiated by executing:

  ```
  cd generative_model_training
  python main.py
  ```
To synthesize new faces with unseen identities, IDiff-Face suggests that a noise embedding sampled from a Gaussian distribution can serve as the LDM's context. However, we observe that faces synthesized this way exhibit weak identity consistency. We therefore employ another unconditional DDPM, pretrained on the FFHQ dataset, to help generate high-quality contexts.
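As a hedged sketch of this pipeline (placeholder names, not the repository's real API; `extract_contexts` is the illustrative helper sketched earlier):

```python
# Hypothetical sketch of context-enhanced sampling. `ffhq_ddpm.sample` and
# `ldm.sample_with_context` are placeholders for the actual components.
import torch

@torch.no_grad()
def sample_identity(ffhq_ddpm, elasticface, ldm, images_per_id: int = 50):
    context_face = ffhq_ddpm.sample(batch_size=1)           # unconditional FFHQ DDPM draws a realistic face
    context = extract_contexts(elasticface, context_face)   # its ElasticFace embedding becomes the LDM context
    context = context.repeat(images_per_id, 1)              # reuse one context for many samples
    synth_faces = ldm.sample_with_context(context)          # identity-consistent synthetic faces
    return context_face, synth_faces
```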
- Prepare contexts: to facilitate ease of use, we directly supply the pre-generated context faces along with the context embeddings extracted by the ElasticFace model. Please download them from this link and place them under `dataset/context_database`. For sampling, make sure the directory tree is as follows:

  ```
  CVPR24_FRCSyn_ADMIS
  ├── dataset
  │   ├── context_database
  │   │   ├── elasticface_embeddings    # context file
  │   │   └── images                    # decompressed context face images
  │   ...
  ...
  ```
- Run sampling script: if you choose to use our pretrained LDM checkpoint, please download the Pre-trained LDM (25% CPD) and make sure the directory tree is as follows:

  ```
  CVPR24_FRCSyn_ADMIS
  ├── generative_model_training
  │   ├── ckpt
  │   │   ├── ADMIS_FRCSyn_ckpt
  │   │   │   └── ema_averaged_model_200000.ckpt    # for sampling
  │   │   ...
  │   ...
  ...
  ```

  The sampling process can then be initiated by executing:

  ```
  cd generative_model_training
  python sample.py
  ```
- ID augmentation: we employ the oversampling strategy from DCFace, mixing the context face (augmented five times) into its corresponding synthesized faces. Please run:

  ```
  cd generative_model_training
  python id_augment.py
  ```
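  For intuition, here is a hedged sketch of this oversampling step; the file layout, helper name, and augmentation parameters below are assumptions, not `id_augment.py`'s actual interface:

  ```python
  # Illustrative sketch of DCFace-style oversampling: write a few augmented
  # variants of the context face into an identity's folder of synthesized faces.
  import os
  from PIL import Image
  from torchvision import transforms

  augment = transforms.Compose([
      transforms.RandomResizedCrop(112, scale=(0.8, 1.0)),
      transforms.RandomHorizontalFlip(),
      transforms.ColorJitter(brightness=0.2, contrast=0.2),
  ])

  def mix_in_context_face(context_path: str, id_dir: str, copies: int = 5) -> None:
      """Add `copies` augmented copies of the context face to one identity's folder."""
      face = Image.open(context_path).convert("RGB")
      for i in range(copies):
          augment(face).save(os.path.join(id_dir, f"context_aug_{i}.jpg"))
  ```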
- Prepare TFR-format data: to convert raw images to tfrecords and generate a new data directory containing the tfrecord files and an index_map file, please run:

  ```
  cd recognition_model_training
  python3 tools/img2tfrecord.py --img_list YOUR_IMAGE_ROOT --tfrecords_dir SAVE_ROOT --tfrecords_name SAVE_NAME
  ```
- Train: modify `DATA_ROOT` and `INDEX_ROOT` in `train.yaml`; `DATA_ROOT` is the parent directory of the tfrecord directory, and `INDEX_ROOT` is the parent directory of the index file (see the example excerpt below). Then run:

  ```
  cd recognition_model_training
  bash local_train.sh
  ```
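  A hypothetical excerpt of the relevant `train.yaml` fields (field names from the step above; paths are examples only):

  ```yaml
  # Illustrative values, not the repository's defaults.
  DATA_ROOT: /data/tfrecords_parent    # parent dir containing the tfrecord dir from img2tfrecord.py
  INDEX_ROOT: /data/tfrecords_parent   # parent dir containing the index_map file
  ```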
- Test: for detailed implementation and steps, see "Test" in the TFace repository.
If you have any more questions, please contact [email protected].
This repo is modified and adapted from these great repositories; we thank their authors a lot for their great efforts.