Chunlu Li, Andreas Morel-Forster, Thomas Vetter, Bernhard Egger*, and Adam Kortylewski*
This work enables a model-based face autoencoder to segment occlusions accurately during 3D face reconstruction. It achieves state-of-the-art occlusion segmentation results, and the resulting face reconstruction is robust to occlusions. The method requires only weak supervision for the face reconstruction subnetwork and can be trained end-to-end efficiently. Its effectiveness is verified on the CelebA-HQ dataset, the AR dataset, and the NoW Challenge.
● [Update 20230331] Docker image with trained model available now!
- Docker with pre-trained model coming soon.
- ArcFace is used for the perceptual-level loss.
- Better-tuned hyper-parameters for higher reconstruction accuracy.
- Test and evaluation code released. 3D shapes (.obj meshes), rendered faces, and estimated masks are available. Evaluation indices (accuracy, precision, F1 score, and recall) are available.
This method provides reliable occlusion segmentation masks and the training of the segmentation network does not require any additional supervision.
This method produces accurate 3D face model fitting results which are robust to occlusions.
[New!] Our method, named 'FOCUS' (Face-autoencoder and OCclUsion Segmentation), reaches the SOTA on the NoW Challenge!
The results of state-of-the-art methods on the NoW face benchmark are as follows:
Rank | Method | Median(mm) | Mean(mm) | Std(mm) |
---|---|---|---|---|
1. | FOCUS (Ours) | 1.04 | 1.30 | 1.10 |
2. | DECA[Feng et al., SIGGRAPH 2021] | 1.09 | 1.38 | 1.18 |
3. | Deep3DFace PyTorch [Deng et al., CVPRW 2019] | 1.11 | 1.41 | 1.21 |
4. | RingNet [Sanyal et al., CVPR 2019] | 1.21 | 1.53 | 1.31 |
5. | Deep3DFace [Deng et al., CVPRW 2019] | 1.23 | 1.54 | 1.29 |
6. | 3DDFA-V2 [Guo et al., ECCV 2020] | 1.23 | 1.57 | 1.39 |
7. | MGCNet [Shang et al., ECCV 2020] | 1.31 | 1.87 | 2.63 |
8. | PRNet [Feng et al., ECCV 2018] | 1.50 | 1.98 | 1.88 |
9. | 3DMM-CNN [Tran et al., CVPR 2017] | 1.84 | 2.33 | 2.05 |
For more details about the evaluation, see the NoW Challenge website.
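As a rough illustration of what the Median, Mean, and Std columns summarize, the sketch below reduces a set of per-point reconstruction errors (in mm) to those three numbers. This is not the NoW evaluation code; the function name `summarize_distances` is hypothetical, and the use of the population standard deviation is an assumption.

```python
import numpy as np

def summarize_distances(distances_mm):
    """Return median, mean and std of a collection of error distances in mm."""
    d = np.asarray(distances_mm, dtype=float).ravel()
    if d.size == 0:
        raise ValueError("at least one distance is required")
    return {
        "median": float(np.median(d)),
        "mean": float(np.mean(d)),
        # Assumption: population standard deviation (ddof=0).
        "std": float(np.std(d)),
    }
```

The benchmark's own tooling should be used for any reported numbers.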
This method proceeds step by step and is easy to implement.
To train and/or test this work, you need to:
- Prepare .csv files for the training set, validation set, and testing set. The .csv files should contain rows of [filename + landmark coordinates]. We recommend using the 68 2D landmarks detected by 2D-and-3D-face-alignment.
- To evaluate the accuracy of the estimated masks, ground truth occlusion segmentation masks are required. Please name the target image 'image_name.jpg' and its ground truth mask 'image_name_visible_skin_mask.png'.
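The evaluation indices mentioned earlier (accuracy, precision, recall, F1 score) can be sketched for a single predicted mask and its ground truth as follows. This is a minimal illustration, not the repository's evaluation code; the function name `mask_metrics` and the convention that nonzero pixels mark visible skin are assumptions.

```python
import numpy as np

def mask_metrics(pred, gt):
    """Pixel-wise metrics for two binary masks of equal shape.

    Assumption: nonzero pixels are the positive class (visible skin).
    """
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    tp = int(np.sum(pred & gt))    # predicted skin, truly skin
    tn = int(np.sum(~pred & ~gt))  # predicted occluded, truly occluded
    fp = int(np.sum(pred & ~gt))   # predicted skin, truly occluded
    fn = int(np.sum(~pred & gt))   # predicted occluded, truly skin
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

The masks would typically be loaded from the estimated mask and the matching 'image_name_visible_skin_mask.png' and thresholded to binary before calling the function.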
The image directory should follow the structure below:
./image_root
├── Dataset # Database folder containing the train set, validation set, and test set.
    ├──1.jpg # Target image
    ├──1_visible_skin_mask.png # GT masks for testing. (optional for training)
    └──...
├── train_landmarks.csv # .csv file for the train set.
├── test_landmarks.csv # .csv file for the test set.
├── val_landmarks.csv # .csv file for the validation set.
└── all_landmarks.csv # .csv file for the whole dataset. (optional)
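The landmark .csv files described above could be produced with a sketch like the one below. It assumes each row is the filename followed by the 68 landmarks flattened as x1, y1, ..., x68, y68; that column order is an assumption and should be checked against the repository's data loader. The helper names `landmark_row` and `write_landmark_csv` are hypothetical.

```python
import csv
import numpy as np

def landmark_row(filename, landmarks):
    """Build one csv row: filename followed by 136 flattened coordinates."""
    pts = np.asarray(landmarks, dtype=float)
    if pts.shape != (68, 2):
        raise ValueError("expected 68 landmarks with (x, y) coordinates")
    # Assumption: coordinates are interleaved as x1, y1, x2, y2, ...
    return [filename] + pts.reshape(-1).tolist()

def write_landmark_csv(samples, csv_path):
    """Write (filename, landmarks) pairs to csv_path, one row per image."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for filename, landmarks in samples:
            writer.writerow(landmark_row(filename, landmarks))
```

For example, `write_landmark_csv(samples, "./image_root/train_landmarks.csv")` would create the train-set file, where `samples` comes from whichever landmark detector is used.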
- Our implementation employs the BFM 2017. Please copy 'model2017-1_bfm_nomouth.h5' to './basel_3DMM'.
We depend on ArcFace to compute the perceptual features for the target images and the rendered image.
- Download the trained model.
- Place ms1mv3_arcface_r50_fp16.zip and backbone.pth under ./Occlusion_Robust_MoFA/models/.
- To install ArcFace, run the following commands:
cd ./Occlusion_Robust_MoFA
git clone https://github.com/deepinsight/insightface.git
cp -r ./insightface/recognition/arcface_torch/* ./models/
- Overwrite './models/backbones/iresnet.py' with the file in our repository.
The structure of the directory 'models' should be:
./models
├── ms1mv3_arcface_r50_fp16
├──backbone.pth
└──... # Trained model downloaded.
├── backbones
├──*iresnet.py # Overwritten by our code.
└──...
└── ... # files/directories downloaded from ArcFace repo.
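Before training, the layout above can be sanity-checked with a small sketch such as the following. It only looks for the two files explicitly named in the tree; the function name `missing_model_files` and the default directory are assumptions, and the check does not verify that iresnet.py was actually overwritten.

```python
from pathlib import Path

# Files named in the directory layout, relative to the models directory.
REQUIRED = (
    "ms1mv3_arcface_r50_fp16/backbone.pth",
    "backbones/iresnet.py",
)

def missing_model_files(models_dir="./models"):
    """Return the required files that are absent under models_dir."""
    root = Path(models_dir)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]
```

An empty list means both files were found; otherwise the returned paths show what still needs to be copied.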
We recommend using anaconda or miniconda to create a virtual environment and install the packages. You can set up the environment with the following commands:
conda create -n FOCUS python=3.6
conda activate FOCUS
pip install -r requirements.txt
To train the proposed network, follow these steps:
- Enter the directory
cd ./Occlusion_Robust_MoFA
- Unsupervised Initialization
python Step1_Pretrain_MoFA.py --img_path ./image_root/Dataset
- Generate UNet Training Set
python Step2_UNet_trainset_generation.py --img_path ./image_root/Dataset
- Pretrain UNet
python Step3_Pretrain_Unet.py
- Joint Segmentation and Reconstruction
python Step4_UNet_MoFA_EM.py --img_path ./image_root/Dataset
- Test-time adaptation (optional). To bridge the domain gap between training and testing data and reach higher performance on the test dataset, run test-time adaptation with the following command:
python Step4_UNet_MoFA_EM.py --img_path ./image_root/Dataset_adapt --pretrained_model iteration_num
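The four training steps above can be chained with a small driver, sketched below. The script names and the `--img_path` flag are taken from the commands above; the helper names `training_commands` and `run_training`, the `dry_run` switch, and the default working directory are assumptions made for illustration.

```python
import subprocess
import sys

def training_commands(img_path="./image_root/Dataset"):
    """Return the four training commands in the order listed above."""
    py = sys.executable  # run the steps with the current interpreter
    return [
        [py, "Step1_Pretrain_MoFA.py", "--img_path", img_path],
        [py, "Step2_UNet_trainset_generation.py", "--img_path", img_path],
        [py, "Step3_Pretrain_Unet.py"],
        [py, "Step4_UNet_MoFA_EM.py", "--img_path", img_path],
    ]

def run_training(img_path="./image_root/Dataset",
                 cwd="./Occlusion_Robust_MoFA", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in training_commands(img_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_training(dry_run=False)` would run the steps in sequence from the repository directory; the default dry run only prints the commands so they can be reviewed first.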
To test the model saved as './MoFA_UNet_Save/model-path/model-name', use the command below:
python Demo.py --img_path ./image_root/Dataset --pretrained_model_test ./MoFA_UNet_Save/model-path/model-name.model --test_mode pipeline_name --test_path test_dataset_root --save_path save_path --landmark_list_name landmark_filename_optional.csv
- .csv files are no longer required in the docker version. Instead, the landmarks are automatically detected.
- Fixed the naming of some variables.
- Misfit prior is also included in the docker image.
- Pull the image.
sudo docker pull chunluli/focus:1.2
- Run a container with your data directory /DataDir mounted.
docker run -v /DataDir:/FOCUS/data -itd chunluli/focus:1.2 /bin/bash
docker attach containerID
- Run the following command to see how to use the code:
python show_instructions.py
More information can be found on Docker Hub.
Please cite the following paper if this model helps your research:
@inproceedings{li2023robust,
title={Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation},
author={Li, Chunlu and Morel-Forster, Andreas and Vetter, Thomas and Egger, Bernhard and Kortylewski, Adam},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={372--381},
year={2023}
}
This code is built on top of the MoFA re-implementation by Tatsuro Koizumi, and the data processing builds on Deep3D. If you base your own work on ours, please also cite the following papers:
@inproceedings{koizumi2020look,
title={“Look Ma, no landmarks!”--Unsupervised, model-based dense face alignment},
author={Koizumi, Tatsuro and Smith, William AP},
booktitle={European Conference on Computer Vision},
pages={690--706},
year={2020},
organization={Springer}
}
@inproceedings{deng2019accurate,
title={Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set},
author={Yu Deng and Jiaolong Yang and Sicheng Xu and Dong Chen and Yunde Jia and Xin Tong},
booktitle={IEEE Computer Vision and Pattern Recognition Workshops},
year={2019}
}