Wenbo Wang*, Hsuan-I Ho*, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, Otmar Hilliges
- 4D-DRESS Dataset.
- 4D-Human-Parsing Code.
Git clone this repo:
git clone -b main --single-branch https://github.com/eth-ait/4d-dress.git
cd 4d-dress
Create a conda environment from environment.yml:
conda env create -f environment.yml
conda activate 4ddress
Or create a conda environment via the following commands:
conda create -n 4ddress python==3.8
conda activate 4ddress
bash env_install.sh
Install image-based parser: Graphonomy.
Download checkpoint inference.pth from here and save to 4dhumanparsing/checkpoints/graphonomy/.
git clone https://github.com/Gaoyiminggithub/Graphonomy.git 4dhumanparsing/lib/Graphonomy
mkdir -p 4dhumanparsing/checkpoints/graphonomy
Install optical flow predictor: RAFT.
Download checkpoint raft-things.pth and save to 4dhumanparsing/checkpoints/raft/models/.
git clone https://github.com/princeton-vl/RAFT.git 4dhumanparsing/lib/RAFT
wget -P 4dhumanparsing/checkpoints/raft/ https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip
unzip 4dhumanparsing/checkpoints/raft/models.zip -d 4dhumanparsing/checkpoints/raft/
Install segment anything model: SAM.
Download checkpoint sam_vit_h_4b8939.pth and save to 4dhumanparsing/checkpoints/sam/.
pip install git+https://github.com/facebookresearch/segment-anything.git
wget -P 4dhumanparsing/checkpoints/sam/ https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Install and compile graph-cut optimizer: pygco.
git clone https://github.com/yujiali/pygco.git 4dhumanparsing/lib/pygco
cd 4dhumanparsing/lib/pygco
wget -N -O gco-v3.0.zip https://vision.cs.uwaterloo.ca/files/gco-v3.0.zip
unzip -o gco-v3.0.zip -d ./gco_source
make all
cd ../../..
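After the setup steps above, a quick check that the three checkpoints sit where the instructions place them can save a failed run later. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and the paths are simply the ones named in the steps above, relative to the repo root.

```python
from pathlib import Path

# Checkpoint locations as named in the setup steps (relative to the repo root).
EXPECTED_CHECKPOINTS = [
    "4dhumanparsing/checkpoints/graphonomy/inference.pth",
    "4dhumanparsing/checkpoints/raft/models/raft-things.pth",
    "4dhumanparsing/checkpoints/sam/sam_vit_h_4b8939.pth",
]


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that are not files under repo_root.

    Hypothetical helper for illustration; it only checks existence,
    not file size or integrity.
    """
    root = Path(repo_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]
```

Calling `missing_checkpoints()` from the `4d-dress` folder should return an empty list once all downloads are in place.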
Please download the 4D-DRESS dataset and place the folders according to the following structure:
4D-DRESS
└── < Subject ID > (00***)
    └── < Outfit > (Inner, Outer)
        └── < Sequence ID > (Take*)
            ├── basic_info.pkl: {'scan_frames', 'rotation', 'offset', ...}
            ├── Meshes_pkl
            │   ├── atlas-fxxxxx.pkl: uv texture map as pickle file (1024, 1024, 3)
            │   └── mesh-fxxxxx.pkl: {'vertices', 'faces', 'colors', 'normals', 'uvs'}
            ├── SMPL
            │   ├── mesh-fxxxxx_smpl.pkl: SMPL params
            │   └── mesh-fxxxxx_smpl.ply: SMPL mesh
            ├── SMPLX
            │   ├── mesh-fxxxxx_smplx.pkl: SMPLX params
            │   └── mesh-fxxxxx_smplx.ply: SMPLX mesh
            ├── Semantic
            │   ├── labels
            │   │   └── label-fxxxxx.pkl, {'scan_labels': (nvt, )}
            │   └── clothes: let user extract
            │       └── cloth-fxxxxx.pkl, {'upper': {'vertices', 'faces', 'colors', 'uvs', 'uv_path'}, ...}
            └── Capture
                ├── cameras.pkl: {'cam_id': {"intrinsics", "extrinsics", ...}}
                └── < Camera ID > (0004, 0028, 0052, 0076)
                    ├── images
                    │   └── capture-f*****.png: captured image (1280, 940, 3)
                    ├── masks
                    │   └── mask-f*****.png: rendered mask (1280, 940)
                    └── labels: let user extract
                        ├── label-f*****.png: rendered label (1280, 940, 3)
                        └── overlap-f*****.png: overlapped label (1280, 940, 3)
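To make the layout above concrete, here is a minimal sketch of how one sequence folder and its per-frame files could be addressed from Python. It is not the repository's loader: the helper names are hypothetical, and it assumes that the `fxxxxx` placeholder stands for a zero-padded five-digit frame number. Unpickling the real files will likely also require the packages their contents were saved with (for example NumPy).

```python
import pickle
from pathlib import Path


def sequence_dir(root, subj, outfit, seq):
    """Folder of one sequence, e.g. <root>/00122/Outer/Take9."""
    return Path(root) / subj / outfit / seq


def load_pickle(path):
    """Load one of the dataset's .pkl files."""
    with open(path, "rb") as f:
        return pickle.load(f)


def frame_files(seq_dir, frame):
    """Paths of the scan mesh, texture atlas, and vertex labels of one frame.

    Assumption: 'fxxxxx' is 'f' followed by a zero-padded five-digit frame number.
    """
    tag = f"f{int(frame):05d}"
    seq_dir = Path(seq_dir)
    return {
        "mesh": seq_dir / "Meshes_pkl" / f"mesh-{tag}.pkl",
        "atlas": seq_dir / "Meshes_pkl" / f"atlas-{tag}.pkl",
        "labels": seq_dir / "Semantic" / "labels" / f"label-{tag}.pkl",
    }
```

A typical use would read `basic_info.pkl` first, take a frame from its `scan_frames` entry, and pass it to `frame_files` to locate the matching mesh and label pickles.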
Visualize 4D-DRESS sequences using aitviewer.
python dataset/visualize.py --subj 00122 --outfit Outer --seq Take9
Extract labeled cloth meshes and render multi-view pixel labels using vertex annotations.
python dataset/extract_garment.py --subj 00122 --outfit Outer --seq Take9
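The idea behind extracting a cloth mesh from vertex annotations can be illustrated with a small sketch. This is not the code in `dataset/extract_garment.py`; it assumes a triangle mesh given as plain lists, one integer label per vertex (as in `scan_labels`), and a hypothetical rule that a face belongs to a garment only if all of its vertices carry that garment's label. The label ids used here are placeholders.

```python
def extract_submesh(vertices, faces, vertex_labels, target_label):
    """Keep the faces whose vertices all carry target_label and re-index them.

    Returns (sub_vertices, sub_faces), where sub_faces index into sub_vertices.
    """
    # Faces that lie entirely inside the labeled region.
    keep = [f for f in faces if all(vertex_labels[v] == target_label for v in f)]
    # Vertices referenced by the kept faces, in their original order.
    used = sorted({v for f in keep for v in f})
    remap = {old: new for new, old in enumerate(used)}
    sub_vertices = [vertices[v] for v in used]
    sub_faces = [[remap[v] for v in f] for f in keep]
    return sub_vertices, sub_faces
```

Faces that straddle two labels are dropped under this rule, so the extracted garment boundary follows the annotated vertices rather than cutting through triangles.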
4D Human Parsing Method. We first render the current and previous frame scans into multi-view images and labels. We then collect multi-view parsing results from the image parser, optical flows, and segmentation masks. Finally, we project the multi-view labels onto the 3D vertices and optimize the vertex labels using the Graph Cut algorithm with vertex-wise unary energy and edge-wise binary energy. Manual rectification labels can be easily introduced by checking the multi-view rendered labels.
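The energy structure described above (a per-vertex unary term plus a per-edge binary term) can be sketched in a few lines. The sketch below makes several assumptions that are not taken from the repository: the unary cost of a label is the fraction of view votes that disagree with it, the binary term is a constant Potts penalty on mesh edges whose endpoints differ, and the optimizer is iterated conditional modes (ICM), a simple greedy stand-in used here instead of the Graph Cut solver so that the example has no dependencies.

```python
from collections import Counter


def unary_from_votes(view_labels, num_labels):
    """view_labels[v] lists the labels projected onto vertex v from the views that see it.

    Assumed cost: fraction of votes disagreeing with the label (0 = unanimous support).
    """
    unary = []
    for votes in view_labels:
        counts = Counter(votes)
        total = len(votes)
        if total == 0:
            unary.append([0.0] * num_labels)  # unseen vertex: no preference
        else:
            unary.append([1.0 - counts[l] / total for l in range(num_labels)])
    return unary


def total_energy(labels, unary, edges, smooth_weight):
    """Sum of vertex-wise unary costs and edge-wise Potts penalties."""
    data = sum(unary[v][labels[v]] for v in range(len(labels)))
    smooth = sum(smooth_weight for a, b in edges if labels[a] != labels[b])
    return data + smooth


def icm(unary, edges, smooth_weight, max_sweeps=10):
    """Greedy per-vertex minimization (ICM); a stand-in, not Graph Cut."""
    n, num_labels = len(unary), len(unary[0])
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Start from the label with the lowest unary cost at each vertex.
    labels = [min(range(num_labels), key=lambda l: unary[v][l]) for v in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for v in range(n):
            def local_cost(l):
                disagree = sum(labels[u] != l for u in neighbors[v])
                return unary[v][l] + smooth_weight * disagree
            best = min(range(num_labels), key=local_cost)
            if local_cost(best) < local_cost(labels[v]):
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels
```

With a positive `smooth_weight`, an isolated vertex whose views disagree with all of its mesh neighbors is pulled toward the neighbors' label, which is the smoothing effect the edge-wise term is meant to provide. ICM only finds a local minimum, so it illustrates the energy rather than reproducing the optimizer's results.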
First, run the image parser, optical flow, and Segment Anything models on the entire 4D scan sequence, and parse the first frame:
python 4dhumanparsing/multi_view_parsing.py --subj 00122 --outfit Outer --seq Take9
Second, run graph-cut optimization and introduce manual rectification on the entire 4D scan sequence:
python 4dhumanparsing/multi_surface_parsing.py --subj 00122 --outfit Outer --seq Take9
You can introduce new labels, like socks and belts, during the 4D human parsing process.
First, run the image parser, optical flow, and Segment Anything models, and parse the first frame with new_label=sock:
python 4dhumanparsing/multi_view_parsing.py --subj 00135 --outfit Inner --seq Take1 --new_label sock
Second, run graph-cut optimization and introduce manual rectification on all frames with new_label=sock:
python 4dhumanparsing/multi_surface_parsing.py --subj 00135 --outfit Inner --seq Take1 --new_label sock
Tracking and parsing small regions like socks and belts may require more manual rectification.
You can apply our 4D human parsing method on other 4D human datasets, like BUFF, X-Humans, and Actors-HQ.
For instance, you can modify our DatasetUtils within 4dhumanparsing/lib/utility/dataset.py to XHumansUtils.
Then, as before, run the image parser, optical flow, and Segment Anything models on the X-Humans sequence:
python 4dhumanparsing/multi_view_parsing.py --dataset XHumans --subj 00017 --outfit test --seq Take10
Afterwards, run graph-cut optimization and introduce manual rectification on all frames:
python 4dhumanparsing/multi_surface_parsing.py --dataset XHumans --subj 00017 --outfit test --seq Take10
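All of the parsing examples above follow the same two-stage pattern, so the command lines can be assembled programmatically when several sequences need processing. The sketch below is not part of the repository; the helper names are hypothetical, and it only uses the scripts and flags that appear in the commands above (`--dataset`, `--subj`, `--outfit`, `--seq`, `--new_label`). It assumes it is run from the repo root with the `4ddress` environment active.

```python
import subprocess
import sys

# The two stages, in the order they are run above.
STAGES = [
    "4dhumanparsing/multi_view_parsing.py",
    "4dhumanparsing/multi_surface_parsing.py",
]


def build_commands(subj, outfit, seq, dataset=None, new_label=None):
    """Return the argv lists for both parsing stages of one sequence."""
    commands = []
    for script in STAGES:
        argv = [sys.executable, script]
        if dataset is not None:
            argv += ["--dataset", dataset]
        argv += ["--subj", subj, "--outfit", outfit, "--seq", seq]
        if new_label is not None:
            argv += ["--new_label", new_label]
        commands.append(argv)
    return commands


def run_sequence(subj, outfit, seq, dataset=None, new_label=None):
    """Run both stages in order and stop if one of them fails."""
    for argv in build_commands(subj, outfit, seq, dataset, new_label):
        subprocess.run(argv, check=True)
```

For example, `run_sequence("00017", "test", "Take10", dataset="XHumans")` would issue the same two commands as shown above. Because the second stage involves manual rectification, it may still need attention while it runs.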
- Yin et al., "Hi4D: 4D Instance Segmentation of Close Human Interaction", CVPR 2023
- Shen et al., "X-Avatar: Expressive Human Avatars", CVPR 2023
- Antić et al., "CloSe: A 3D Clothing Segmentation Dataset and Model", 3DV 2024
If you find our code, dataset, and paper useful, please cite as:
@inproceedings{wang20244ddress,
title={4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations},
author={Wang, Wenbo and Ho, Hsuan-I and Guo, Chen and Rong, Boxiang and Grigorev, Artur and Song, Jie and Zarate, Juan Jose and Hilliges, Otmar},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}