This repository contains the official PyTorch implementation of the following CVPR 2024 paper:
Title: Unleashing Network Potentials for Semantic Scene Completion PDF
Author: Fengyun Wang, Qianru Sun, Dong Zhang, and Jinhui Tang,
Affiliation: NJUST, SMU, HKUST
Semantic scene completion (SSC) aims to predict complete 3D voxel occupancy and semantics from a single-view RGB-D image, and recent SSC methods commonly adopt multi-modal inputs. However, our investigation reveals two limitations: ineffective feature learning from single modalities and overfitting to limited datasets. To address these issues, this paper proposes a novel SSC framework - Adversarial Modality Modulation Network (AMMNet) - with a fresh perspective of optimizing gradient updates. The proposed AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition. Specifically, the cross-modal modulation adaptively re-calibrates the features to better excite representation potentials from each single modality. The adversarial training employs a minimax game of evolving gradients, with customized guidance to strengthen the generator's perception of visual fidelity from both geometric completeness and semantic correctness. Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin, providing a promising direction for improving the effectiveness and generalization of SSC methods.
AMMNet consists of three components: an image encoder for RGB input, a TSDF encoder for TSDF input, and a decoder for final prediction. It has two novel modules: cross-modal modulations after the encoders and decoder to recalibrate features, and a discriminator that distinguishes real/fake voxels to mitigate overfitting issues.
The NYU dataset:
ImageEncoder | Model Zoo | Visual Results |
---|---|---|
Segformer-B2 | Google Drive / Baidu Netdisk with code:sovq | Google Drive / Baidu Netdisk with code:p4e7 |
DeepLabv3 | TODO... | Google Drive / Baidu Netdisk with code:0pww |
The NYUCAD dataset:
ImageEncoder | Model Zoo | Visual Results |
---|---|---|
Segformer-B2 | Google Drive / Baidu Netdisk with code:7mlm | Google Drive / Baidu Netdisk with code:1fa9 |
DeepLabv3 | TODO... | Google Drive / Baidu Netdisk with code:biug |
- Pytorch 1.10.1
- cudatoolkit 11.1
- mmcv 1.5.0
- mmsegmentation 0.27.0
conda create -n CleanerS python=3.7 -y
conda activate CleanerS
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
pip install mmsegmentation==0.27.0
conda install scikit-learn
pip install pyyaml timm tqdm EasyConfig multimethod easydict termcolor shortuuid imageio
We follow the project of 3D-Sketch for dataset preparation.
After preparing, the your_SSC_Dataset
folder should look like this:
-- your_SSC_Dataset
| NYU
|-- TSDF
|-- Mapping
| |-- trainset
| |-- |-- RGB
| |-- |-- depth
| |-- |-- GT
| |-- testset
| |-- |-- RGB
| |-- |-- depth
| |-- |-- GT
| NYUCAD
|-- TSDF
| |-- trainset
| |-- |-- depth
| |-- testset
| |-- |-- depth
- on Segformer-B2
- Download the pretrained Segformer-B2, mit_b2.pth or our backup copy (Google Drive / Baidu Netdisk with code:8uem).;
- Run
run.sh
for training the AMMNet.
- on DeepLabv3
- Download the semantic segmentation results from CVSformer or our backup copy (Google Drive / Baidu Netdisk with code:ulpa).
- For the segmentation results, add one more layer to match the feature channel:
self.feature2d_proc = nn.Sequential( nn.Conv3d(14, 3, kernel_size=3, padding=1, bias=False), norm_layer(3, momentum=bn_momentum), nn.ReLU(), nn.Conv3d(3, 64, kernel_size=3, padding=1, bias=False), norm_layer(64, momentum=bn_momentum), nn.ReLU(), nn.Conv3d(64, feature, kernel_size=3, padding=1, bias=False), norm_layer(feature, momentum=bn_momentum), nn.ReLU(inplace=False), )
- The rest part should be easy to modify...(TODO...)
- Run
run.sh
for training the AMMNet
- Download our weights and then put it in the
./checkpoint
folder. - Run
python test_NYU.py --pretrained_path ./checkpoint/xxx_ckpt.pth
. The visualized results will be in the./visual_pred/xxx
folder. - Reproduce the results on the test set of the NYUCAD dataset, you should modify the
NYU.py
withCAD_mode=True
, and then go through steps 1 and 2.
If this work is helpful for your research, please consider citing:
citation
- switchable 2DNet for both Segformer-B2 and DeepLabv3
This code is based on 3D-Sketch and our previous work CleanerS.