Accepted to CVPR2023. 🔥
PyTorch pretraining and downstream-training code for PiMAE. We propose an MAE-based self-supervised pre-training framework that promotes interaction between 3D and 2D representations to improve model performance on downstream object detection tasks.
📣 Check out our latest work, I2P-MAE, which obtains superior 3D representations from 2D pre-trained models via Image-to-Point Masked Autoencoders. 📣
We provide our pretrained weights (on SUNRGBD) and fine-tuned transformer-based baseline models (on SUNRGBD, ScanNetV2, and KITTI), including 3DETR, DETR, and MonoDETR.
# | Task | Name | Dataset | AP (gain) | Download | |
---|---|---|---|---|---|---|
0 | Pretrain | PiMAE | SUNRGBD | - | model | |
1 | 3D Object Detection | 3DETR | SUNRGBD | [AP25] 59.4(+1.4) | model | logs |
2 | 3D Object Detection | 3DETR | ScanNetV2 | [AP25] 62.6(+0.5) | model | logs |
5 | Monocular 3D Object Detection | MonoDETR | KITTI | [Easy] 26.6(+3.5) | model | logs |
6 | 2D Object Detection | DETR | ScanNetV2 | [AP50] 46.5(+6.7) | model | logs |
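Each AP (gain) cell gives the fine-tuned score followed by the reported gain, presumably measured against the same detector without PiMAE pre-training, so the implied baseline can be recovered by subtraction. A minimal sketch, assuming the cell format shown in the table; the helper name `split_ap_cell` is ours and is not part of the repository:

```python
import re
from decimal import Decimal

# Matches cells such as "[AP25] 59.4(+1.4)": metric tag, score, signed gain.
CELL = re.compile(
    r"\[(?P<metric>[^\]]+)\]\s*(?P<score>\d+(?:\.\d+)?)\((?P<gain>[+-]\d+(?:\.\d+)?)\)"
)

def split_ap_cell(cell):
    """Return (metric, score, gain, implied_baseline) for one table cell."""
    m = CELL.fullmatch(cell.strip())
    if m is None:
        raise ValueError(f"unrecognized AP cell: {cell!r}")
    # Decimal keeps one-decimal values exact, unlike binary floats.
    score = Decimal(m["score"])
    gain = Decimal(m["gain"])
    return m["metric"], score, gain, score - gain

cells = [
    "[AP25] 59.4(+1.4)",
    "[AP25] 62.6(+0.5)",
    "[Easy] 26.6(+3.5)",
    "[AP50] 46.5(+6.7)",
]
for cell in cells:
    metric, score, gain, baseline = split_ap_cell(cell)
    print(f"{metric}: implied baseline {baseline}, with PiMAE {score}")
```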
We provide an easy tutorial for using PiMAE's pre-trained 3D extractor. You can easily modify the code to fit your model.
Download our pretrained model from here and place it at ./Pretrain/pimae.pth.
Install the minimum required dependencies, then run the tutorial code:
pip install torch torchvision
python Pretrain/tutorial_load.py
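When adapting the extractor to your own model, you often need only a subset of a checkpoint's weights. The sketch below shows one way to select and rename entries of a state-dict-like mapping by key prefix. It is plain Python and makes no claim about PiMAE's actual checkpoint layout: the prefix `pc_encoder.` and the example keys are hypothetical placeholders, so inspect the real keys first (for example, by printing them from the dictionary that `torch.load` returns).

```python
def select_by_prefix(state_dict, prefix):
    """Keep entries whose key starts with `prefix`, with the prefix stripped."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }

# Stand-in for a loaded checkpoint; real values would be tensors.
# All key names below are hypothetical.
checkpoint = {
    "pc_encoder.blocks.0.weight": "w0",
    "pc_encoder.blocks.0.bias": "b0",
    "img_encoder.blocks.0.weight": "w1",
}

encoder_weights = select_by_prefix(checkpoint, "pc_encoder.")
print(sorted(encoder_weights))
```

With a real checkpoint, the resulting dictionary could be passed to your module's `load_state_dict(..., strict=False)`, which reports any missing and unexpected keys instead of raising on them.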
First, clone this repository to your local machine.
git clone https://github.com/BLVLab/PiMAE.git
Next, install the required dependencies.
cd Pretrain
sh install.sh
We follow VoteNet to preprocess our data. The instructions for preparing SUN RGB-D are here.
Remember to edit the dataset paths in Pretrain/datasets/sunrgbd.py.
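A wrong dataset path can go unnoticed until a training job is already running, so it may help to check the directories up front. A minimal sketch, assuming you copy the paths you set in Pretrain/datasets/sunrgbd.py into the list; the example path and the helper name `missing_dirs` are placeholders, not values from the repository:

```python
from pathlib import Path

def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [str(p) for p in map(Path, paths) if not p.is_dir()]

# Placeholder: replace with the dataset roots you configured.
dataset_roots = ["/path/to/sunrgbd"]

problems = missing_dirs(dataset_roots)
if problems:
    print("Missing dataset directories:", ", ".join(problems))
else:
    print("All dataset directories found.")
```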
python main.py --config cfgs/pretrain_JD_pc2img.yaml --exp_name pimae
To generate reconstruction visualizations, run:
python main_vis.py \
--test \
--ckpts ./experiments/pretrain/cfgs/pimae/ckpt-last.pth \
--config ./experiments/pretrain/cfgs/pimae/config.yaml \
--exp_name vis_pimae
Follow the 3DETR codebase to prepare the training data (SUNRGBD and ScanNetV2).
Install the required dependencies:
cd Downstream/3detr
sh install.sh
Run the training code (you can specify the training configuration in the script):
sh run.sh
Follow the MonoDETR codebase to prepare the training data (KITTI). Install the required dependencies:
cd Downstream/MonoDETR
sh install.sh
Run the code for training and testing (remember to check monodetr.yaml, where we specify the path to the PiMAE weights).
bash train.sh configs/monodetr.yaml > logs/monodetr.log # training
bash test.sh configs/monodetr.yaml # testing
Follow DETR to prepare the data and install the required dependencies. Then train the model:
cd Downstream/detr/d2
python train_net.py --config configs/detr_256_6_6_torchvision.yaml --num-gpus 8
This repository is based on the 3DETR, MonoDETR, DETR, timm, and MAE repositories. We thank their authors for their great work.
If you find this repository helpful, please consider citing our work:
@inproceedings{chen2023pimae,
title={PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection},
author = {Chen, Anthony and Zhang, Kevin and Zhang, Renrui and Wang, Zihan and Lu, Yuheng and Guo, Yandong and Zhang, Shanghang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}