Code release for our paper:

**Understanding 3D Object Interaction from a Single Image**
Shengyi Qian, David F. Fouhey
ICCV 2023
[Project Page] [arXiv] [demo]
Please check the project page for more details, and consider citing our paper if you find it helpful:
```bibtex
@inproceedings{qian2023understanding,
    title = {Understanding 3D Object Interaction from a Single Image},
    author = {Qian, Shengyi and Fouhey, David F},
    booktitle = {ICCV},
    year = {2023}
}
```
If you are only interested in inference, you can also try our demo on Hugging Face.
We use Anaconda to set up the Python environment. The code is tested with Python 3.9 and PyTorch 2.0.1; pytorch3d is only required for 3D visualization.
```shell
# python
conda create -n monoarti python=3.9
conda activate monoarti

# pytorch (tested with 2.0.1)
conda install pytorch==2.0.1 torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# other packages
pip install accelerate
pip install submitit
pip install hydra-core --upgrade --pre
pip install hydra-submitit-launcher --upgrade
pip install pycocotools
pip install packaging plotly imageio imageio-ffmpeg matplotlib h5py opencv-python
pip install tqdm wandb visdom

# (optional, for 3D visualization) pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
```
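After installation, a quick way to confirm the environment is to check that the packages above are importable. This checker is not part of the repo, just a convenience sketch using only the standard library:

```python
import importlib.util

# Import names for the packages installed above (opencv-python imports as cv2,
# hydra-core as hydra); pytorch3d is optional.
REQUIRED = ["torch", "torchvision", "accelerate", "hydra", "pycocotools", "cv2"]
OPTIONAL = ["pytorch3d"]

def missing(packages):
    """Return the subset of packages that cannot be found by the importer."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

if __name__ == "__main__":
    gaps = missing(REQUIRED)
    print("all required packages found" if not gaps
          else "missing required packages: " + ", ".join(gaps))
    if missing(OPTIONAL):
        print("pytorch3d not found: 3D visualization will be unavailable")
```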
Create a `checkpoints` directory to store pretrained checkpoints:

```shell
mkdir checkpoints
```

If necessary, download our pretrained SAM model and put it at `checkpoints/checkpoint_20230515.pth`.
The dataset is released on the project page. Please download it and set the dataset root in the config accordingly. The dataset should be organized as follows:
- `3doi_data`
  - `3doi_v1`
  - `images`
  - `omnidata_filtered`
To test the model on 3DOI or any other dataset, run:

```shell
python test.py --config-name sam_inference checkpoint_path=checkpoints/checkpoint_20230515.pth output_dir=vis
```
To create video animation, run:

```shell
python test.py --config-name sam_inference checkpoint_path=checkpoints/checkpoint_20230515.pth output_dir=vis test.mode='export_video'
```
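The extra arguments above are Hydra-style command-line overrides: dotted `key=value` pairs that update fields of the nested config. A toy sketch of the idea (not Hydra's actual implementation, just an illustration of how the dotted keys map onto the config tree):

```python
def apply_override(cfg, override):
    """Apply one Hydra-style `a.b.c=value` override to a nested dict."""
    key, _, value = override.partition("=")
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})  # descend, creating nodes as needed
    node[leaf] = value.strip("'\"")       # drop shell-style quoting
    return cfg

# Hypothetical config fragment for illustration.
cfg = {"test": {"mode": "inference"}}
apply_override(cfg, "test.mode='export_video'")
apply_override(cfg, "output_dir=vis")
# cfg is now {"test": {"mode": "export_video"}, "output_dir": "vis"}
```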
To train our model with the Segment Anything backbone, run:

```shell
python train.py --config-name sam
```
To train our model with the DETR backbone, run:

```shell
python train.py --config-name detr
```
We reuse code from ViewSeg, DETR and Segment Anything.