Erik Sandström1,2* · Keisuke Tateno2 · Michael Oechsle2 · Michael Niemeyer2 · Luc Van Gool1,4,5 · Martin R. Oswald1,6 · Federico Tombari2,3
1 ETH Zurich, 2 Google, 3 TUM, 4 KU Leuven, 5 INSAIT, 6 University of Amsterdam
(* This work was conducted during an internship at Google)
This is not an officially endorsed Google product.
Splat-SLAM produces more accurate dense geometry and rendering results than existing methods, thanks to our deformable 3DGS representation and our DSPO layer for camera pose and depth estimation. ¹Zhang et al. 2024. ²Matsuki et al. 2023.
Splat-SLAM Architecture. We use a keyframe-based frame-to-frame tracker built on dense optical flow, connected to a pose graph for global consistency. For dense mapping, we use a 3DGS representation, from which both dense geometry and renderings can be extracted.
- Clone the repo using the `--recursive` flag:

```bash
git clone --recursive https://github.com/google-research/Splat-SLAM.git
cd Splat-SLAM
```
- Create a new conda environment:

```bash
conda create --name splat-slam python=3.10
conda activate splat-slam
```
- Install the CUDA toolkit (11.7) and PyTorch via conda:

```bash
conda install conda-forge::cudatoolkit-dev=11.7.0
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
Now make sure that `which python` points to the correct Python executable. Also test that CUDA is available:

```bash
python -c "import torch; print(torch.cuda.is_available())"
```
- Update the depth rendering hyperparameter in the third-party library.

By default, the gaussian rasterizer does not render gaussians that are closer than 0.2 (meters) in front of the camera. In our monocular setting, where the global scale is ambiguous, this can lead to issues during rendering. We therefore lower this threshold from 0.2 to 0.001. Change the value at this line so that it reads:

```cpp
if (p_view.z <= 0.001f)// || ((p_proj.x < -1.3 || p_proj.x > 1.3 || p_proj.y < -1.3 || p_proj.y > 1.3)))
```
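As a toy illustration (not part of the codebase; the depth values below are made up), this is the effect the threshold has on which gaussians get rendered:

```python
# Toy illustration: whether a gaussian at camera-space depth z is culled.
# With the default 0.2 near plane, valid geometry can be skipped when the
# (arbitrary) monocular scale places it close to the camera.
for z in [0.05, 0.15, 0.5]:  # hypothetical camera-space depths (scene units)
    default = "culled" if z <= 0.2 else "rendered"
    patched = "culled" if z <= 0.001 else "rendered"
    print(f"z={z}: threshold 0.2 -> {default}, threshold 0.001 -> {patched}")
```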
- Install the remaining dependencies:

```bash
python -m pip install -e thirdparty/lietorch/
python -m pip install -e thirdparty/diff-gaussian-rasterization-w-pose/
python -m pip install -e thirdparty/simple-knn/
python -m pip install -e thirdparty/evaluate_3d_reconstruction_lib/
```
- Check the installation:

```bash
python -c "import torch; import lietorch; import simple_knn; import diff_gaussian_rasterization; print(torch.cuda.is_available())"
```
- Now install the DROID backends and the other requirements:

```bash
python -m pip install -e .
python -m pip install -r requirements.txt
python -m pip install pytorch-lightning==1.9 --no-deps
```
- Download the pretrained models.

Download the pretrained models from Google Drive and unzip them inside the `pretrained` folder. The `middle_fine.pt` decoder will not be used and can be removed.
Directory structure of `pretrained`:

```
.
└── pretrained
    ├── .gitkeep
    ├── droid.pth
    ├── middle_fine.pt
    └── omnidata_dpt_depth_v2.ckpt
```
Download the Replica data as below; it is saved into the `./datasets/Replica` folder. Note that the Replica data is generated by the authors of iMAP (but hosted by the authors of NICE-SLAM). Please cite iMAP if you use the data.

```bash
bash scripts/download_replica.sh
```
To be able to evaluate the reconstruction error, download the ground-truth Replica meshes where unseen regions have been culled:

```bash
bash scripts/download_cull_replica_mesh.sh
```
Download the TUM-RGBD data:

```bash
bash scripts/download_tum.sh
```

Please change the `input_folder` path in the scene-specific config files to point to where the data is stored.
Please follow the data downloading procedure on the ScanNet website, and extract color/depth frames from the `.sens` file using this code.
Directory structure of ScanNet:
Please change the `input_folder` path in the scene-specific config files to point to where the data is stored.
```
DATAROOT
└── scannet
    └── scene0000_00
        └── frames
            ├── color
            │   ├── 0.jpg
            │   ├── 1.jpg
            │   ├── ...
            │   └── ...
            ├── depth
            │   ├── 0.png
            │   ├── 1.png
            │   ├── ...
            │   └── ...
            ├── intrinsic
            └── pose
                ├── 0.txt
                ├── 1.txt
                ├── ...
                └── ...
```
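As a quick, hypothetical sanity check (this script is not part of the repo), you can verify that an extracted scene matches the layout above before pointing `input_folder` at it:

```python
# Hypothetical sanity check: confirm that color/depth/pose frame counts match
# for one extracted ScanNet scene. Adjust the path to your own DATAROOT.
import os

frames = "DATAROOT/scannet/scene0000_00/frames"
colors = [f for f in os.listdir(os.path.join(frames, "color")) if f.endswith(".jpg")]
depths = [f for f in os.listdir(os.path.join(frames, "depth")) if f.endswith(".png")]
poses = [f for f in os.listdir(os.path.join(frames, "pose")) if f.endswith(".txt")]
print(f"{len(colors)} color, {len(depths)} depth, {len(poses)} pose files")
assert len(colors) == len(depths) == len(poses), "frame counts do not match"
```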
We use the following sequences:

```
scene0000_00
scene0054_00
scene0059_00
scene0106_00
scene0169_00
scene0181_00
scene0207_00
scene0233_00
```
For running Splat-SLAM, each scene has a config file in which the `input_folder` and `output` paths need to be specified.
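One simple way to locate these entries is to print the parsed config. The sketch below is not part of the repo; it only assumes the configs are YAML files (as the `.yaml` extension suggests) and that PyYAML is installed:

```python
# Print a scene config so the input_folder and output entries can be found
# and edited. The exact key nesting may differ between scenes/datasets.
import pprint
import yaml

with open("configs/Replica/office0.yaml") as f:
    cfg = yaml.safe_load(f)
pprint.pprint(cfg)  # look for the input_folder and output entries
```

Below, we show some example run commands for one scene from each dataset.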
To run Splat-SLAM on the `office0` scene, run the following command:

```bash
python run.py configs/Replica/office0.yaml
```
After reconstruction, the trajectory error, mesh accuracy, and rendering metrics will be evaluated automatically.
To run Splat-SLAM on the `freiburg3_office` scene, run the following command:

```bash
python run.py configs/TUM_RGBD/freiburg3_office.yaml
```
After reconstruction, the trajectory error will be evaluated automatically.
To run Splat-SLAM on the `scene0000_00` scene, run the following command:

```bash
python run.py configs/Scannet/scene0000.yaml
```
After reconstruction, the trajectory error will be evaluated automatically.
Our Splat-SLAM pipeline uses two processes, one for tracking and one for mapping, and it is possible to run tracking only, without mapping/rendering. Add `--only_tracking` to each of the above commands (a conceptual sketch of the two-process layout follows the commands below):

```bash
python run.py configs/Replica/office0.yaml --only_tracking
python run.py configs/TUM_RGBD/freiburg3_office.yaml --only_tracking
python run.py configs/Scannet/scene0000.yaml --only_tracking
```
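The sketch below illustrates this two-process layout in simplified form; it is a conceptual stand-in with made-up function names and a queue-based handoff, not the actual implementation:

```python
# Conceptual sketch: a tracker process hands keyframes to a mapper process.
# In the real system the tracker runs dense optical flow with a pose graph,
# and the mapper maintains the deformable 3DGS map; with --only_tracking the
# mapping process is simply not used.
import multiprocessing as mp

def tracking(queue, only_tracking):
    for keyframe_id in range(5):  # stand-in for keyframe-based tracking
        if not only_tracking:
            queue.put(keyframe_id)  # hand the keyframe to the mapper
    queue.put(None)  # sentinel: the sequence is finished

def mapping(queue):
    while (kf := queue.get()) is not None:
        print(f"mapper: updating the 3DGS map with keyframe {kf}")

if __name__ == "__main__":
    q = mp.Queue()
    mapper = mp.Process(target=mapping, args=(q,))
    mapper.start()
    tracking(q, only_tracking=False)
    mapper.join()
```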
Our codebase is partially based on GlORIE-SLAM, GO-SLAM, DROID-SLAM and MonoGS. We thank the authors for making these codebases publicly available; our work would not have been possible without their great efforts!
There may be minor differences between the released codebase and the results reported in the paper. Further, we note that the GPU hardware can influence the results, even when running with the same seed and conda environment.
If you find our code or paper useful, please cite:

```bibtex
@article{sandstrom2024splat,
  title={Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians},
  author={Sandstr{\"o}m, Erik and Tateno, Keisuke and Oechsle, Michael and Niemeyer, Michael and Van Gool, Luc and Oswald, Martin R and Tombari, Federico},
  journal={arXiv preprint arXiv:2405.16544},
  year={2024}
}
```
Contact Erik Sandström for questions, comments, or bug reports.