This is the official PyTorch implementation of [Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics], ECCV 2022.
If you find this work useful in your research, please consider citing our paper:
@inproceedings{zhang2022towards,
title={Towards scale-aware, robust, and generalizable unsupervised monocular depth estimation by integrating IMU motion dynamics},
author={Zhang, Sen and Zhang, Jing and Tao, Dacheng},
booktitle={European Conference on Computer Vision},
pages={143--160},
year={2022},
organization={Springer}
}
- Download both the raw (unsynced) and the synced KITTI datasets from https://www.cvlibs.net/datasets/kitti/raw_data.php. For each sequence, you will have two folders, `XXX_extract/` and `XXX_sync/`, e.g. `2011_10_03/2011_10_03_drive_0042_extract` and `2011_10_03/2011_10_03_drive_0042_sync`.
- The experiments are performed using the data from the synced KITTI dataset (`XXX_sync/`). Since the IMU data (`oxts/`) in the synced dataset is sampled at the same frequency as the images, we need a matching preprocessing step that uses the IMU data in the raw dataset to recover the corresponding IMU measurements at their original frequency.
- You can achieve this by running `python match_kitti_imu.py`
- What you need to do: (1) modify lines 71-76 to set the sequence names for your own setting, and (2) modify lines 89-90 to point to your own paths to the raw and the synced datasets.
- The matched results will be saved in `matched_oxts/` under each sequence folder `XXX_sync`.
- A 5 ms drift is allowed in the current matching process. You can modify line 153 if you are not happy with this setting (a minimal sketch of the matching idea is given after this list).
- Note that we directly match the IMU data using the timestamps, while ignoring potential time asynchronization between the IMU and the camera timing systems.
- Since the unsynced dataset is quite large to download, we also provide our preprocessed IMU files at the following link: https://pan.baidu.com/s/1971KrQEHw5kVRy_Y4Lj5FA pwd:80pz
- For the image preprocessing, we follow the practice in https://github.com/nianticlabs/monodepth2 and convert the images from png to jpg for a smaller image size:
find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
- Since I only did the preprocessing once at the beginning of this project, please remind me by raising a new issue if I missed anything here.
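For orientation, below is a minimal sketch of the timestamp-matching idea behind `match_kitti_imu.py`; the parsing helper, function names, and paths are illustrative assumptions, not the actual implementation.

```python
from datetime import datetime

import numpy as np


def load_timestamps(path):
    """Parse a KITTI timestamps.txt into absolute seconds (truncated to microsecond precision)."""
    with open(path) as f:
        stamps = [datetime.strptime(line.strip()[:26], "%Y-%m-%d %H:%M:%S.%f")
                  for line in f if line.strip()]
    return np.array([s.timestamp() for s in stamps])


def match_sync_to_raw(sync_times, raw_times, max_drift=0.005):
    """For each synced frame, return the index of the closest raw IMU sample within max_drift (5 ms), or None."""
    matches = []
    for t in sync_times:
        idx = int(np.argmin(np.abs(raw_times - t)))
        matches.append(idx if abs(raw_times[idx] - t) <= max_drift else None)
    return matches


# Example usage (paths are placeholders):
# sync_t = load_timestamps(".../2011_10_03_drive_0042_sync/oxts/timestamps.txt")
# raw_t = load_timestamps(".../2011_10_03_drive_0042_extract/oxts/timestamps.txt")
# matches = match_sync_to_raw(sync_t, raw_t)
```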
This codebase is developed under PyTorch-1.4.0, CUDA-10.0, and Ubuntu-18.04.1.
You can train our full model with:
python train.py --data_path YOUR_PATH_TO_DATA --use_ekf --num_layers 50
To use ResNet-18 rather than ResNet-50 as the backbone, change `--num_layers` to 18.
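For example, training the full model with a ResNet-18 backbone would then be:
python train.py --data_path YOUR_PATH_TO_DATA --use_ekf --num_layers 18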
To disable the EKF fusion and use only the IMU-related losses, simply remove `--use_ekf`.
To use loss weights other than the default setting, adjust the corresponding options, e.g.,
--imu_warp_weight 0.5 --imu_consistency_weight 0.01 --velo_weight 0.001 --gravity_weight 0.001
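For orientation, these flags scale the corresponding IMU-related loss terms in the overall training objective. A schematic sketch follows; the function and argument names are placeholders, not the actual variables in train.py.

```python
# Schematic only: how the command-line weights enter the training objective.
def total_loss(photometric, imu_warp, imu_consistency, velo, gravity, opt):
    return (photometric
            + opt.imu_warp_weight * imu_warp                # e.g. 0.5
            + opt.imu_consistency_weight * imu_consistency  # e.g. 0.01
            + opt.velo_weight * velo                        # e.g. 0.001
            + opt.gravity_weight * gravity)                 # e.g. 0.001
```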
You can evaluate on the KITTI test set with:
python evaluate_depth.py --num_layers 50 --load_weights_folder YOUR_PATH_TO_MODEL_WEIGHTS --post_process
By default, we report the learnt scale without the median scaling trick. Use `--eval_mono` if you want to test the performance with median scaling.
For evaluation without post-processing, simply remove `--post_process`.
To evaluate the models with the ResNet-18 backbone, change `--num_layers` to 18 accordingly.
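For example, evaluating a ResNet-18 model with median scaling and without post-processing would be:
python evaluate_depth.py --num_layers 18 --load_weights_folder YOUR_PATH_TO_MODEL_WEIGHTS --eval_mono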
To evaluate the models on Make3D, use `evaluate_make3d.py` with the same arguments as `evaluate_depth.py`. But you need to change the variable `main_path` in `read_make3d()` to your own path that contains the test images of Make3D.
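As a sketch, the change inside `read_make3d()` amounts to something like the following; the exact line and folder layout are assumptions, so use whatever structure your Make3D download has.

```python
# Inside read_make3d() in evaluate_make3d.py (illustrative placement):
main_path = "/your/path/to/make3d"  # folder containing the Make3D test images
```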
The full pretrained models corresponding to the results in our ECCV paper can be downloaded from the following links:
DynaDepth R18: https://pan.baidu.com/s/1ksP2m-6rQ_PkBTLmjAAuLQ pwd:xc5h
DynaDepth R50: https://pan.baidu.com/s/1X7OAOKFZ4fw3crOx6bn4ZA pwd:c3kj
This repo is built upon the excellent works of monodepth2, deep_ekf_vio, and liegroups. The borrowed code remains under the original licenses of the respective projects.