Depth estimation and 3D model reconstruction from aerial imagery are important tasks in photogrammetry, remote sensing, and computer vision. To compare the performance of different image-based approaches, this study presents a benchmark for UAV-based aerial imagery using the UseGeo dataset. The contributions include the release of various evaluation routines on GitHub, as well as a comprehensive comparison of baseline approaches: methods for offline multi-view 3D reconstruction resulting in point clouds and triangle meshes, online multi-view depth estimation, and single-image depth estimation using self-supervised deep learning. With the release of our evaluation routines, we aim to provide a universal protocol for the evaluation of depth estimation and 3D reconstruction methods on the UseGeo dataset. The conducted experiments and analyses show that each method excels in a different category: the depth estimation from COLMAP outperforms that of the other approaches, ACMMP achieves the lowest error and highest completeness for point clouds, while OpenMVS produces triangle meshes with the lowest error. Among the online methods for depth estimation, the approach from the Plane-Sweep Library outperforms the FaSS-MVS approach, while the latter achieves the lowest processing time. Even though the particularly challenging nature of the dataset and the small amount of training data lead to a significantly higher error for the self-supervised single-image depth estimation approach, it outperforms all other approaches in terms of processing time and frame rate. In our evaluation, we also considered modern learning-based approaches for image-based 3D reconstruction, such as NeRFs. However, due to the significantly lower quality of the resulting 3D models, we only include a qualitative comparison of such methods against conventional approaches in the scope of this work.
Link to paper: https://doi.org/10.1016/j.ophoto.2024.100065.
If you use this project for your research, please cite:
```bibtex
@Article{Hermann2024usegeo,
  author  = {M. Hermann and M. Weinmann and F. Nex and E.K. Stathopoulou and F. Remondino and B. Jutzi and B. Ruf},
  title   = {Depth estimation and 3D reconstruction from UAV-borne imagery: Evaluation on the UseGeo dataset},
  journal = {ISPRS Open Journal of Photogrammetry and Remote Sensing},
  pages   = {100065},
  year    = {2024},
  issn    = {2667-3932},
  doi     = {10.1016/j.ophoto.2024.100065},
  url     = {https://www.sciencedirect.com/science/article/pii/S2667393224000085},
}
```
Link to the dataset: https://github.com/3DOM-FBK/usegeo
For the evaluation we provide three main scripts:

- `eval_depth_maps.py` compares one or multiple depth maps with the corresponding ground truth and can optionally adjust them by median scaling if the estimate has no metric scale (a sketch of this scaling follows below).
- `eval_pointcloud.py` evaluates point clouds by calculating the point-to-point distance between corresponding points in the estimate and the ground truth. If the point clouds are already roughly aligned, ICP can be used to refine the alignment; alternatively, the path to a transformation matrix can be specified.
- `eval_mesh.py` evaluates triangle meshes and behaves similarly, except that the distance of a ground truth point to a triangle of the mesh is used as the metric.

For both point clouds and triangle meshes, the absolute error can be color-coded for visualization. In addition, two utility scripts are available:

- `convert_range_maps_to_depth_maps.py` converts the ground truth, which is published as range maps, to conventional depth maps.
- `filter_ground_truth_pointcloud.py` implements our approach of adjusting the ground truth by removing LiDAR points that are not visible in the images.
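As a reference for the median scaling performed by `eval_depth_maps.py`, the following is a minimal sketch of the idea; the variable names and the validity convention (`> 0`) are assumptions, and the actual script may handle I/O and masking differently.

```python
import numpy as np

def median_scale(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Rescale a non-metric depth estimate so that its median matches
    the median of the ground truth over jointly valid pixels."""
    valid = (gt > 0) & (pred > 0)  # assumed validity convention
    scale = np.median(gt[valid]) / np.median(pred[valid])
    return pred * scale

def l1_abs(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute depth error over valid ground truth pixels."""
    valid = gt > 0
    return float(np.abs(pred[valid] - gt[valid]).mean())
```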
In addition, the repository contains the configurations of the Self-Supervised Depth Estimation (SSDE) approach. For training, we divided the flights into coherent sequences, which are listed in the files `SSDE_image_sequences_data_set_X.txt`. The images excluded from training and testing are listed in `SSDE_excluded_images.txt`. Our models are trained on the subsets A and B and evaluated on the subset C.
To install all dependencies:

```bash
pip install -r requirements.txt
```
The evaluation of the methods is divided into the evaluation of depth maps, point clouds, and triangle meshes. For the evaluation of point clouds and meshes, the ground truth point cloud is used directly, whereas the ground truth depth maps are obtained by projecting the point cloud into the images.

To evaluate the depth maps, the supplied ground truth is used, which was generated by projecting the points of the LiDAR point cloud into the image plane. However, since this ground truth is provided in the form of range maps, we first transform it into depth maps.
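This is what `convert_range_maps_to_depth_maps.py` is for. The underlying relation for a pinhole camera is sketched below; the intrinsics (`fx`, `fy`, `cx`, `cy`) and the array conventions are assumptions, and the actual script may differ in its I/O and details.

```python
import numpy as np

def range_to_depth(range_map: np.ndarray, fx: float, fy: float,
                   cx: float, cy: float) -> np.ndarray:
    """Convert a range map (Euclidean distance along each pixel ray)
    into a depth map (z-coordinate in the camera frame)."""
    h, w = range_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # For a pinhole camera, the ray through pixel (u, v) has direction
    # ((u - cx) / fx, (v - cy) / fy, 1); dividing the range by the norm
    # of this direction yields the z-component, i.e. the depth.
    ray_norm = np.sqrt(((u - cx) / fx) ** 2 + ((v - cy) / fy) ** 2 + 1.0)
    return range_map / ray_norm
```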
The following tables show the results of the online multi-view and single-image depth estimation methods for the three datasets.

Dataset-1 | ↓L1-abs | ↑Acc<sub>0.5</sub> | ↑Cpl<sub>0.5</sub> | ↑Acc<sub>0.1</sub> | ↑Cpl<sub>0.1</sub> | ↑Acc<sub>0.05</sub> | ↑Cpl<sub>0.05</sub> |
---|---|---|---|---|---|---|---|
FaSS-MVS<sub>GPP</sub> | 0.7486 | 0.7620 | 0.5965 | 0.3612 | 0.2819 | 0.1943 | 0.1517 |
PSL<sub>Split Occ.</sub> | 4.5773 | 0.6291 | 0.6291 | 0.3042 | 0.3042 | 0.1663 | 0.1663 |
PSL<sub>Split Occ. GPP</sub> | 0.4884 | 0.8458 | 0.4659 | 0.4187 | 0.2291 | 0.2299 | 0.1257 |
PSL<sub>BestK Occ.</sub> | 2.3804 | 0.7611 | 0.7611 | 0.3735 | 0.3735 | 0.2009 | 0.2009 |
PSL<sub>BestK Occ. GPP</sub> | 0.3217 | 0.8894 | 0.5885 | 0.4565 | 0.3014 | 0.2468 | 0.1630 |
SSDE<sub>ResNet18</sub> | 4.9545 | 0.1233 | 0.1233 | 0.0248 | 0.0248 | 0.0124 | 0.0124 |
SSDE<sub>ResNet50</sub> | 2.8260 | 0.1332 | 0.1332 | 0.0269 | 0.0269 | 0.0134 | 0.0134 |
SSDE<sub>PackNet01</sub> | 2.7866 | 0.1362 | 0.1362 | 0.0275 | 0.0275 | 0.0138 | 0.0138 |
Dataset-2 | ↓L1-abs | ↑Acc<sub>0.5</sub> | ↑Cpl<sub>0.5</sub> | ↑Acc<sub>0.1</sub> | ↑Cpl<sub>0.1</sub> | ↑Acc<sub>0.05</sub> | ↑Cpl<sub>0.05</sub> |
---|---|---|---|---|---|---|---|
FaSS-MVS<sub>GPP</sub> | 0.7681 | 0.6549 | 0.5110 | 0.2455 | 0.1912 | 0.1310 | 0.1021 |
PSL<sub>Split Occ.</sub> | 5.2157 | 0.5842 | 0.5842 | 0.2592 | 0.2592 | 0.1395 | 0.1395 |
PSL<sub>Split Occ. GPP</sub> | 0.5870 | 0.7925 | 0.4238 | 0.3562 | 0.1891 | 0.1920 | 0.1018 |
PSL<sub>BestK Occ.</sub> | 2.9696 | 0.7103 | 0.7103 | 0.3178 | 0.3178 | 0.1686 | 0.1686 |
PSL<sub>BestK Occ. GPP</sub> | 0.3949 | 0.8437 | 0.5395 | 0.3896 | 0.2478 | 0.2074 | 0.1318 |
Dataset-3 | ↓L1-abs | ↑Acc<sub>0.5</sub> | ↑Cpl<sub>0.5</sub> | ↑Acc<sub>0.1</sub> | ↑Cpl<sub>0.1</sub> | ↑Acc<sub>0.05</sub> | ↑Cpl<sub>0.05</sub> |
---|---|---|---|---|---|---|---|
FaSS-MVS<sub>GPP</sub> | 0.7624 | 0.6474 | 0.5163 | 0.2330 | 0.1855 | 0.1217 | 0.0969 |
PSL<sub>Split Occ.</sub> | 5.3428 | 0.5963 | 0.5963 | 0.2425 | 0.2425 | 0.1288 | 0.1288 |
PSL<sub>Split Occ. GPP</sub> | 0.5795 | 0.7723 | 0.4424 | 0.3183 | 0.1826 | 0.1695 | 0.0973 |
PSL<sub>BestK Occ.</sub> | 3.1116 | 0.7029 | 0.7029 | 0.2904 | 0.2904 | 0.1529 | 0.1529 |
PSL<sub>BestK Occ. GPP</sub> | 0.4137 | 0.8227 | 0.5409 | 0.3516 | 0.2309 | 0.1859 | 0.1220 |
The following tables show the depth map results of the offline multi-view 3D reconstruction methods.

Dataset-1 | ↓L1-abs | ↑Acc<sub>0.5</sub> | ↑Cpl<sub>0.5</sub> | ↑Acc<sub>0.1</sub> | ↑Cpl<sub>0.1</sub> | ↑Acc<sub>0.05</sub> | ↑Cpl<sub>0.05</sub> |
---|---|---|---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.3724 | 0.8807 | 0.8430 | 0.4720 | 0.4533 | 0.2653 | 0.2550 |
COLMAP<sub>MVS</sub> | 0.3500 | 0.8890 | 0.8526 | 0.5395 | 0.5183 | 0.3212 | 0.3086 |
COLMAP<sub>MVS+8K</sub> | 0.2765 | 0.9181 | 0.8254 | 0.6476 | 0.5837 | 0.4258 | 0.3842 |
OpenMVS | 0.3507 | 0.8689 | 0.8205 | 0.4419 | 0.4186 | 0.2347 | 0.2227 |
ACMMP | 0.7408 | 0.8718 | 0.8692 | 0.5695 | 0.5680 | 0.3617 | 0.3608 |
Dataset-2 | ↓L1-abs | ↑Acc<sub>0.5</sub> | ↑Cpl<sub>0.5</sub> | ↑Acc<sub>0.1</sub> | ↑Cpl<sub>0.1</sub> | ↑Acc<sub>0.05</sub> | ↑Cpl<sub>0.05</sub> |
---|---|---|---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.4397 | 0.8514 | 0.8193 | 0.4454 | 0.4298 | 0.2464 | 0.2379 |
COLMAP<sub>MVS</sub> | 0.4238 | 0.8581 | 0.8240 | 0.4771 | 0.4588 | 0.2808 | 0.2702 |
COLMAP<sub>MVS+8K</sub> | 0.0044 | 0.9950 | 0.9537 | 0.9832 | 0.9424 | 0.9255 | 0.8873 |
OpenMVS | 0.4482 | 0.8221 | 0.7751 | 0.3321 | 0.3136 | 0.1672 | 0.1578 |
ACMMP | 0.6360 | 0.8447 | 0.8432 | 0.5197 | 0.5189 | 0.3198 | 0.3194 |
Dataset-3 | ↓L1-abs | ↑Acc<sub>0.5</sub> | ↑Cpl<sub>0.5</sub> | ↑Acc<sub>0.1</sub> | ↑Cpl<sub>0.1</sub> | ↑Acc<sub>0.05</sub> | ↑Cpl<sub>0.05</sub> |
---|---|---|---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.4489 | 0.8273 | 0.7969 | 0.3800 | 0.3670 | 0.2045 | 0.1976 |
COLMAP<sub>MVS</sub> | 0.4307 | 0.8396 | 0.8066 | 0.4173 | 0.4014 | 0.2328 | 0.2240 |
COLMAP<sub>MVS+8K</sub> | 0.3166 | 0.8925 | 0.7721 | 0.5688 | 0.4938 | 0.3485 | 0.3023 |
OpenMVS | 0.4413 | 0.8017 | 0.7579 | 0.2867 | 0.2714 | 0.1380 | 0.1306 |
ACMMP | 0.6177 | 0.8271 | 0.8252 | 0.4510 | 0.4502 | 0.2648 | 0.2644 |
For the reconstruction of dense point clouds, we only considered offline methods in this work.
Dataset-1 | ↓L1-abs in m | ↓RMSE in m | ↑Cpl. | No. points |
---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.0778 | 0.0912 | 0.5165 | 13,753,122 |
COLMAP<sub>MVS</sub> | 0.0609 | 0.0700 | 0.5911 | 13,787,242 |
COLMAP<sub>MVS+8K</sub> | 0.0453 | 0.0510 | 0.6743 | 190,358,894 |
OpenMVS | 0.0765 | 0.0898 | 0.5682 | 23,014,725 |
ACMMP | 0.0473 | 0.0541 | 0.6331 | 53,033,375 |
Dataset-2 | ↓L1-abs in m | ↓RMSE in m | ↑Cpl. | No. points |
---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.1599 | 0.1983 | 0.3339 | 20,215,643 |
COLMAP<sub>MVS</sub> | 0.0690 | 0.0812 | 0.5965 | 20,332,931 |
COLMAP<sub>MVS+8K</sub> | 0.0445 | 0.0492 | 0.7041 | 271,763,480 |
OpenMVS | 0.0976 | 0.1159 | 0.5469 | 34,001,002 |
ACMMP | 0.0491 | 0.0569 | 0.6436 | 75,096,267 |
Dataset-3 | ↓L1-abs in m | ↓RMSE in m | ↑Cpl. | No. points |
---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.1294 | 0.1611 | 0.3897 | 17,087,339 |
COLMAP<sub>MVS</sub> | 0.0782 | 0.0921 | 0.5418 | 17,199,568 |
COLMAP<sub>MVS+8K</sub> | 0.0514 | 0.0581 | 0.6510 | 210,699,535 |
OpenMVS | 0.1061 | 0.1247 | 0.4865 | 28,418,405 |
ACMMP | 0.0566 | 0.0664 | 0.5840 | 55,845,638 |
Since only COLMAP and OpenMVS support the reconstruction of a triangle mesh, the mesh evaluation focuses on these two methods. To investigate the quality improvement contributed by the OpenMVS refinement step, we evaluate the reconstruction step on its own in addition to the final result.
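As described above, the mesh metric is the distance from each ground truth point to the nearest triangle of the mesh. A minimal sketch of how such point-to-triangle distances can be computed with Open3D is given below; the file names are placeholders, and `eval_mesh.py` may differ in its details.

```python
import numpy as np
import open3d as o3d

# Placeholder file names; substitute the actual mesh and ground truth paths.
mesh = o3d.io.read_triangle_mesh("reconstruction.ply")
gt = o3d.io.read_point_cloud("ground_truth.ply")

# A raycasting scene provides unsigned point-to-surface distance queries.
scene = o3d.t.geometry.RaycastingScene()
scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))

query = o3d.core.Tensor(np.asarray(gt.points), dtype=o3d.core.Dtype.Float32)
dist = scene.compute_distance(query).numpy()

print(f"L1-abs: {dist.mean():.4f} m, RMSE: {np.sqrt((dist ** 2).mean()):.4f} m")
```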
Dataset-1 | ↓L1-abs in m | ↓RMSE in m | ↑Cpl. | No. triangles |
---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.1011 | 0.1295 | 0.5223 | 59,195,510 |
COLMAP<sub>MVS</sub> | 0.0794 | 0.1049 | 0.5976 | 57,747,344 |
COLMAP<sub>MVS+8K</sub> | 0.0754 | 0.0980 | 0.6607 | 109,727,922 |
OpenMVS<sub>no refine</sub> | 0.0816 | 0.1173 | 0.5282 | 7,450,170 |
OpenMVS | 0.0261 | 0.0531 | 0.5918 | 1,467,494 |
Dataset-2 | ↓L1-abs in m | ↓RMSE in m | ↑Cpl. | No. triangles |
---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.2221 | 0.2791 | 0.3239 | 78,134,162 |
COLMAP<sub>MVS</sub> | 0.0984 | 0.1336 | 0.5981 | 77,404,013 |
COLMAP<sub>MVS+8K</sub> | 0.0868 | 0.1172 | 0.6966 | 120,631,107 |
OpenMVS<sub>no refine</sub> | 0.1127 | 0.1558 | 0.4896 | 11,637,590 |
OpenMVS | 0.0704 | 0.1218 | 0.5425 | 2,394,468 |
Dataset-3 | ↓L1-abs in m | ↓RMSE in m | ↑Cpl. | No. triangles |
---|---|---|---|---|
COLMAP<sub>SFM+MVS</sub> | 0.2044 | 0.2571 | 0.3265 | 55,706,150 |
COLMAP<sub>MVS</sub> | 0.1144 | 0.1515 | 0.5442 | 65,912,047 |
COLMAP<sub>MVS+8K</sub> | 0.1015 | 0.1332 | 0.6337 | 112,077,677 |
OpenMVS<sub>no refine</sub> | 0.1243 | 0.1687 | 0.4339 | 10,329,882 |
OpenMVS | 0.0714 | 0.1219 | 0.4850 | 2,195,850 |
Since all areas covered by the camera are also covered by the LiDAR ground truth, the accuracy of the point clouds and meshes can be calculated using the nearest ground truth point for each estimated point or triangle. For the completeness score, however, each point in the ground truth point cloud must be assigned a correspondence in the estimate, which is problematic because the LiDAR scan extends well beyond the area covered by the images. For this reason, the raw completeness scores appear rather low. To obtain a more realistic value, we additionally filtered the ground truth by removing all points at the edges that are not visible in the images.
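The two directional distances described above map directly onto off-the-shelf tooling. Below is a minimal sketch using Open3D; the file names and the completeness threshold are placeholders, and `eval_pointcloud.py` may differ in its options and defaults.

```python
import numpy as np
import open3d as o3d

# Placeholder file names; substitute the estimate and the filtered ground truth.
est = o3d.io.read_point_cloud("estimate.ply")
gt = o3d.io.read_point_cloud("ground_truth_filtered.ply")

# Optional: refine an already rough alignment with point-to-point ICP.
reg = o3d.pipelines.registration.registration_icp(
    est, gt, max_correspondence_distance=0.5)
est.transform(reg.transformation)

# Accuracy: estimate -> ground truth; completeness: ground truth -> estimate.
d_acc = np.asarray(est.compute_point_cloud_distance(gt))
d_cpl = np.asarray(gt.compute_point_cloud_distance(est))

threshold = 0.1  # exemplary threshold in meters
print(f"L1-abs: {d_acc.mean():.4f} m, RMSE: {np.sqrt((d_acc ** 2).mean()):.4f} m")
print(f"Cpl@{threshold}: {(d_cpl < threshold).mean():.4f}")
```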
The red area shows the provided LiDAR point clouds, which extend beyond the area covered by the images. The blue area represents our adjusted ground truth.
This code is licensed under the MIT license. Note that this refers only to the license for the code itself, independent of its third-party dependencies, which are licensed separately.
MIT License
Copyright (c) 2023 UseGeoEvaluation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.