forked from cvg/Hierarchical-Localization

Commit dd45263 (0 parents): 46 changed files with 241,360 additions and 0 deletions.

@@ -0,0 +1 @@
*.ipynb linguist-documentation

@@ -0,0 +1,2 @@
__pycache__
*.pyc
@@ -0,0 +1,6 @@ | ||
[submodule "third_party/d2net"] | ||
path = third_party/d2net | ||
url = https://github.com/mihaidusmanu/d2-net.git | ||
[submodule "third_party/SuperGluePretrainedNetwork"] | ||
path = third_party/SuperGluePretrainedNetwork | ||
url = https://github.com/magicleap/SuperGluePretrainedNetwork.git |

@@ -0,0 +1,167 @@
# hloc - the hierarchical localization toolbox

This is `hloc`, a modular toolbox for state-of-the-art 6-DoF visual localization. It implements [Hierarchical Localization](https://arxiv.org/abs/1812.03506), leveraging image retrieval and feature matching, and is fast, accurate, and scalable. This codebase won the indoor/outdoor [localization challenge at CVPR 2020](https://sites.google.com/view/vislocslamcvpr2020/home), in combination with [SuperGlue](https://psarlin.com/superglue/), our graph neural network for feature matching.

With `hloc`, you can:

- Reproduce [our CVPR 2020 winning results](https://www.visuallocalization.net/workshop/cvpr/2020/) on outdoor (Aachen) and indoor (InLoc) datasets
- Run Structure-from-Motion with SuperPoint+SuperGlue to localize with your own datasets
- Evaluate your own local features or image retrieval for visual localization
- Implement new localization pipelines and debug them easily 🔥

<p align="center">
  <a href="https://arxiv.org/abs/1812.03506"><img src="doc/hloc.png" width="60%"/></a>
  <br /><em>Hierarchical Localization uses both image retrieval and feature matching</em>
</p>

## Installation

`hloc` requires Python >=3.6, PyTorch >=1.1, and [COLMAP](https://colmap.github.io/index.html). Other minor dependencies are listed in `requirements.txt`. For pose estimation, we use [pycolmap](https://github.com/mihaidusmanu/pycolmap), which can be installed with:

```
pip install git+https://github.com/mihaidusmanu/pycolmap
```

This codebase includes external local features as git submodules – don't forget to pull them with `git submodule update --init --recursive`. Are your local features based on TensorFlow? No problem! See [below](#using-your-own-local-features-or-matcher) for the steps.

## General pipeline

The toolbox is composed of scripts that roughly perform the following steps (a sketch of how they chain together follows the list):

1. Extract SuperPoint local features for all database and query images
2. Build a reference 3D SfM model
   1. Find covisible database images, with retrieval or a prior SfM model
   2. Match these database pairs with SuperGlue
   3. Triangulate a new SfM model with COLMAP
3. Find database images relevant to each query, using retrieval
4. Match the query images with SuperGlue
5. Run the localization
6. Visualize and debug
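
To make the numbering concrete, here is a minimal Python sketch of such a chain. Only `extract_features`, `match_features`, and `pairs_from_retrieval` are referenced elsewhere in this README; the configuration names, file names, and exact `main(...)` signatures below are assumptions, so check the notebooks for the authoritative invocations.

```
from pathlib import Path

# Hedged sketch, not a verbatim API; see the note above.
from hloc import extract_features, match_features, pairs_from_retrieval

images = Path('datasets/aachen/images/')  # hypothetical dataset layout
outputs = Path('outputs/aachen/')

# Step 1: extract SuperPoint features for all database and query images.
feature_conf = extract_features.confs['superpoint_aachen']  # assumed config name
feature_path = extract_features.main(feature_conf, images, outputs)

# Step 3: for each query, retrieve similar database images from previously
# exported NetVLAD descriptors (see "Using your own image retrieval" below).
pairs_path = outputs / 'pairs-query-netvlad.txt'
pairs_from_retrieval.main(
    outputs / 'global-feats-netvlad.h5', pairs_path, num_matched=50)

# Step 4: match each query against its retrieved database images with SuperGlue.
matcher_conf = match_features.confs['superglue']  # assumed config name
match_path = match_features.main(
    matcher_conf, pairs_path, feature_conf['output'], outputs)
```

Steps 2, 5, and 6 are carried out by further scripts, as demonstrated in the notebooks below.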

The localization can then be evaluated on [visuallocalization.net](https://www.visuallocalization.net/) for the supported datasets. When 3D Lidar scans are available, such as for the indoor dataset InLoc, step 2 can be skipped.

Structure of the toolbox:

- `hloc/*.py` : top-level scripts
- `hloc/extractors/` : interfaces for feature extractors
- `hloc/matchers/` : interfaces for feature matchers

## Tasks

We provide step-by-step guides to localize with Aachen, InLoc, and to generate reference poses for your own data using SfM. Just download the datasets and you're ready to go!

### Aachen – outdoor localization

Have a look at [`pipeline_Aachen.ipynb`](./pipeline_Aachen.ipynb) for a step-by-step guide on localizing with Aachen. Play with the visualization, try new local features or matchers, and have fun! Don't like notebooks? You can also run all scripts from the command line.

<p align="center">
  <a href="./pipeline_Aachen.ipynb"><img src="doc/loc_aachen.svg" width="70%"/></a>
</p>

### InLoc – indoor localization

The notebook [`pipeline_InLoc.ipynb`](./pipeline_InLoc.ipynb) shows the steps for localizing with InLoc. It's much simpler since a 3D SfM model is not needed.

<p align="center">
  <a href="./pipeline_InLoc.ipynb"><img src="doc/loc_inloc.svg" width="70%"/></a>
</p>

### SfM reconstruction from scratch

We show in [`pipeline_SfM.ipynb`](./pipeline_SfM.ipynb) how to run 3D reconstruction for an unordered set of images. This generates reference poses and a nice sparse 3D model suitable for localization with the same pipeline as Aachen.

## Results

`hloc` currently supports [SuperPoint](https://arxiv.org/abs/1712.07629) and [D2-Net](https://arxiv.org/abs/1905.03561) local feature extractors, as well as [SuperGlue](https://arxiv.org/abs/1911.11763) and Nearest Neighbor matchers. Using [NetVLAD](https://arxiv.org/abs/1511.07247) for retrieval, we obtain the following best results:

| Methods                | Aachen day         | Aachen night       | Retrieval      |
| ---------------------- | ------------------ | ------------------ | -------------- |
| SuperPoint + SuperGlue | 89.6 / 95.4 / 98.8 | 86.7 / 93.9 / 100  | NetVLAD top 50 |
| D2Net (SS) + NN        | 84.6 / 91.4 / 97.1 | 83.7 / 90.8 / 100  | NetVLAD top 30 |
| SuperPoint + NN        | 85.4 / 93.3 / 97.2 | 75.5 / 86.7 / 92.9 | NetVLAD top 30 |

| Methods                | InLoc DUC1         | InLoc DUC2         | Retrieval      |
| ---------------------- | ------------------ | ------------------ | -------------- |
| SuperPoint + SuperGlue | 46.5 / 65.7 / 78.3 | 52.7 / 72.5 / 79.4 | NetVLAD top 40 |
| D2Net (SS) + NN        | 39.9 / 57.6 / 67.2 | 36.6 / 53.4 / 61.8 | NetVLAD top 20 |
| SuperPoint + NN        | 39.9 / 55.6 / 67.2 | 37.4 / 57.3 / 70.2 | NetVLAD top 20 |

Check out [visuallocalization.net/benchmark](https://www.visuallocalization.net/benchmark) for more details and additional baselines.

## BibTex Citation

If you report any of the above results in a publication, or use any of the tools provided here, please consider citing both the [Hierarchical Localization](https://arxiv.org/abs/1812.03506) and [SuperGlue](https://arxiv.org/abs/1911.11763) papers:

```
@inproceedings{sarlin2019coarse,
  title     = {From Coarse to Fine: Robust Hierarchical Localization at Large Scale},
  author    = {Paul-Edouard Sarlin and
               Cesar Cadena and
               Roland Siegwart and
               Marcin Dymczyk},
  booktitle = {CVPR},
  year      = {2019}
}
@inproceedings{sarlin2020superglue,
  title     = {{SuperGlue}: Learning Feature Matching with Graph Neural Networks},
  author    = {Paul-Edouard Sarlin and
               Daniel DeTone and
               Tomasz Malisiewicz and
               Andrew Rabinovich},
  booktitle = {CVPR},
  year      = {2020},
}
```

## Going further

### Debugging and Visualization

<details>
<summary>[Click to expand]</summary>

Each localization run generates a pickle log file. For each query, it contains the selected database images, their matches, and information from the pose solver, such as RANSAC inliers. It can thus be parsed to gather statistics and analyze failure modes or difficult scenarios.
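
For instance, a first look at a log file requires nothing more than `pickle` (the path below is hypothetical, and no particular schema is assumed):

```
import pickle

log_path = 'outputs/aachen/localization_logs.pkl'  # hypothetical path

with open(log_path, 'rb') as f:
    logs = pickle.load(f)

# Walk the top level without assuming a schema, to see what was recorded.
for key, value in logs.items():
    print(key, type(value).__name__)
```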

We also provide tools in [`hloc/visualization.py`](./hloc/visualization.py) to visualize attributes of the 3D SfM model, such as keypoint visibility, track lengths, or estimated sparse depth (as below).

<p align="center">
  <a href="./pipeline_Aachen.ipynb"><img src="doc/depth_aachen.svg" width="60%"/></a>
</p>
</details>

### Using your own local features or matcher

<details>
<summary>[Click to expand]</summary>

If your code is based on PyTorch: simply add a new interface in [`hloc/extractors/`](hloc/extractors/) or [`hloc/matchers/`](hloc/matchers/). It needs to inherit from `hloc.utils.base_model.BaseModel`, take a data dictionary as input, and output a prediction dictionary. Have a look at `hloc/extractors/superpoint.py` for an example. You can additionally define a standard configuration in [`hloc/extract_features.py`](hloc/extract_features.py) or [`hloc/match_features.py`](hloc/match_features.py) – it can then be called directly from the command line.
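
As a hedged illustration (not code from this repository), a dummy extractor could look like the sketch below. The `_init`/`_forward` hook names, and the assumption that `BaseModel` stores the merged configuration as `self.conf`, are patterned on `hloc/extractors/superpoint.py`:

```
import torch
from ..utils.base_model import BaseModel


class DummyFeature(BaseModel):
    # Assumption: BaseModel merges this dict with the user conf into self.conf.
    default_conf = {'num_keypoints': 512, 'descriptor_dim': 128}

    def _init(self, conf):
        pass  # a real extractor would build or load its network here

    def _forward(self, data):
        image = data['image']  # B x C x H x W tensor
        b, _, h, w = image.shape
        n, d = self.conf['num_keypoints'], self.conf['descriptor_dim']
        # Dummy predictions with the expected shapes: random in-image
        # keypoints and random unit-norm descriptors.
        keypoints = torch.rand(b, n, 2) * torch.tensor([w - 1., h - 1.])
        descriptors = torch.nn.functional.normalize(torch.rand(b, d, n), dim=1)
        return {
            'keypoints': keypoints,      # B x N x 2, (x, y) pixel coordinates
            'scores': torch.ones(b, n),  # B x N
            'descriptors': descriptors,  # B x D x N
        }
```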

If your code is based on TensorFlow: you will need to either modify `hloc/extract_features.py` and `hloc/match_features.py`, or export the features and matches yourself to HDF5 files, as described below.

In a feature file, each key corresponds to the relative path of an image w.r.t. the dataset root (e.g. `db/1.jpg` for Aachen), and has one dataset per prediction (e.g. `keypoints` and `descriptors`, with shape Nx2 and DxN).
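
A minimal export sketch with `h5py` (the file name and arrays are placeholders):

```
import h5py
import numpy as np

n, d = 1024, 128  # number of keypoints, descriptor dimension
keypoints = np.random.rand(n, 2).astype(np.float32)    # N x 2, (x, y)
descriptors = np.random.rand(d, n).astype(np.float32)  # D x N

with h5py.File('features.h5', 'a') as f:
    grp = f.create_group('db/1.jpg')  # key: image path w.r.t. dataset root
    grp.create_dataset('keypoints', data=keypoints)
    grp.create_dataset('descriptors', data=descriptors)
```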

In a match file, each key corresponds to the string `path0.replace('/', '-')+'_'+path1.replace('/', '-')` and has a dataset `matches0` with shape N. It indicates, for each keypoint in the first image, the index of the matching keypoint in the second image, or `-1` if the keypoint is unmatched.
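
And a corresponding sketch for matches (the image pair is hypothetical):

```
import h5py
import numpy as np

path0, path1 = 'db/1.jpg', 'query/2.jpg'  # hypothetical pair
pair_key = path0.replace('/', '-') + '_' + path1.replace('/', '-')

matches0 = np.full(1024, -1, dtype=np.int32)  # one entry per keypoint of image 0
matches0[:100] = np.arange(100)  # dummy: first 100 keypoints matched one-to-one

with h5py.File('matches.h5', 'a') as f:
    f.create_group(pair_key).create_dataset('matches0', data=matches0)
```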

</details>

### Using your own image retrieval

<details>
<summary>[Click to expand]</summary>

For now, `hloc` does not have an interface for image retrieval. You will need to export the global descriptors into an HDF5 file, in which each key corresponds to the relative path of an image w.r.t. the dataset root and contains a dataset `global_descriptor` with size D. You can then export the image pairs with [`hloc/pairs_from_retrieval.py`](hloc/pairs_from_retrieval.py).
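
A sketch of such an export (descriptor size and values are placeholders):

```
import h5py
import numpy as np

with h5py.File('global-feats.h5', 'a') as f:
    for name in ['db/1.jpg', 'query/2.jpg']:  # keys w.r.t. dataset root
        desc = np.random.rand(4096).astype(np.float32)  # size-D descriptor
        f.create_group(name).create_dataset('global_descriptor', data=desc)
```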

</details>

## Contributions welcome!

External contributions are very much welcome. This is a non-exhaustive list of features that might be valuable additions:

- [ ] visualization of the raw predictions (features and matches)
- [ ] interfaces for image retrieval (e.g. [DIR](https://github.com/almazan/deep-image-retrieval), [NetVLAD](https://github.com/uzh-rpg/netvlad_tf_open))
- [ ] other local features
- [ ] more localization datasets (e.g. RobotCar Seasons and CMU Seasons)
- [ ] covisibility clustering for InLoc

Created and maintained by [Paul-Edouard Sarlin](https://psarlin.com/).

@@ -0,0 +1,4 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore

@@ -0,0 +1,7 @@
import logging
import sys

logging.basicConfig(stream=sys.stdout,
                    format='[%(asctime)s %(levelname)s] %(message)s',
                    datefmt='%m/%d/%Y %H:%M:%S',
                    level=logging.INFO)

@@ -0,0 +1,195 @@
import argparse
import sqlite3
from tqdm import tqdm
from collections import defaultdict
import numpy as np
from pathlib import Path
import logging

from .utils.read_write_model import Camera, Image, Point3D, CAMERA_MODEL_NAMES
from .utils.read_write_model import write_model


def recover_database_images_and_ids(database_path):
    # Map image names to the image and camera ids stored in a COLMAP database.
    connection = sqlite3.connect(str(database_path))
    cursor = connection.cursor()

    images = {}
    cameras = {}
    cursor.execute("SELECT name, image_id, camera_id FROM images;")
    for name, image_id, camera_id in cursor:
        images[name] = image_id
        cameras[name] = camera_id

    cursor.close()
    connection.close()

    logging.info(
        f'Found {len(images)} images and {len(cameras)} cameras in database.')
    return images, cameras


def quaternion_to_rotation_matrix(qvec):
    qvec = qvec / np.linalg.norm(qvec)
    w, x, y, z = qvec
    R = np.array([
        [1 - 2 * y * y - 2 * z * z, 2 * x * y - 2 * z * w, 2 * x * z + 2 * y * w],
        [2 * x * y + 2 * z * w, 1 - 2 * x * x - 2 * z * z, 2 * y * z - 2 * x * w],
        [2 * x * z - 2 * y * w, 2 * y * z + 2 * x * w, 1 - 2 * x * x - 2 * y * y]])
    return R


def camera_center_to_translation(c, qvec):
    R = quaternion_to_rotation_matrix(qvec)
    return (-1) * np.matmul(R, c)


def read_nvm_model(
        nvm_path, intrinsics_path, image_ids, camera_ids, skip_points=False):

    with open(intrinsics_path, 'r') as f:
        raw_intrinsics = f.readlines()

    logging.info(f'Reading {len(raw_intrinsics)} cameras...')
    cameras = {}
    for intrinsics in raw_intrinsics:
        intrinsics = intrinsics.strip('\n').split(' ')
        name, camera_model, width, height = intrinsics[:4]
        params = [float(p) for p in intrinsics[4:]]
        camera_model = CAMERA_MODEL_NAMES[camera_model]
        assert len(params) == camera_model.num_params
        camera_id = camera_ids[name]
        camera = Camera(
            id=camera_id, model=camera_model.model_name,
            width=int(width), height=int(height), params=params)
        cameras[camera_id] = camera

    nvm_f = open(nvm_path, 'r')
    line = nvm_f.readline()
    # Skip the NVM header and any leading blank lines.
    while line == '\n' or line.startswith('NVM_V3'):
        line = nvm_f.readline()
    num_images = int(line)
    assert num_images == len(cameras)

    logging.info(f'Reading {num_images} images...')
    image_idx_to_db_image_id = []
    image_data = []
    i = 0
    while i < num_images:
        line = nvm_f.readline()
        if line == '\n':
            continue
        data = line.strip('\n').split(' ')
        image_data.append(data)
        image_idx_to_db_image_id.append(image_ids[data[0]])
        i += 1

    line = nvm_f.readline()
    while line == '\n':
        line = nvm_f.readline()
    num_points = int(line)

    if skip_points:
        logging.info(f'Skipping {num_points} points.')
        num_points = 0
    else:
        logging.info(f'Reading {num_points} points...')
    points3D = {}
    image_idx_to_keypoints = defaultdict(list)
    i = 0
    pbar = tqdm(total=num_points, unit='pts')
    while i < num_points:
        line = nvm_f.readline()
        if line == '\n':
            continue

        data = line.strip('\n').split(' ')
        x, y, z, r, g, b, num_observations = data[:7]
        obs_image_ids, point2D_idxs = [], []
        for j in range(int(num_observations)):
            s = 7 + 4 * j
            img_index, kp_index, kx, ky = data[s:s + 4]
            image_idx_to_keypoints[int(img_index)].append(
                (int(kp_index), float(kx), float(ky), i))
            db_image_id = image_idx_to_db_image_id[int(img_index)]
            obs_image_ids.append(db_image_id)
            point2D_idxs.append(kp_index)

        point = Point3D(
            id=i,
            xyz=np.array([x, y, z], float),
            rgb=np.array([r, g, b], int),
            error=1.,  # fake
            image_ids=np.array(obs_image_ids, int),
            point2D_idxs=np.array(point2D_idxs, int))
        points3D[i] = point

        i += 1
        pbar.update(1)
    pbar.close()

    logging.info('Parsing image data...')
    images = {}
    for i, data in enumerate(image_data):
        # Skip the focal length. Skip the distortion and terminal 0.
        name, _, qw, qx, qy, qz, cx, cy, cz, _, _ = data
        qvec = np.array([qw, qx, qy, qz], float)
        c = np.array([cx, cy, cz], float)
        t = camera_center_to_translation(c, qvec)

        if i in image_idx_to_keypoints:
            # NVM only stores triangulated 2D keypoints: add dummy ones
            keypoints = image_idx_to_keypoints[i]
            point2D_idxs = np.array([d[0] for d in keypoints])
            tri_xys = np.array([[x, y] for _, x, y, _ in keypoints])
            tri_ids = np.array([i for _, _, _, i in keypoints])

            num_2Dpoints = max(point2D_idxs) + 1
            xys = np.zeros((num_2Dpoints, 2), float)
            point3D_ids = np.full(num_2Dpoints, -1, int)
            xys[point2D_idxs] = tri_xys
            point3D_ids[point2D_idxs] = tri_ids
        else:
            xys = np.zeros((0, 2), float)
            point3D_ids = np.full(0, -1, int)

        image_id = image_ids[name]
        image = Image(
            id=image_id,
            qvec=qvec,
            tvec=t,
            camera_id=camera_ids[name],
            name=name,
            xys=xys,
            point3D_ids=point3D_ids)
        images[image_id] = image

    return cameras, images, points3D


def main(nvm, intrinsics, database, output, skip_points=False):
    assert nvm.exists(), nvm
    assert intrinsics.exists(), intrinsics
    assert database.exists(), database

    image_ids, camera_ids = recover_database_images_and_ids(database)

    logging.info('Reading the NVM model...')
    model = read_nvm_model(
        nvm, intrinsics, image_ids, camera_ids, skip_points=skip_points)

    logging.info('Writing the COLMAP model...')
    output.mkdir(exist_ok=True)
    write_model(*model, path=str(output), ext='.bin')
    logging.info('Done.')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--nvm', required=True, type=Path)
    parser.add_argument('--intrinsics', required=True, type=Path)
    parser.add_argument('--database', required=True, type=Path)
    parser.add_argument('--output', required=True, type=Path)
    parser.add_argument('--skip_points', action='store_true')
    args = parser.parse_args()
    main(**args.__dict__)