Update ScanNet #21

Merged 3 commits on Jul 21, 2021
36 changes: 4 additions & 32 deletions README.md
@@ -4,6 +4,7 @@
# ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

**News**:
* :fire: July, 2021. We update `ScanNet` image preprocessing both [here](https://github.com/saic-vul/imvoxelnet/pull/21) and in [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/pull/696).
* :fire: June, 2021. `ImVoxelNet` for `KITTI` is now [supported](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/imvoxelnet) in [mmdetection3d](https://github.com/open-mmlab/mmdetection3d).

This repository contains an implementation of the monocular/multi-view 3D object detector ImVoxelNet, introduced in our paper:
@@ -38,7 +39,7 @@ We support three benchmarks based on the **SUN RGB-D** dataset.
you should follow the instructions in [sunrgbd](data/sunrgbd).
* For the [PerspectiveNet](https://papers.nips.cc/paper/2019/hash/b87517992f7dce71b674976b280257d2-Abstract.html)
benchmark with 30 object categories, the same instructions can be applied;
you only need to pass `--dataset sunrgbd_monocular` when running `create_data.py`.
you only need to set the `dataset` argument to `sunrgbd_monocular` when running `create_data.py`.
* The [Total3DUnderstanding](https://github.com/yinyunie/Total3DUnderstanding)
benchmark implies detecting objects of 37 categories along with camera pose and room layout estimation.
Download the preprocessed data as
@@ -49,38 +50,9 @@ We support three benchmarks based on the **SUN RGB-D** dataset.
python tools/data_converter/sunrgbd_total.py
```

**ScanNet.** Please follow instructions in [scannet](data/scannet).
Note that `create_data.py` works with point clouds, not RGB images; thus, you should do some preprocessing before running `create_data.py`.
1. First, you should obtain RGB images. We recommend using a script from [SensReader](https://github.com/ScanNet/ScanNet/tree/master/SensReader/python).
2. Then, copy the camera pose `.txt` files and `.jpg` images to the `scannet/sens_reader` folder.
3. Copy axis alignment matrix `.txt` files to the `scannet/txts` folder.
4. Move the results of `batch_load_scannet_data.py` to the `scannet/mmdetection3d` folder. Final directory structure:
```
scannet
├── sens_reader
│ ├── scans
│ │ ├── scene0000_00
│ │ │ ├── out
│ │ │ │ ├── frame-000001.color.jpg
│ │ │ │ ├── frame-000001.pose.txt
│ │ │ │ ├── frame-000002.color.jpg
│ │ │ │ ├── ...
│ │ ├── ...
├── txts
│ ├── scene0000_00.txt
│ ├── ...
├── mmdetection3d
│ ├── scene0000_00_bbox.npy
│ ├── scene0000_00_ins_label.npy
│ ├── scene0000_00_sem_label.npy
│ ├── scene0000_00_vert.npy
│ ├── scene0000_01_bbox.npy
│ ├── ...
```
Now, you may run `create_data.py` with `--dataset scannet_monocular`.

For **ScanNet** please follow instructions in [scannet](data/scannet).
For **KITTI** and **nuScenes**, please follow instructions in [getting_started.md](docs/getting_started.md).
For `nuScenes`, set `--dataset nuscenes_monocular`.
For `nuScenes`, set the `dataset` argument to `nuscenes_monocular`.

### Getting Started

32 changes: 27 additions & 5 deletions data/scannet/README.md
@@ -1,23 +1,30 @@
### Prepare ScanNet Data
### Prepare ScanNet Data for Indoor Detection or Segmentation Task

We follow the procedure in [votenet](https://github.com/facebookresearch/votenet/).

1. Download ScanNet v2 data [HERE](https://github.com/ScanNet/ScanNet). Link or move the 'scans' folder to this level of directory.
1. Download ScanNet v2 data [HERE](https://github.com/ScanNet/ScanNet). Link or move the 'scans' folder into this directory. If you are performing segmentation tasks and want to upload the results to the official [benchmark](http://kaldir.vc.in.tum.de/scannet_benchmark/), please also link or move the 'scans_test' folder into this directory.

2. In this directory, extract point clouds and annotations by running `python batch_load_scannet_data.py`. Add the `--max_num_point 50000` flag if you only use the ScanNet data for the detection task. This downsamples each scene to fewer points.

3. In this directory, extract RGB images with poses by running `python extract_posed_images.py`. This step is optional; skip it if you don't plan to use multi-view RGB images. Add `--max-images-per-scene -1` to disable the limit on the number of images per scene. ScanNet scenes contain up to 5000+ frames each. After extraction, all the .jpg images require 2 TB of disk space. The recommended 300 images per scene require less than 100 GB. For example, the multi-view 3D detector ImVoxelNet samples 50 and 100 images per training and test scene, respectively.

2. In this directory, extract point clouds and annotations by running `python batch_load_scannet_data.py`.
4. Enter the project root directory and generate the training data by running

3. Enter the project root directory, generate training data by running
```bash
python tools/create_data.py scannet --root-path ./data/scannet --out-dir ./data/scannet --extra-tag scannet
```
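
The per-scene image limit in step 3 can be pictured as picking evenly spaced frames from each scene. The sketch below is only an illustration: the selection strategy actually used by `extract_posed_images.py` is not shown in this diff, and the function name `sample_frame_indices` is hypothetical.

```python
import numpy as np


def sample_frame_indices(num_frames, max_images_per_scene=300):
    """Pick evenly spaced frame indices; a negative limit (or None) keeps every frame.

    Hypothetical helper: evenly spaced sampling is an assumption, not
    necessarily what extract_posed_images.py does.
    """
    if max_images_per_scene is None or max_images_per_scene < 0:
        return list(range(num_frames))
    if num_frames <= max_images_per_scene:
        return list(range(num_frames))
    # Evenly spaced positions over [0, num_frames - 1], truncated to integers.
    positions = np.linspace(0, num_frames - 1, max_images_per_scene)
    return [int(p) for p in positions]


indices = sample_frame_indices(num_frames=5000, max_images_per_scene=300)
print(len(indices), indices[0], indices[-1])
```

With a limit of `-1`, the helper returns every frame index, mirroring the `--max-images-per-scene -1` behaviour described in step 3.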

The whole process can be run with the following script:

```bash
python batch_load_scannet_data.py
python extract_posed_images.py
cd ../..
python tools/create_data.py scannet --root-path ./data/scannet --out-dir ./data/scannet --extra-tag scannet
```

After pre-processing, the directory structure should be as follows:

```
scannet
├── scannet_utils.py
@@ -26,11 +33,26 @@ scannet
├── scannet_utils.py
├── README.md
├── scans
├── scannet_train_instance_data
├── scans_test
├── scannet_instance_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── train_label_weight.npy
│ ├── train_resampled_scene_idxs.npy
│ ├── val_label_weight.npy
│ ├── val_resampled_scene_idxs.npy
├── posed_images
│ ├── scenexxxx_xx
│ │ ├── xxxxxx.txt
│ │ ├── xxxxxx.jpg
│ │ ├── intrinsic.txt
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl

```
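
A quick way to confirm that pre-processing produced the layout above is to check for the expected entries. This is a minimal sketch, not part of the repository: `missing_entries` is a hypothetical helper, and treating `scans_test` and `posed_images` as optional is an assumption based on the steps above.

```python
from pathlib import Path

# Entry names copied from the directory tree above.
EXPECTED_ENTRIES = [
    'scans',
    'scannet_instance_data',
    'points',
    'instance_mask',
    'semantic_mask',
    'seg_info',
    'scannet_infos_train.pkl',
    'scannet_infos_val.pkl',
]
# Assumed optional: test scans and multi-view images are only needed for some tasks.
OPTIONAL_ENTRIES = ['scans_test', 'posed_images']


def missing_entries(root, names=EXPECTED_ENTRIES):
    """Return the names from `names` that do not exist directly under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).exists()]


print('missing:', missing_entries('./data/scannet'))
print('missing (optional):', missing_entries('./data/scannet', OPTIONAL_ENTRIES))
```

An empty first list suggests the required outputs are in place; it does not validate the contents of the files.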
124 changes: 83 additions & 41 deletions data/scannet/batch_load_scannet_data.py
@@ -16,58 +16,81 @@
from load_scannet_data import export
from os import path as osp

SCANNET_DIR = 'scans'
DONOTCARE_CLASS_IDS = np.array([])
OBJ_CLASS_IDS = np.array(
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39])


def export_one_scan(scan_name, output_filename_prefix, max_num_point,
label_map_file, scannet_dir):
def export_one_scan(scan_name,
output_filename_prefix,
max_num_point,
label_map_file,
scannet_dir,
test_mode=False):
mesh_file = osp.join(scannet_dir, scan_name, scan_name + '_vh_clean_2.ply')
agg_file = osp.join(scannet_dir, scan_name,
scan_name + '.aggregation.json')
seg_file = osp.join(scannet_dir, scan_name,
scan_name + '_vh_clean_2.0.010000.segs.json')
# includes axisAlignment info for the train set scans.
meta_file = osp.join(scannet_dir, scan_name, f'{scan_name}.txt')
mesh_vertices, semantic_labels, instance_labels, instance_bboxes, \
instance2semantic = export(mesh_file, agg_file, seg_file,
meta_file, label_map_file, None)

mask = np.logical_not(np.in1d(semantic_labels, DONOTCARE_CLASS_IDS))
mesh_vertices = mesh_vertices[mask, :]
semantic_labels = semantic_labels[mask]
instance_labels = instance_labels[mask]

num_instances = len(np.unique(instance_labels))
print(f'Num of instances: {num_instances}')

bbox_mask = np.in1d(instance_bboxes[:, -1], OBJ_CLASS_IDS)
instance_bboxes = instance_bboxes[bbox_mask, :]
print(f'Num of care instances: {instance_bboxes.shape[0]}')

N = mesh_vertices.shape[0]
if N > max_num_point:
choices = np.random.choice(N, max_num_point, replace=False)
mesh_vertices = mesh_vertices[choices, :]
semantic_labels = semantic_labels[choices]
instance_labels = instance_labels[choices]
mesh_vertices, semantic_labels, instance_labels, unaligned_bboxes, \
aligned_bboxes, instance2semantic, axis_align_matrix = export(
mesh_file, agg_file, seg_file, meta_file, label_map_file, None,
test_mode)

if not test_mode:
mask = np.logical_not(np.in1d(semantic_labels, DONOTCARE_CLASS_IDS))
mesh_vertices = mesh_vertices[mask, :]
semantic_labels = semantic_labels[mask]
instance_labels = instance_labels[mask]

num_instances = len(np.unique(instance_labels))
print(f'Num of instances: {num_instances}')

bbox_mask = np.in1d(unaligned_bboxes[:, -1], OBJ_CLASS_IDS)
unaligned_bboxes = unaligned_bboxes[bbox_mask, :]
bbox_mask = np.in1d(aligned_bboxes[:, -1], OBJ_CLASS_IDS)
aligned_bboxes = aligned_bboxes[bbox_mask, :]
assert unaligned_bboxes.shape[0] == aligned_bboxes.shape[0]
print(f'Num of care instances: {unaligned_bboxes.shape[0]}')

if max_num_point is not None:
max_num_point = int(max_num_point)
N = mesh_vertices.shape[0]
if N > max_num_point:
choices = np.random.choice(N, max_num_point, replace=False)
mesh_vertices = mesh_vertices[choices, :]
if not test_mode:
semantic_labels = semantic_labels[choices]
instance_labels = instance_labels[choices]

np.save(f'{output_filename_prefix}_vert.npy', mesh_vertices)
np.save(f'{output_filename_prefix}_sem_label.npy', semantic_labels)
np.save(f'{output_filename_prefix}_ins_label.npy', instance_labels)
np.save(f'{output_filename_prefix}_bbox.npy', instance_bboxes)


def batch_export(max_num_point, output_folder, train_scan_names_file,
label_map_file, scannet_dir):
if not test_mode:
np.save(f'{output_filename_prefix}_sem_label.npy', semantic_labels)
np.save(f'{output_filename_prefix}_ins_label.npy', instance_labels)
np.save(f'{output_filename_prefix}_unaligned_bbox.npy',
unaligned_bboxes)
np.save(f'{output_filename_prefix}_aligned_bbox.npy', aligned_bboxes)
np.save(f'{output_filename_prefix}_axis_align_matrix.npy',
axis_align_matrix)


def batch_export(max_num_point,
output_folder,
scan_names_file,
label_map_file,
scannet_dir,
test_mode=False):
if test_mode and not os.path.exists(scannet_dir):
# test data preparation is optional
return
if not os.path.exists(output_folder):
print(f'Creating new data folder: {output_folder}')
os.mkdir(output_folder)

train_scan_names = [line.rstrip() for line in open(train_scan_names_file)]
for scan_name in train_scan_names:
scan_names = [line.rstrip() for line in open(scan_names_file)]
for scan_name in scan_names:
print('-' * 20 + 'begin')
print(datetime.datetime.now())
print(scan_name)
@@ -78,7 +101,7 @@ def batch_export(max_num_point, output_folder, train_scan_names_file,
continue
try:
export_one_scan(scan_name, output_filename_prefix, max_num_point,
label_map_file, scannet_dir)
label_map_file, scannet_dir, test_mode)
except Exception:
print(f'Failed export scan: {scan_name}')
print('-' * 20 + 'done')
@@ -88,14 +111,18 @@ def main():
parser = argparse.ArgumentParser()
parser.add_argument(
'--max_num_point',
default=50000,
default=None,
help='The maximum number of the points.')
parser.add_argument(
'--output_folder',
default='./scannet_train_instance_data',
default='./scannet_instance_data',
help='output folder of the result.')
parser.add_argument(
'--scannet_dir', default='scans', help='scannet data directory.')
'--train_scannet_dir', default='scans', help='scannet data directory.')
parser.add_argument(
'--test_scannet_dir',
default='scans_test',
help='scannet data directory.')
parser.add_argument(
'--label_map_file',
default='meta_data/scannetv2-labels.combined.tsv',
@@ -104,10 +131,25 @@ def main():
'--train_scan_names_file',
default='meta_data/scannet_train.txt',
help='The path of the file that stores the scan names.')
parser.add_argument(
'--test_scan_names_file',
default='meta_data/scannetv2_test.txt',
help='The path of the file that stores the scan names.')
args = parser.parse_args()
batch_export(args.max_num_point, args.output_folder,
args.train_scan_names_file, args.label_map_file,
args.scannet_dir)
batch_export(
args.max_num_point,
args.output_folder,
args.train_scan_names_file,
args.label_map_file,
args.train_scannet_dir,
test_mode=False)
batch_export(
args.max_num_point,
args.output_folder,
args.test_scan_names_file,
args.label_map_file,
args.test_scannet_dir,
test_mode=True)


if __name__ == '__main__':
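
The optional downsampling in `export_one_scan` can be illustrated in isolation. The sketch below is a standalone approximation, not the code from this PR: `downsample_points` is a hypothetical helper, and it uses `np.random.default_rng` in place of the `np.random.choice` call in the diff so that the result can be seeded. It also shows why the `int(...)` cast matters: `--max_num_point` is declared without `type=`, so a value passed on the command line arrives as a string.

```python
import numpy as np


def downsample_points(points, labels=(), max_num_point=None, seed=None):
    """Randomly keep at most max_num_point rows of points and of each label array.

    Hypothetical helper that mirrors the idea in export_one_scan.
    """
    if max_num_point is None:
        # No limit requested: return everything unchanged.
        return points, list(labels)
    # argparse yields a string when no type= is given, hence the cast.
    max_num_point = int(max_num_point)
    num_points = points.shape[0]
    if num_points <= max_num_point:
        return points, list(labels)
    rng = np.random.default_rng(seed)
    choices = rng.choice(num_points, max_num_point, replace=False)
    # The same indices are applied to every array so rows stay aligned.
    return points[choices], [label[choices] for label in labels]


points = np.random.rand(1000, 6)
semantic = np.arange(1000)
sub_points, (sub_semantic,) = downsample_points(
    points, [semantic], max_num_point='100', seed=0)
print(sub_points.shape, sub_semantic.shape)
```

Passing an empty `labels` sequence corresponds to the `test_mode=True` case, where only the vertices are subsampled.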