🎉 This project has been a pleasure, allowing me to repay technical debt, learn how to locate bugs during model deployment, gain experience with GitHub Actions, and explore CUDA programming. I greatly appreciate the valuable feedback from others that has contributed to improving the project. I hope that this project will be of use to you.
- Newer dependencies and APIs. We deploy the RangeNet repository in an environment with TensorRT 8+ and Ubuntu 20.04+, remove the Boost dependency, manage TensorRT objects and GPU memory with smart pointers, and provide a ROS demo.
- Faster performance. We resolve the reduced segmentation accuracy seen with FP16 (issue#9), which gives a significant speedup without sacrificing accuracy. Data is preprocessed with CUDA, and KNN post-processing is performed with libtorch (refer to here).
We provide a Docker installation; see docker/README.md for details.
Step 1: Download and Extract libtorch
Note
Using the Torch library from Conda was observed to slow down the post-processing stage from 6 ms to 30 ms.
$ wget -c https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.10.2%2Bcu113.zip -O libtorch.zip
$ unzip libtorch.zip
Step 2: Set up the deep learning environment (install NVIDIA driver, CUDA, TensorRT, cuDNN). The tested configurations are listed below. At least 3000 MB of GPU memory is required.
Ubuntu | GPU | TensorRT | CUDA | cuDNN | Tested |
---|---|---|---|---|---|
20.04 | TITAN RTX | 8.2.3 | CUDA 11.4.r11.4 | cuDNN 8.2.4 | ✔️ |
20.04 | NVIDIA GeForce RTX 3060 | 8.4.1.5 | CUDA 11.3.r11.3 | cuDNN 8.0.5 | ✔️ |
20.04 | NVIDIA GeForce RTX 3060, NVIDIA GeForce RTX 4070 | 10.6.0.26 | CUDA 11.1.105 | cuDNN 8.0.5.39 | ✔️ |
20.04 | NVIDIA GeForce RTX 3060, NVIDIA GeForce RTX 4070 | 10.6.0.26 | CUDA 12.4.r12.4 | cuDNN 9.1.0.70-1 | ✔️ |
22.04 | NVIDIA GeForce RTX 3060 | 8.2.5.1 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | ✔️ |
22.04 | NVIDIA GeForce RTX 3060 | 8.4.1.5 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | ✔️ |
22.04 | NVIDIA GeForce RTX 3060 | 8.4.3.1 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | ✔️ |
22.04 | NVIDIA GeForce RTX 3060 | 8.6.1.6 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | ✔️ |
22.04 | NVIDIA GeForce RTX 3060 | 10.6.0.26 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | ✔️ |
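The 3000 MB memory requirement can be checked before building. The following is a minimal sketch rather than part of the project: the helper names `gpu_memory_ok` and `check_first_gpu` are hypothetical, and it assumes `nvidia-smi` is installed and that its MiB figure is close enough to the MB figure quoted above.

```shell
# Hypothetical helper: succeed if the given total GPU memory (MiB)
# meets the requirement (default: 3000, the figure quoted above).
gpu_memory_ok() {
  local total_mib="$1"
  local required_mib="${2:-3000}"
  [ "${total_mib}" -ge "${required_mib}" ]
}

# Hypothetical helper: query the first GPU through nvidia-smi and report.
check_first_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 2; }
  local total
  total="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
  [ -n "${total}" ] || { echo "could not read GPU memory"; return 2; }
  if gpu_memory_ok "${total}"; then
    echo "GPU memory OK: ${total} MiB"
  else
    echo "GPU memory too small: ${total} MiB (need at least 3000)"
    return 1
  fi
}
```

Calling `check_first_gpu` after sourcing the snippet reports on the first GPU only; machines with several GPUs would need one call per device.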
Note
You must choose a CUDA version that supports your GPU's Compute Capability. For example, if you want to target Compute Capability 89, you must use CUDA 11.8 or later. You can look up the Compute Capability of your GPU at https://developer.nvidia.com/cuda-gpus#compute.
GPU Hardware Architecture | Compute Capability | Relevant GPUs | Minimum CUDA Version |
---|---|---|---|
Ampere Architecture | 86 | RTX 3060, RTX 3070, RTX 3080, RTX 3090 | CUDA 11.1 |
Ada Lovelace Architecture | 89 | RTX 4090, RTX 4080 | CUDA 11.8 |
Note
You must also choose a CUDA version that your installed NVIDIA driver supports.
nvidia-driver Version | Maximum CUDA Version |
---|---|
545 | CUDA 12.3 |
550 | CUDA 12.4 |
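Taken together, the two tables bound the usable CUDA version from below (Compute Capability) and from above (driver). The sketch below encodes only the rows listed in this README; all function names are hypothetical, and the version comparison assumes GNU `sort -V` is available, as it is on Ubuntu.

```shell
# version_ge A B: succeed if version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum CUDA version for a Compute Capability (only 86 and 89 are listed).
min_cuda_for_cc() {
  case "$1" in
    86) echo "11.1" ;;
    89) echo "11.8" ;;
    *)  return 1 ;;
  esac
}

# Maximum CUDA version for a driver major version (only 545 and 550 are listed).
max_cuda_for_driver() {
  case "$1" in
    545) echo "12.3" ;;
    550) echo "12.4" ;;
    *)   return 1 ;;
  esac
}

# cuda_version_ok CC DRIVER CUDA: succeed if CUDA lies within both bounds.
# Returns 2 when CC or DRIVER is not in the tables.
cuda_version_ok() {
  local lo hi
  lo="$(min_cuda_for_cc "$1")" || return 2
  hi="$(max_cuda_for_driver "$2")" || return 2
  version_ge "$3" "${lo}" && version_ge "${hi}" "$3"
}
```

For example, `cuda_version_ok 89 550 12.4` succeeds, while `cuda_version_ok 89 550 11.3` fails because CUDA 11.3 is older than the 11.8 minimum for Compute Capability 89.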
Add the following environment variables to ~/.bashrc:
# Example configuration:
# >>> Deep Learning Configuration >>>
# Import CUDA environment
CUDA_PATH=/usr/local/cuda/bin
CUDA_LIB_PATH=/usr/local/cuda/lib64
# Import TensorRT environment
export TENSORRT_DIR=${HOME}/Application/TensorRT-8.4.1.5/
TENSORRT_PATH=${TENSORRT_DIR}/bin
TENSORRT_LIB_PATH=${TENSORRT_DIR}/lib
# Import libtorch environment
export Torch_DIR=${HOME}/Application/libtorch/share/cmake/Torch
export PATH=${PATH}:${CUDA_PATH}:${TENSORRT_PATH}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_LIB_PATH}:${TENSORRT_LIB_PATH}
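After reloading `~/.bashrc`, it can help to confirm that the exported variables point to real directories before running CMake. The helper below is a hypothetical sketch, not something shipped with the repository; it only checks that a value is non-empty and names an existing directory.

```shell
# Hypothetical helper: report whether a variable holds an existing directory.
# Usage: check_dir NAME VALUE
check_dir() {
  if [ -z "$2" ]; then
    echo "$1 is not set"
    return 1
  elif [ ! -d "$2" ]; then
    echo "$1 points to a missing directory: $2"
    return 1
  fi
  echo "$1 OK: $2"
}

# Example calls for the variables exported in the configuration above:
#   check_dir TENSORRT_DIR "${TENSORRT_DIR:-}"
#   check_dir Torch_DIR "${Torch_DIR:-}"
```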
Step 3 (optional, only if ROS components are needed): Install ROS1 (Noetic) or ROS2 (Humble).
# Install ROS
$ ...
# Install extra dependency
$ sudo apt install ros-${ROS_DISTRO}-pcl-ros
Step 4: Install the required apt and Python packages
$ sudo apt install build-essential python3-dev python3-pip apt-utils git cmake libboost-all-dev libyaml-cpp-dev libopencv-dev python3-empy libfmt-dev
$ pip install catkin_tools trollius numpy
Step 5: Clone the Repository
$ git clone https://github.com/Natsu-Akatsuki/RangeNet-TensorRT ~/rangenet/src/rangenet/
Step 6: Import model files and datasets.
# Download model files
$ wget -c https://github.com/Natsu-Akatsuki/RangeNet-TensorRT/releases/download/v0.0.0-alpha/model.onnx -O ~/rangenet/src/rangenet/model/model.onnx
Download datasets: see Baidu Cloud.
Directory Structure
.
├── model
│   ├── arch_cfg.yaml
│   ├── data_cfg.yaml
│   └── model.onnx
└── data
    ├── 000000.pcd
    ├── kitti_2011_09_30_drive_0027_synced
    └── kitti_2011_09_30_drive_0027_synced.bag
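A quick way to confirm the model files are in place is to test for each one. This is a sketch under assumptions: `missing_model_files` is a hypothetical helper, and its default root is the clone path used in Step 5.

```shell
# Hypothetical helper: list model files from the layout above that are
# missing under the given root (default: the clone path from Step 5).
# Returns 0 when all files exist, 1 otherwise.
missing_model_files() {
  local root="${1:-${HOME}/rangenet/src/rangenet}"
  local f status=0
  for f in model/arch_cfg.yaml model/data_cfg.yaml model/model.onnx; do
    if [ ! -f "${root}/${f}" ]; then
      echo "missing: ${f}"
      status=1
    fi
  done
  return "${status}"
}
```

Running `missing_model_files` with no argument checks the default location and prints nothing when every file is present.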
Note
The first run may take some time to generate the TensorRT optimized engine.
Note
Because we use set(CMAKE_CUDA_STANDARD 17), which CMake supports from version 3.18, building requires CMake 3.18 or later. The default CMake version in Ubuntu 20.04 is 3.16.3, so we provide a low-effort workaround to install a newer CMake.
$ pip3 install --user cmake==3.18
$ echo 'export PATH=${HOME}/.local/bin:${PATH}' >> ~/.bashrc
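To confirm that the newer CMake is the one actually picked up from the PATH, the reported version can be compared against 3.18. The helper name `cmake_version_ok` is hypothetical, and the sketch assumes the first line of `cmake --version` has the form `cmake version X.Y.Z` and that GNU `sort -V` is available.

```shell
# Hypothetical helper: succeed if the `cmake --version` output passed as $1
# reports at least the required version (default: 3.18).
cmake_version_ok() {
  local required="${2:-3.18}"
  local found
  # The version number is the third word of the first output line.
  found="$(printf '%s\n' "$1" | head -n 1 | awk '{print $3}')"
  [ -n "${found}" ] &&
    [ "$(printf '%s\n%s\n' "${found}" "${required}" | sort -V | head -n 1)" = "${required}" ]
}

# Example: cmake_version_ok "$(cmake --version)" && echo "CMake is new enough"
```

If the check still fails after the workaround, open a new shell so that the updated PATH from `~/.bashrc` takes effect.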
🔧 Usage 1: Run data in ROS1 or ROS2
# >>> ROS1 >>>
$ cd ~/rangenet/
# Use -Wno-dev to suppress PCL warnings
$ catkin build --cmake-args -Wno-dev
$ source devel/setup.bash
$ roslaunch rangenet_pp ros1_rangenet.launch
$ roslaunch rangenet_pp ros1_bag.launch
# >>> ROS2 >>>
$ cd ~/rangenet/
$ colcon build --symlink-install
$ source install/setup.bash
$ ros2 launch rangenet_pp ros2_rangenet.launch
$ ros2 launch rangenet_pp ros2_bag.launch
🔧 Usage 2: Predict single-frame point clouds (PCD format)
Note
PCD point cloud fields must be xyzi, and the intensity field should be normalized to the range 0-1.
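The field and intensity requirements can be checked on a file before running the demo. The sketch below is an illustration only: `check_pcd_xyzi` is a hypothetical helper, it handles ASCII PCD files only (binary files are reported as unsupported rather than inspected), and it assumes the fourth field is named either `intensity` or `i`.

```shell
# Hypothetical helper for ASCII PCD files only.
# Exit status: 0 = fields and intensity range look right, 1 = check failed,
# 2 = not an ASCII PCD (binary data is not inspected by this sketch).
check_pcd_xyzi() {
  awk '
    $1 == "FIELDS" {
      seen_fields = 1
      if (NF != 5 || $2 != "x" || $3 != "y" || $4 != "z" ||
          ($5 != "intensity" && $5 != "i")) bad = 1
      next
    }
    $1 == "DATA" {
      if ($2 != "ascii") { unsupported = 1; exit }
      in_data = 1
      next
    }
    # Data rows: the fourth column is the intensity.
    in_data && NF >= 4 && ($4 < 0 || $4 > 1) { bad = 1 }
    END {
      if (unsupported) exit 2
      if (bad || !seen_fields || !in_data) exit 1
    }
  ' "$1"
}
```

A file whose intensity values span 0-255, for instance, returns status 1 and would need rescaling before inference.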
# Modify the parameters in config/infer.yaml
$ cd ~/rangenet/src/rangenet/
$ mkdir build
$ cd build
# To display inference time: cmake -DPERFORMANCE_LOG=ON .. && make
$ unset ROS_VERSION && cmake -Wno-dev .. && make -j4
$ ./demo
Step | Time |
---|---|
Preprocessing | 1.51363 ms |
Inference | 21.8513 ms |
Postprocessing | 4.98176 ms |
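To turn the per-stage timings into an end-to-end figure, the three values can be summed and inverted. This is a small worked calculation, assuming the stages run back to back for each frame; `pipeline_summary` is a hypothetical helper name.

```shell
# Sum the three stage times (ms) and convert the total to a frame rate.
# Assumes preprocessing, inference, and postprocessing run sequentially.
pipeline_summary() {
  awk -v pre="$1" -v infer="$2" -v post="$3" 'BEGIN {
    total = pre + infer + post
    printf "total %.2f ms, about %.1f frames/s\n", total, 1000 / total
  }'
}

pipeline_summary 1.51363 21.8513 4.98176
```

For the timings in the table this prints `total 28.35 ms, about 35.3 frames/s`.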
❓ Issue 1: [libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1:
The ONNX model file is incomplete. Please re-download the model.
❓ Issue 2: Segmentation fault [Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)] when visualizing single point cloud frames in Ubuntu 22.04 using PCL.
Use PCL version 1.13.0 or later, and set the PCL_DIR variable in cmake/ThirdParty.cmake. See more in Here.
- Test ROS1 demo
- Resolve issue#8 (2023.07.01)
- Add English documentation (2024.11.19)
- Explain why using FP16 leads to precision degradation [See more in Here] (2024.11.28)
- Provide a Docker environment (2024.11.30)
- Add Pybind11 implementation
- Resolve non-reproducibility
- Refactor code to follow coding standards and improve readability