MAVRL: Learn to Fly in Cluttered Environments with Varying Speed
Many existing obstacle avoidance algorithms overlook the crucial balance between safety and agility, especially in environments of varying complexity. In our study, we introduce an obstacle avoidance pipeline based on reinforcement learning. This pipeline enables drones to adapt their flying speed according to the environmental complexity. Moreover, to improve obstacle avoidance performance in cluttered environments, we propose a novel latent space representation that is explicitly trained to retain memory of previous depth map observations. Our findings confirm that varying the speed leads to a superior balance of success rate and agility in cluttered environments. Additionally, our memory-augmented latent representation outperforms the latent representation commonly used in reinforcement learning. Finally, after minimal fine-tuning, we successfully deployed our network on a real drone for enhanced obstacle avoidance.
Please refer to AvoidBench and check its installation dependencies. Run the following commands to set up:
# install Open3D
sudo apt update
sudo apt install git libtool build-essential cmake
git clone --recursive -b v0.9.0 https://github.com/isl-org/Open3D.git
cd Open3D
mkdir build
cd build
cmake ..
make -j
sudo make install
sudo apt update
sudo apt install libzmqpp-dev libopencv-dev unzip python3-catkin-tools
sudo apt install libgoogle-glog-dev protobuf-compiler ros-noetic-octomap-msgs ros-noetic-octomap-ros python3-vcstool
git clone [email protected]:tudelft/AvoidBench.git
cd AvoidBench/src/avoidbench/unity_scene/
wget -O AvoidBench.zip https://data.4tu.nl/file/a21231b6-f867-40df-962d-27f9dc25f57a/f61dfc92-7659-4637-a355-e119a9ec4ac5
unzip -o AvoidBench.zip
rm AvoidBench.zip
echo "export AVOIDBENCH_PATH=path_to_this_project/AvoidBench/src/avoidbench" >> ~/.bashrc
Get the mavrl ROS package:
cd AvoidBench/src
git clone [email protected]:tudelft/mavrl.git
Create conda environment:
cd mavrl
conda env create -f environment.yaml
Install reinforcement learning environment:
cd avoidbench/avoidlib/build
cmake ..
make -j
pip install .
Our pipeline comprises three main components: the VAE, the LSTM, and PPO. The training process is as follows (a simplified architecture sketch is given after the list):
- We begin by training a basic PPO policy while the VAE and LSTM components remain randomly initialized. This foundational policy allows the drone to navigate to the target in obstacle-free environments.
- This initial policy is utilized to gather a dataset, focused primarily on capturing a multitude of depth image sequences without the concern of collisions. Subsequently, we use this dataset for the training of the VAE, bypassing the LSTM phase in this step.
- Once the VAE is trained, we maintain the encoder in a fixed state and proceed to train the LSTM using the dataset generated by the initial policy.
- After training both the VAE and LSTM, we freeze them and retrain the PPO, adapting it to environments of varying complexity.
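For intuition, here is a minimal PyTorch sketch of how the three components could be wired together in the final stage: the frozen VAE encoder compresses each depth map into a latent vector, the frozen LSTM accumulates those latents into a memory state, and the trainable policy head consumes the memory together with the goal observation. All layer sizes and module names are illustrative assumptions, not the exact ones used in mavrl (only the goal observation size of 7 is taken from the config shown later).

```python
import torch
import torch.nn as nn

# Illustrative dimensions; the actual values in mavrl may differ.
LATENT_DIM = 64
HIDDEN_DIM = 128
GOAL_DIM = 7          # matches goal_obs_dim in config.yaml
ACTION_DIM = 4        # e.g. a velocity command; assumption

class DepthEncoder(nn.Module):
    """Frozen VAE encoder: depth map -> latent vector (sketch)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64, LATENT_DIM)

    def forward(self, depth):
        return self.fc_mu(self.conv(depth))

class MemoryPolicy(nn.Module):
    """Frozen LSTM memory + trainable policy head (sketch)."""
    def __init__(self):
        super().__init__()
        self.encoder = DepthEncoder()
        self.lstm = nn.LSTM(LATENT_DIM, HIDDEN_DIM, batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(HIDDEN_DIM + GOAL_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM),
        )
        # The VAE encoder and LSTM are trained beforehand and frozen here;
        # only the actor head is updated by PPO.
        for p in self.encoder.parameters():
            p.requires_grad = False
        for p in self.lstm.parameters():
            p.requires_grad = False

    def forward(self, depth_seq, goal_obs):
        # depth_seq: (B, T, 1, H, W), goal_obs: (B, GOAL_DIM)
        b, t = depth_seq.shape[:2]
        latents = self.encoder(depth_seq.flatten(0, 1)).view(b, t, -1)
        memory, _ = self.lstm(latents)
        return self.actor(torch.cat([memory[:, -1], goal_obs], dim=-1))
```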
Start a terminal and run the Unity standalone:
cd AvoidBench/src/avoidbench/unity_scene/
./AvoidBench/AvoidBench.x86_64
Start another terminal to train an initial policy:
cd AvoidBench/src/mavrl/
python train_policy.py --retrain 0 --train 1 --scene_id 1 # scene_id=0: indoor warehouse, scene_id=1: outdoor forest
We suggest training for around 200 iterations and using the last weight file as the initial policy. Then use this initial policy to collect datasets for the perception part (the Unity standalone needs to be running first):
python collect_data.py --trial 1 --iter 200 --scene_id 1
where `trial=1` and `iter=200` mean loading the weights from `saved/RecurrentPPO_1/Policy/iter_00200.pth`. Set different `--scene_id` values to collect both indoor and outdoor data.
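The `--trial`/`--iter` naming convention maps to a checkpoint path as sketched below; this helper is inferred from the examples in this README and is not a function provided by the repository:

```python
from pathlib import Path

def checkpoint_path(trial: int, iteration: int) -> Path:
    # Inferred layout: saved/RecurrentPPO_<trial>/Policy/iter_<iteration, 5 digits>.pth
    return Path("saved") / f"RecurrentPPO_{trial}" / "Policy" / f"iter_{iteration:05d}.pth"

print(checkpoint_path(1, 200))   # saved/RecurrentPPO_1/Policy/iter_00200.pth
```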
Train the Variational AutoEncoder (VAE) (this training process does not need the Unity standalone):
python trainvae.py
Make sure you have created a folder `exp_vae` in `mavrl`. You can also download the VAE weight file that we have already trained from here.
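For reference, a minimal sketch of what a depth-image VAE of this kind can look like is given below. The layer sizes, input resolution handling, and loss weighting are illustrative assumptions and are not taken from `trainvae.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthVAE(nn.Module):
    """Minimal VAE over single-channel depth maps normalized to [0, 1] (sketch)."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 4 * 4, latent_dim)
        self.fc_logvar = nn.Linear(64 * 4 * 4, latent_dim)
        self.dec_fc = nn.Linear(latent_dim, 64 * 4 * 4)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, depth):
        h = self.enc(depth)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon = self.dec(self.dec_fc(z).view(-1, 64, 4, 4))
        # Reconstruction is at a fixed low resolution in this sketch.
        return recon, mu, logvar

def vae_loss(recon, target, mu, logvar, beta: float = 1.0):
    # Resize the target depth to the reconstruction resolution before comparing.
    target = F.interpolate(target, size=recon.shape[-2:])
    rec = F.mse_loss(recon, target)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld
```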
Then load the VAE and train the LSTM (this training process does not need the Unity standalone):
python train_lstm_without_env.py --trial 1 --iter 200 --recon 1 1 0 --lstm_exp LSTM_110_0
where `trial=1` and `iter=200` mean loading the weights from `saved/RecurrentPPO_1/Policy/iter_00200.pth`. The argument `--recon` decides whether to reconstruct the past, current, and future depth; `--recon 1 1 0` means reconstructing the past and current depth. `--lstm_exp` defines the output folder name of the LSTM training.
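To picture what the `--recon` flags control, the sketch below runs an LSTM over the frozen VAE latents and attaches one reconstruction head per enabled flag (past, current, future), shown here as predicting time-shifted latent vectors that the frozen VAE decoder could turn back into depth maps. Names and dimensions are assumptions for illustration and do not mirror `train_lstm_without_env.py`:

```python
import torch
import torch.nn as nn

class LatentLSTM(nn.Module):
    """LSTM over VAE latents with optional past/current/future reconstruction heads."""
    def __init__(self, latent_dim=64, hidden_dim=128, recon=(1, 1, 0)):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        # One linear head per enabled reconstruction target.
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, latent_dim)
            for name, flag in zip(("past", "current", "future"), recon) if flag
        })

    def forward(self, latent_seq):
        # latent_seq: (B, T, latent_dim) produced by the frozen VAE encoder
        hidden, _ = self.lstm(latent_seq)
        return {name: head(hidden) for name, head in self.heads.items()}

def recon_loss(outputs, latent_seq):
    """Match each head against the correspondingly time-shifted latent sequence."""
    shifts = {"past": 1, "current": 0, "future": -1}
    loss = 0.0
    for name, pred in outputs.items():
        s = shifts[name]
        if s > 0:       # prediction at time t is compared with the latent at t-1
            loss = loss + torch.mean((pred[:, s:] - latent_seq[:, :-s]) ** 2)
        elif s < 0:     # prediction at time t is compared with the latent at t+1
            loss = loss + torch.mean((pred[:, :s] - latent_seq[:, -s:]) ** 2)
        else:
            loss = loss + torch.mean((pred - latent_seq) ** 2)
    return loss
```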
Load the VAE and LSTM training results and retrain the policy (the Unity standalone needs to be running first):
python train_policy.py --retrain 1 --trial 1 --iter 1950 --scene_id 1 --nocontrol 1
where `trial=1` and `iter=1950` mean loading the weights from `saved/RecurrentPPO_1/Policy/iter_01950.pth`. Make sure to rename the output folder `LSTM_xxx_x_0` to `RecurrentPPO_x` before retraining the policy. The argument `--nocontrol` decides whether to load the initial policy or to train the policy from a randomly initialized policy network.
Start a terminal and run the Unity standalone:
cd AvoidBench/src/avoidbench/unity_scene/
./AvoidBench/AvoidBench.x86_64
Run the evaluation environments:
python test_ppo.py --trial 2 --iter 20 --scene_id 1
Before running the network for benchmarking, please specify the checkpoint you trained in the config file `config.yaml`:
ros:
input_update_freq: 10
use_depth: true
velocity_frame: wf
seq_len: 1
goal_obs_dim: 7
  trial: 2 # to load the checkpoint from the folder 'RecurrentPPO_2'
  iter: 200 # to load the checkpoint named 'iter_00200.pth'
pre_steps: 4
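For clarity, the snippet below shows how the `trial` and `iter` entries of this config resolve to a checkpoint path, following the same convention used throughout this README; it is only a sketch, not code from the package:

```python
import yaml
from pathlib import Path

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)["ros"]

# Same inferred layout as above: saved/RecurrentPPO_<trial>/Policy/iter_<iter>.pth
ckpt = Path("saved") / f"RecurrentPPO_{cfg['trial']}" / "Policy" / f"iter_{cfg['iter']:05d}.pth"
print(ckpt)   # e.g. saved/RecurrentPPO_2/Policy/iter_00200.pth
```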
Compile AvoidBench:
cd AvoidBench
catkin build
Start AvoidBench:
source devel/setup.bash
# run the launch file
roslaunch avoid_manage rotors_gazebo.launch
Open another terminal and start the controller from Agilicious:
source devel/setup.bash
roslaunch mavrl flying_hummingbird.launch
Open another terminal and run the network inference:
source devel/setup.bash
roscd mavrl
python avoider_vel_cmd.py
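To give a rough idea of what the inference node does, here is a heavily simplified rospy sketch: subscribe to a depth image, run the frozen perception and policy networks, and publish a velocity command. The topic names, message types, and the `model.predict` interface are assumptions for illustration; see `avoider_vel_cmd.py` for the actual implementation.

```python
import rospy
import numpy as np
from sensor_msgs.msg import Image
from geometry_msgs.msg import TwistStamped
from cv_bridge import CvBridge

class AvoiderNode:
    def __init__(self, model):
        self.model = model                      # frozen VAE+LSTM+policy wrapper (assumption)
        self.bridge = CvBridge()
        self.cmd_pub = rospy.Publisher("/hummingbird/autopilot/velocity_command",
                                       TwistStamped, queue_size=1)    # hypothetical topic
        rospy.Subscriber("/hummingbird/camera/depth", Image,           # hypothetical topic
                         self.depth_cb, queue_size=1)

    def depth_cb(self, msg):
        depth = self.bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
        vel = self.model.predict(np.asarray(depth, dtype=np.float32))  # -> (vx, vy, vz)
        cmd = TwistStamped()
        cmd.header.stamp = rospy.Time.now()
        cmd.twist.linear.x, cmd.twist.linear.y, cmd.twist.linear.z = vel
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("mavrl_avoider_sketch")
    # `model` would be loaded from the checkpoint configured in config.yaml, e.g.:
    # AvoiderNode(model)
    rospy.spin()
```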