This work shows ways to reuse policies trained to solve a set of training tasks, specified by linear temporal logic (LTL), to solve novel LTL tasks in a zero-shot manner. Please see the following paper for more details.
Skill Transfer for Temporally-Extended Task Specifications [Liu, Shah, Rosen, Konidaris, Tellex 2022]
Clone this repository by running:
git clone https://github.com/jasonxyliu/ltl_transfer.git
Training state-centric policies with LPOPL requires Python 3.5 and three libraries: numpy, tensorflow, and sympy. Python 3.7 should also work.
Transfer learning requires dill, NetworkX, and Matplotlib, plus mpi4py if running on a cluster.
Visualization requires Pillow.
Install all dependencies in a conda environment by running the following command:
conda create -n ltl_transfer python=3.7 numpy sympy dill networkx matplotlib pillow tensorflow=1 # tensorflow 1.15
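After creating and activating the environment, a quick check like the one below can confirm that the listed packages are importable. This is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and `PIL` is assumed to be the import name for Pillow.

```python
import importlib.util

# Import names for the dependencies listed above (PIL is provided by Pillow).
REQUIRED = ["numpy", "sympy", "dill", "networkx", "matplotlib", "PIL", "tensorflow"]
# Only needed when running on a cluster.
OPTIONAL = ["mpi4py"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required modules:", missing_modules(REQUIRED))
print("Missing optional modules:", missing_modules(OPTIONAL))
```

An empty list for the required modules suggests the environment is ready for the commands that follow.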
To learn state-centric policies:
python3 run_experiments.py --algo=lpopl --train_type=mixed --train_size=50 --map=0 --prob=0.7 --total_steps=800000
To compile transition-centric options and perform zero-shot transfer on a local machine:
python run_experiments.py --algo=zero_shot_transfer --train_type=mixed --train_size=50 --test_type=soft --map=0 --prob=0.7 --relabel_method=local
Reduce RELABEL_CHUNK_SIZE to 21 in transfer.py if running the above Python script slows down your machine too much. It controls how many parallel processes run at a time.
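As a rough illustration of how a chunk size can cap the number of jobs running at once, here is a minimal sketch. It is not the code in transfer.py: `chunked`, `relabel_one`, and `run_in_chunks` are hypothetical names, the job itself is a placeholder, and threads stand in for processes to keep the example portable.

```python
from concurrent.futures import ThreadPoolExecutor

RELABEL_CHUNK_SIZE = 21  # hypothetical stand-in for the constant in transfer.py


def chunked(jobs, chunk_size):
    """Split jobs into consecutive chunks of at most chunk_size items."""
    return [jobs[i:i + chunk_size] for i in range(0, len(jobs), chunk_size)]


def relabel_one(job_id):
    """Placeholder for one relabeling job; the real work is not shown here."""
    return job_id * job_id


def run_in_chunks(jobs, chunk_size):
    """Run one chunk at a time, so at most chunk_size jobs are in flight."""
    results = []
    for chunk in chunked(jobs, chunk_size):
        with ThreadPoolExecutor(max_workers=len(chunk)) as pool:
            results.extend(pool.map(relabel_one, chunk))
    return results


chunk_lengths = [len(c) for c in chunked(list(range(50)), RELABEL_CHUNK_SIZE)]
squares = run_in_chunks(list(range(50)), RELABEL_CHUNK_SIZE)
print(chunk_lengths)
```

A smaller chunk size means fewer workers compete for CPU and memory at any moment, at the cost of more sequential rounds.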
To compile transition-centric options and perform zero-shot transfer on a cluster:
python run_experiments.py --algo=zero_shot_transfer --train_type=mixed --train_size=50 --test_type=soft --map=0 --prob=0.7 --relabel_method=cluster
To visualize initiation set classifiers:
python visualize_classifiers.py --algo=lpopl --tasks_id=4 --map_id=0 --ltl_id=12 --simple_vis
You can generate new random maps using the code in src/map_generator.py. The only required parameter is the random seed. The resulting map is displayed in the console along with the number of steps that an optimal policy would need to solve the "sequence", "interleaving", and "safety" tasks (this value is computed using value iteration and might take a few minutes):
python3 map_generator.py --create_map --seed=0
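To give a feel for how value iteration yields an optimal step count, here is a minimal sketch on a tiny deterministic grid with a single goal. It is an assumption-laden illustration, not the code in map_generator.py: the grid layout, the `X` obstacle symbol, and the function `steps_to_goal` are all hypothetical, and the real tasks ("sequence", "interleaving", "safety") also track task progress, which this sketch omits.

```python
def steps_to_goal(grid, goal):
    """Value iteration on a deterministic grid: V[r][c] = min steps to goal."""
    rows, cols = len(grid), len(grid[0])
    inf = float("inf")
    V = [[inf] * cols for _ in range(rows)]
    V[goal[0]][goal[1]] = 0
    changed = True
    while changed:  # sweep until no value improves
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "X" or (r, c) == goal:
                    continue  # obstacles and the goal are never updated
                best = inf
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "X":
                        best = min(best, 1 + V[nr][nc])
                if best < V[r][c]:
                    V[r][c] = best
                    changed = True
    return V


# Hypothetical 3x3 map: "." is free space, "X" is an obstacle.
grid = ["...",
        ".X.",
        "..."]
V = steps_to_goal(grid, (2, 2))
print(V[0][0])  # steps needed from the top-left corner
```

Repeated sweeps over a larger state space are the reason the computation can take a few minutes on full-size maps.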
It is also possible to automatically look for adversarial maps for the Hierarchical RL baseline. To do so, we generate num_eval_maps random maps and rank them according to the difference between the reward obtained by an optimal policy and the reward obtained by an optimal myopic policy. The code displays the random seeds of the top num_adv_maps ranked maps. (You can then display those maps using the --create_map flag.)
python3 map_generator.py --adversarial --num_adv_maps=5 --num_eval_maps=1000
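The ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions: `rank_adversarial` is a hypothetical helper, and the reward values are made up rather than computed from real maps.

```python
def rank_adversarial(rewards, num_adv_maps):
    """Return the seeds with the largest optimal-minus-myopic reward gap.

    rewards maps each seed to a pair (optimal_reward, myopic_reward).
    """
    gaps = {seed: optimal - myopic for seed, (optimal, myopic) in rewards.items()}
    return sorted(gaps, key=gaps.get, reverse=True)[:num_adv_maps]


# Made-up rewards for four seeds, for illustration only.
rewards = {0: (-30, -34), 1: (-25, -60), 2: (-40, -41), 3: (-28, -50)}
top_seeds = rank_adversarial(rewards, 2)
print(top_seeds)
```

A large gap means a myopic policy does much worse than the optimal one on that map, which is what makes the map adversarial for the baseline.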
Our implementation is developed on top of the LPOPL codebase.
Please let us know if you spot any bugs or have any questions. We are happy to help!