This repository contains implementations of the algorithms presented in *Diffusing States and Matching Scores: A New Framework for Imitation Learning*.
Please run:

```bash
conda create -n smiling python=3.10.14 -y
conda activate smiling
pip install -r requirements.txt
```
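To sanity-check the installation, you can confirm the Python version and that PyTorch imports (a minimal check; we assume PyTorch is among the packages pinned in `requirements.txt`, since the SAC implementation builds on `pytorch_sac`):

```bash
# Quick sanity check of the freshly created environment.
conda activate smiling
python --version                                     # should report Python 3.10.14
python -c "import torch; print(torch.__version__)"   # assumes PyTorch is listed in requirements.txt
```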
For all experiments, first activate the installed environment:

```bash
conda activate smiling
```
You can either download the expert policies and demos we used in our experiments, or train your own expert policies and generate demos yourself.
The expert policies and demos we used can be downloaded here. After downloading, simply place the downloaded `models` folder in the root directory of this repo.
If you prefer to train your own expert policies, run:

```bash
# To train expert policies for the DeepMind Control Suite envs (e.g., ball_in_cup_catch):
python train.py env=ball_in_cup_catch agent=sac

# To train expert policies for the HumanoidBench envs (e.g., humanoidbench_h1-crawl-v0):
python train.py env=humanoidbench_h1-crawl-v0 agent=dreamerv3
```
Training is set to run for 1e8 steps by default. You can stop it at any time once the expert performance is good enough, or you can set the `expert_num_train_steps` argument to reduce the number of steps. You can also train expert policies for other environments by changing the `env` argument. The list of environments can be found in the env list section.
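As an illustration of the `expert_num_train_steps` override (the 1e6 value below is arbitrary; only the argument name comes from the text above):

```bash
# Train a cheetah_run expert for at most 1e6 environment steps
# (1e6 is an illustrative value, not a recommended setting).
python train.py env=cheetah_run agent=sac expert_num_train_steps=1000000
```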
Then, to generate state-only demonstrations, run:

```bash
# To generate expert demos for DeepMind Control Suite envs (e.g., ball_in_cup_catch):
python gen.py env=ball_in_cup_catch agent=sac expert_path=[PATH_TO_YOUR_EXPERT]

# To generate expert demos for the HumanoidBench envs (e.g., humanoidbench_h1-crawl-v0):
python gen.py env=humanoidbench_h1-crawl-v0 agent=dreamerv3 expert_path=[PATH_TO_YOUR_EXPERT]
```
Replace `[PATH_TO_YOUR_EXPERT]` with the directory of your expert policy. For example, if your expert policy is saved in `models/ball_in_cup_catch_08.25.16.44.10.520979/100000`, set `[PATH_TO_YOUR_EXPERT]` to `ball_in_cup_catch_08.25.16.44.10.520979/100000` (omit the `models/` prefix). The demonstrations will be saved in the same directory as the expert policy. To generate demonstrations with both states and actions, simply add `use_action=true` to the commands above.
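For example, combining the illustrative checkpoint path above with the `use_action` flag (the timestamped directory is only an example; substitute your own):

```bash
# Generate state-action demos from the example expert checkpoint above.
python gen.py env=ball_in_cup_catch agent=sac \
    expert_path=ball_in_cup_catch_08.25.16.44.10.520979/100000 use_action=true
```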
We used the following environments in our experiments:

- DeepMind Control Suite: `ball_in_cup_catch`, `cheetah_run`, `humanoid_walk`
- HumanoidBench: `humanoidbench_h1-sit_simple-v0`, `humanoidbench_h1-crawl-v0`, `humanoidbench_h1-pole-v0`
We only support using SAC for the DeepMind Control Suite and DreamerV3 for HumanoidBench; other combinations (e.g., using SAC for HumanoidBench) are not supported. This applies to all experiments in this repository.
**Note:** The `humanoid_bench` folder in this repo is copied from the original repo, with the only modification being that we updated the `tasks.py` file to fix the episode length at 1,000 steps without early termination. We did not test our algorithm with flexible episode lengths (i.e., with early termination). We also encourage you to install HumanoidBench directly from the original repo and apply this modification manually, both to acknowledge the original authors' contribution and to ensure you have the latest version.
Once the expert demonstrations are ready, we can proceed to run our algorithm. In the following steps, we assume you are using our generated expert policies and demonstrations. To switch to your own expert policies, simply modify the `expert_path` argument accordingly.
To reproduce our results of learning from state-only demonstrations, run:

```bash
# For DeepMind Control Suite envs (e.g., ball_in_cup_catch):
python main.py env=ball_in_cup_catch expert_path=ball_in_cup_catch agent=sac alg=smiling

# For HumanoidBench envs (e.g., humanoidbench_h1-crawl-v0):
python main.py env=humanoidbench_h1-crawl-v0 expert_path=humanoidbench_crawl agent=dreamerv3 alg=smiling agent.run.num_envs=2 alg.t_sample_cnt=100
```
Training is set to run for 1e5 epochs by default (equivalent to 1e8 env steps). You can stop it at any time, or reduce the number of epochs by setting a smaller value for the `epochs` argument.
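For instance, a shorter run might look like the following (20,000 is an arbitrary illustrative value; only the `epochs` argument name comes from the text above):

```bash
# Run SMILING for 2e4 epochs instead of the default 1e5.
python main.py env=ball_in_cup_catch expert_path=ball_in_cup_catch agent=sac alg=smiling epochs=20000
```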
To reproduce the results of learning from state-action demonstrations, simply add `use_action=true` to the commands above.
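For example, a sketch of the DeepMind Control Suite command above with the flag added (we assume the provided demos under `expert_path=ball_in_cup_catch` also contain actions):

```bash
# Learn from state-action demonstrations on ball_in_cup_catch.
python main.py env=ball_in_cup_catch expert_path=ball_in_cup_catch agent=sac alg=smiling use_action=true
```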
To reproduce the results with the linear score function, simply add `use_linear=true` to the commands. For example:

```bash
python main.py env=cheetah_run expert_path=cheetah_run agent=sac alg=smiling use_linear=true
```
To run the code with different seeds, simply set the `seed` argument. Our reported results are averaged over seeds 0, 1, 2, 3, and 4.
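For example, a simple bash loop to run all five seeds sequentially (sequential execution is only for illustration; you would typically launch these runs in parallel):

```bash
# Run the state-only experiment for seeds 0 through 4, one after another.
for seed in 0 1 2 3 4; do
    python main.py env=ball_in_cup_catch expert_path=ball_in_cup_catch agent=sac alg=smiling seed=$seed
done
```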
DAC is already implemented in our repo. To run it, simply replace `alg=smiling` with `alg=dac` in the commands. For example, to reproduce DAC's results of learning from state-only demonstrations, run:
```bash
# For DeepMind Control Suite envs (e.g., ball_in_cup_catch):
python main.py env=ball_in_cup_catch expert_path=ball_in_cup_catch agent=sac alg=dac

# For HumanoidBench envs (e.g., humanoidbench_h1-crawl-v0):
python main.py env=humanoidbench_h1-crawl-v0 expert_path=humanoidbench_crawl agent=dreamerv3 alg=dac agent.run.num_envs=2
```
To reproduce the results of DAC learning from state-action demonstrations or using a linear discriminator, simply add `use_action=true` or `use_linear=true` to the commands.
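For example, a sketch of the DAC command above with the state-action flag added:

```bash
# DAC learning from state-action demonstrations on ball_in_cup_catch.
python main.py env=ball_in_cup_catch expert_path=ball_in_cup_catch agent=sac alg=dac use_action=true
```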
For IQ-Learn, we used the official implementation, which can be found here.
Our SAC implementation is based on pytorch_sac, with only minor modifications made to integrate it into our framework. DreamerV3 is built upon the implementation provided in humanoid-bench, which is derived from dreamerv3.
Our DAC is based on the official repository: google-research/dac. The results for IQ-Learn were generated using the official implementation from IQ-Learn.
```bibtex
@article{wu2024diffusing,
  title={Diffusing States and Matching Scores: A New Framework for Imitation Learning},
  author={Wu, Runzhe and Chen, Yiding and Swamy, Gokul and Brantley, Kiant{\'e} and Sun, Wen},
  journal={arXiv preprint arXiv:2410.13855},
  year={2024}
}
```