This is a PyTorch implementation of SfBC: Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling.
* For diffusion-based offline RL, we recommend trying our subsequent work, QGPO (paper; GitHub). Compared with SfBC, QGPO is more computationally efficient and performs noticeably better.
- See the conda requirements in requirements.yml.
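One possible way to build the environment from that file is sketched below. It assumes conda is already installed and that the command is run from the repository root. The helper `create_env` and the environment name `sfbc` are hypothetical names chosen for this example, not something the repository defines.

```shell
# Sketch only: "create_env" and the default name "sfbc" are hypothetical.
create_env() {
    env_file="${1:-requirements.yml}"
    env_name="${2:-sfbc}"
    # Fail early with a clear message if the file is not where we expect it.
    if [ ! -f "$env_file" ]; then
        echo "missing $env_file; run this from the repository root" >&2
        return 1
    fi
    # -n sets the environment name, overriding any name declared in the file.
    conda env create -f "$env_file" -n "$env_name"
}
```

After sourcing the snippet, `create_env` followed by `conda activate sfbc` should give a shell with the dependencies available.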
Train the behavior model:
$ python3 train_behavior.py
Train the critic model and plot evaluation scores with tensorboard:
$ python3 train_critic.py
Evaluation only:
$ python3 evaluation.py
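The three commands above can also be chained in one script, as in the sketch below. It assumes the stages are meant to run in the order listed and that each script works without extra command-line arguments, since none are shown here. The `run` helper and the `DRY_RUN` variable are conveniences of this sketch, not part of the repository.

```shell
#!/bin/sh
# Sketch: run the stages in the order listed, stopping at the first failure.
set -eu

# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run python3 train_behavior.py   # train the behavior model
run python3 train_critic.py     # train the critic model
run python3 evaluation.py       # evaluation only
```

Saved under a hypothetical name such as `run_all.sh`, it can be checked with `sh run_all.sh` and then executed for real with `DRY_RUN=0 sh run_all.sh`.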
If you find this code release useful, please cite it in your paper:
@inproceedings{
chen2023offline,
title={Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling},
author={Huayu Chen and Cheng Lu and Chengyang Ying and Hang Su and Jun Zhu},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
}
- Contact us at [email protected] if you have any questions.