This repository provides a survey of the applications of deep generative models to offline reinforcement learning and imitation learning. We cover five families of deep generative models: VAEs, GANs, Normalizing Flows, Transformers, and Diffusion Models.
The paper has been accepted at TMLR with a Survey Certification: https://openreview.net/pdf?id=Mm2cMDl9r5
The paper was substantially improved during the review process relative to the initial version. Please consider citing it:
```bibtex
@article{chen2024deep,
  title={Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions},
  author={Chen, Jiayu and Ganguly, Bhargav and Xu, Yang and Mei, Yongsheng and Lan, Tian and Aggarwal, Vaneet},
  journal={arXiv preprint arXiv:2402.13777},
  year={2024}
}
```
- [ICLR 2014] Auto-Encoding Variational Bayes
- [Neurips 2015] A Recurrent Latent Variable Model for Sequential Data
- [Neurips 2015] Learning Structured Output Representation using Deep Conditional Generative Models
- [ICLR 2017] beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
- [Neurips 2017] Neural Discrete Representation Learning
- [ICLR 2017] Deep Variational Information Bottleneck
- [CVPR 2018] Cross-modal Deep Variational Hand Pose Estimation
- [FTML 2019] An Introduction to Variational Autoencoders
- [IEEE FG 2020] Gated Variational AutoEncoders: Incorporating Weak Supervision to Encourage Disentanglement
- [AAAI 2022] State Deviation Correction for Offline Reinforcement Learning
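As a quick orientation for the VAE papers above: most of them optimize some variant of the evidence lower bound (ELBO). The sketch below is illustrative only (not taken from any listed paper); it shows the analytic KL term for a diagonal-Gaussian posterior against a standard-normal prior, plus a fixed-variance Gaussian reconstruction term, with `beta` following the beta-VAE weighting.

```python
import math

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

def gaussian_recon_log_likelihood(x, x_hat, sigma=1.0):
    """log p(x | z) under a fixed-variance Gaussian decoder."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (a - b) ** 2 / (2 * sigma ** 2) for a, b in zip(x, x_hat))

def negative_elbo(x, x_hat, mu, log_var, beta=1.0):
    """Loss to minimize; beta = 1 is the standard VAE, beta > 1 is beta-VAE."""
    recon = gaussian_recon_log_likelihood(x, x_hat)
    kl = kl_diag_gaussian_to_standard_normal(mu, log_var)
    return -recon + beta * kl
```

When the posterior matches the prior (`mu = 0`, `log_var = 0`), the KL term vanishes and only the reconstruction term remains.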
- [ICRA 2018] Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
- [ICCAS 2019] Path Tracking Control Using Imitation Learning with Variational Auto-Encoder
- [ICML 2019] CompILE: Compositional Imitation Learning and Execution
- [CoRL 2019] Task-Conditioned Variational Autoencoders for Learning Movement Primitives
- [Neurips 2019] Causal Confusion in Imitation Learning
- [ArXiv 2019] Trajectory VAE for multi-modal imitation
- [CoRL 2020] Generalization Guarantees for Imitation Learning
- [CVPR 2020] Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation
- [IROS 2020] Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations
- [ICML 2020] Intrinsic Reward Driven Imitation Learning via Generative Model
- [ArXiv 2020] Complex Skill Acquisition Through Simple Skill Imitation Learning
- [IROS 2021] Self-Supervised Disentangled Representation Learning for Third-Person Imitation Learning
- [Neurips 2021] Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning
- [Neurips 2021] An Empirical Investigation of Representation Learning for Imitation
- [RSS 2021] Language Conditioned Imitation Learning over Unstructured Data
- [IROS 2022] SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning
- [CoRL 2022] Learning and Retrieval from Prior Data for Skill-based Imitation Learning
- [ICML 2022] Bayesian Imitation Learning for End-to-End Mobile Manipulation
- [IEEE CDC 2023] Initial State Interventions for Deconfounded Imitation Learning
- [RSS 2023] Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
- [ICML 2019] Off-Policy Deep Reinforcement Learning without Exploration
- [CoRL 2020] PLAS: Latent Action Space for Offline Reinforcement Learning
- [Neurips 2021] Offline Meta Reinforcement Learning – Identifiability Challenges and Effective Data Collection Strategies
- [Neurips 2021] Offline Reinforcement Learning with Reverse Model-based Imagination
- [ICLR 2021] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- [ACML 2021] BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
- [IJCAI 2021] Boosting Offline Reinforcement Learning with Residual Generative Modeling
- [ICLR 2021] Risk-Averse Offline Reinforcement Learning
- [L4DC 2021] Offline Reinforcement Learning from Images with Latent Space Models
- [AAAI 2022] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning
- [AAAI 2022] Offline Reinforcement Learning as Anti-exploration
- [Neurips 2022] Supported Policy Optimization for Offline Reinforcement Learning
- [ICML 2022] Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
- [IEEE RAL 2022] Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
- [Neurips 2022] Mildly Conservative Q-Learning for Offline Reinforcement Learning
- [ICCAS 2022] Selective Data Augmentation for Improving the Performance of Offline Reinforcement Learning
- [CoRL 2023] Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
- [CoRL 2023] Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
- [CoRL 2023] Latent Plans for Task-Agnostic Offline Reinforcement Learning
- [IEEE TCDS 2023] UAC: Offline Reinforcement Learning with Uncertain Action Constraint
- [CoRL 2023] Expansive Latent Planning for Sparse Reward Offline Reinforcement Learning
- [ArXiv 2023] On the Importance of the Policy Structure in Offline Reinforcement Learning
- [Neurips 2016] f-GAN: Training Generative Neural Samplers Using Variational Divergence Minimization
- [Neurips 2016] InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
- [Neurips 2017] Triple Generative Adversarial Nets
- [ICML 2017] Wasserstein Generative Adversarial Networks
- [ICCV 2017] DualGAN: Unsupervised Dual Learning for Image-to-Image Translation
- [IEEE TKDE 2023] A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications
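All of the GAN papers above build on the two-player minimax game between a generator and a discriminator. As a minimal, illustrative reference (not from any listed paper), here are the discriminator's binary cross-entropy loss and the non-saturating generator loss computed on scalar discriminator probabilities:

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator binary cross-entropy: push D(real) -> 1, D(fake) -> 0.
    d_real and d_fake are discriminator output probabilities in (0, 1)."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss_nonsaturating(d_fake):
    """Non-saturating generator loss: maximize log D(G(z)) instead of
    minimizing log(1 - D(G(z))), for stronger early-training gradients."""
    return -math.log(d_fake)
```

At the discriminator's uninformative point `D = 0.5` everywhere, `d_loss` equals `2 log 2`, the value at the theoretical equilibrium of the original GAN objective.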
- [ArXiv 2016] A Connection Between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
- [ICLR 2018] Learning robust rewards with adversarial inverse reinforcement learning
- [ICML 2019] Multi-Agent Adversarial Inverse Reinforcement Learning
- [ICLR 2019] Adversarial imitation via variational inverse reinforcement learning
- [Neurips 2019] Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
- [Neurips 2019] SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
- [CoRL 2020] f-IRL: Inverse Reinforcement Learning via State Marginal Matching
- [ICML 2020] Off-Policy Adversarial Inverse Reinforcement Learning
- [ArXiv 2020] oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions
- [IEEE RAL 2021] Adversarial Inverse Reinforcement Learning With Self-Attention Dynamics Model
- [ICML 2023] Multi-task Hierarchical Adversarial Inverse Reinforcement Learning
- [IEEE TNNLS 2023] Hierarchical Adversarial Inverse Reinforcement Learning
- [Neurips 2016] Generative Adversarial Imitation Learning
- [ICML 2017] End-to-End Differentiable Adversarial Imitation Learning
- [Neurips 2017] Robust Imitation of Diverse Behaviors
- [Neurips 2017] InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
- [Neurips 2017] Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
- [Neurips 2018] Multi-Agent Generative Adversarial Imitation Learning
- [Neurips 2018] A Bayesian Approach to Generative Adversarial Imitation Learning
- [AAAI 2018] OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning
- [AAMAS 2018] Burn-In Demonstrations for Multi-Modal Imitation Learning
- [ArXiv 2018] Generative Adversarial Self-Imitation Learning
- [ICDM 2019] Unveiling Taxi Drivers’ Strategies via cGAIL — Conditional Generative Adversarial Imitation Learning
- [AISTATS 2019] Risk-Sensitive Generative Adversarial Imitation Learning
- [ICLR 2019] Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information
- [ICLR 2019] Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
- [AAMAS 2019] Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems
- [AAMAS 2019] Adversarial Imitation Learning from State-only Demonstrations
- [AISTATS 2019] Sample-Efficient Imitation Learning via Generative Adversarial Nets
- [ArXiv 2019] Situated GAIL: Multitask imitation using task-conditioned adversarial inverse reinforcement learning
- [ACM KDD 2020] xGAIL: Explainable Generative Adversarial Imitation Learning for Explainable Human Decision Analysis
- [CoRL 2020] Task-Relevant Adversarial Imitation Learning
- [ICLR 2020] State Alignment-based Imitation Learning
- [IJCAI 2020] Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets
- [Neurips 2020] f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning
- [ICML 2020] Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate
- [Neurocomputing 2020] Deterministic generative adversarial imitation learning
- [ArXiv 2020] ADAIL: Adaptive Adversarial Imitation Learning
- [ICML 2021] Adversarial Option-Aware Hierarchical Imitation Learning
- [ICML 2022] Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations
- [Neurocomputing 2022] An imitation learning framework for generating multi-modal trajectories from unstructured demonstrations
- [ArXiv 2022] Latent Policies for Adversarial Imitation Learning
- [Neurips 2023] Ess-InfoGAIL: Semi-supervised Imitation Learning from Imbalanced Demonstrations
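A common thread in the GAIL and AIRL papers above: the discriminator, trained to separate expert from policy state-action pairs, doubles as a learned reward for the policy-optimization step. An illustrative sketch (not from any specific listed paper) of the two standard surrogate rewards:

```python
import math

def gail_reward(d_sa):
    """GAIL-style surrogate reward from a discriminator output D(s, a) in (0, 1).
    -log(1 - D) grows as (s, a) pairs look more expert-like."""
    return -math.log(1.0 - d_sa)

def airl_reward(d_sa):
    """AIRL-style reward: log D - log(1 - D), i.e. the discriminator's logit,
    which stays finite and signed around the D = 0.5 decision boundary."""
    return math.log(d_sa) - math.log(1.0 - d_sa)
```

At `D = 0.5` (the discriminator cannot tell policy from expert), the AIRL reward is exactly zero, while the GAIL reward is `log 2`.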
- [ICML 2022] Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
- [Neurips 2022] A Unified Framework for Alternating Offline Model Training and Policy Learning
- [Neurips 2022] Model-based Trajectory Stitching for Improved Offline Reinforcement Learning
- [Neurips 2022] Data-Driven Offline Decision-Making via Invariant Representation Learning
- [Neurips 2022] Dual Generator Offline Reinforcement Learning
- [Neurips 2022] S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
- [ArXiv 2022] A Behavior Regularized Implicit Policy for Offline Reinforcement Learning
- [Neurips 2023] GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
- [Information Sciences 2023] Safe batch constrained deep reinforcement learning with generative adversarial network
- [ArXiv 2023] Model-based Offline Policy Optimization with Adversarial Network
- [ICML 2015] Variational Inference with Normalizing Flows
- [ICLR 2017] Density Estimation Using Real NVP
- [IEEE TPAMI 2021] Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
- [IEEE TPAMI 2021] Normalizing Flows: An Introduction and Review of Current Methods
- [JMLR 2021] Normalizing Flows for Probabilistic Modeling and Inference
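The normalizing-flow papers above all rest on the change-of-variables formula: an invertible map from a simple base density yields an exact log-likelihood, corrected by the log-determinant of the Jacobian. A minimal illustrative sketch with a one-dimensional affine flow (not taken from any listed paper):

```python
import math

def standard_normal_log_prob(z):
    """Log-density of the N(0, 1) base distribution."""
    return -0.5 * (z * z + math.log(2 * math.pi))

def affine_flow_log_prob(x, scale, shift):
    """Exact density of x = scale * z + shift with z ~ N(0, 1), via
    log p(x) = log p_z(f^{-1}(x)) - log |det df/dz|."""
    z = (x - shift) / scale          # invert the flow
    return standard_normal_log_prob(z) - math.log(abs(scale))
```

With `scale = 1`, `shift = 0` the flow is the identity and the log-prob matches the base density; a larger `scale` spreads the density out, lowering it by `log(scale)`.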
- [ICLR 2019] Generative predecessor models for sample-efficient imitation learning
- [IROS 2020] ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows
- [ArXiv 2020] Imitative Planning using Conditional Normalizing Flow
- [Neurips 2021] IL-flOw: Imitation Learning from Observation using Normalizing Flows
- [IEEE ITSC 2021] Learning Normalizing Flow Policies Based on Highway Demonstrations
- [ICML 2023] A Coupled Flow Approach to Imitation Learning
- [Applied Intelligence 2023] Imitation Learning by State-Only Distribution Matching
- [ArXiv 2023] Stabilized Likelihood-based Imitation Learning via Denoising Continuous Normalizing Flow
- [ArXiv 2023] State-Only Imitation Learning by Trajectory Distribution Matching
- [ArXiv 2023] Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning
- [Neurips 2022] Uncertainty-Aware Reinforcement Learning for Risk-Sensitive Player Evaluation in Sports Game
- [AAAI 2023] Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery
- [ArXiv 2023] APAC: Authorized Probability-controlled Actor-Critic For Offline Reinforcement Learning
- [ArXiv 2023] Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
- [CoRL 2019] Leveraging exploration in off-policy algorithms via normalizing flows
- [UAI 2020] Randomized Value Functions via Multiplicative Normalizing Flows
- [ICRA 2021] Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
- [ICML 2022] SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition
- [Neurips 2022] CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations
- [Neurips 2017] Attention Is All You Need
- [AI Open 2022] A survey of transformers
- [IEEE TPAMI 2022] A Survey on Vision Transformer
- [IJCAI 2023] Transformers in Time Series: A Survey
- [ArXiv 2023] Transformer in Reinforcement Learning for Decision-Making: A Survey
- [IROS 2021] Transformer-based deep imitation learning for dual-arm robot manipulation
- [CoRL 2021] Transformers for One-Shot Visual Imitation
- [ArXiv 2021] Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
- [Neurips 2022] Behavior Transformers: Cloning k modes with one stone
- [CoRL 2022] Transformer Adapters for Robot Learning
- [CoRL 2022] VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
- [ICLR 2022] Generalized Decision Transformer for Offline Hindsight Information Matching
- [ICLR 2022] Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation
- [ICRA 2022] Memory-based gaze prediction in deep imitation learning for robot manipulation
- [IEEE RAL 2022] What Matters in Language Conditioned Robotic Imitation Learning Over Unstructured Data
- [AAAI 2023] Improving Long-Horizon Imitation through Instruction Prediction
- [ICCV 2023] Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training
- [CoRL 2023] Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
- [CVPR 2023] A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
- [ACM CIKM 2023] A Hierarchical Imitation Learning-based Decision Framework for Autonomous Driving
- [IEEE RAL 2023] Training Robots Without Robots: Deep Imitation Learning for Master-to-Robot Policy Transfer
- [ArXiv 2023] Pretraining for Language Conditioned Imitation with Transformers
- [ArXiv 2023] Detrive: Imitation Learning with Transformer Detection for End-to-End Autonomous Driving
- [ICML 2020] Stabilizing Transformers for Reinforcement Learning
- [ICML 2020] Working Memory Graphs
- [ICML 2021] Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation
- [Neurips 2021] Decision Transformer: Reinforcement Learning via Sequence Modeling
- [ICML 2021] Catformer: Designing Stable Transformers via Sensitivity Analysis
- [ICLR 2021] Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
- [ICLR 2021] UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers
- [Neurips 2021] Offline Reinforcement Learning as One Big Sequence Modeling Problem
- [ArXiv 2021] Transfer learning with causal counterfactual reasoning in Decision Transformers
- [ArXiv 2021] Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
- [ICML 2022] Online Decision Transformer
- [ICML 2022] Prompting Decision Transformer for Few-Shot Policy Generalization
- [ICML 2022] Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
- [ICLR 2022] RvS: What is Essential for Offline RL via Supervised Learning?
- [Neurips 2022] Dynamics-Augmented Decision Transformer for Offline Dynamics Generalization
- [Neurips 2022] On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
- [Neurips 2022] Multi-Game Decision Transformers
- [Neurips 2022] Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
- [Neurips 2022] Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing
- [Neurips 2022] Multi-Agent Reinforcement Learning is A Sequence Modeling Problem
- [Neurips 2022] When does return-conditioned supervised learning work for offline reinforcement learning?
- [Neurips 2022] You Can’t Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments
- [Neurips 2022] Bootstrapped Transformer for Offline Reinforcement Learning
- [Neurips 2022] CLaP: Conditional Latent Planners for Offline Reinforcement Learning
- [AAAI 2022] Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft
- [ECCV 2022] StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
- [IEEE RAL 2022] Efficient Spatiotemporal Transformer for Robotic Reinforcement Learning
- [ArXiv 2022] TransDreamer: Reinforcement Learning with Transformer World Models
- [ArXiv 2022] Can Wikipedia Help Offline Reinforcement Learning?
- [ArXiv 2022] Semi-supervised Offline Reinforcement Learning with Pre-trained Decision Transformers
- [ArXiv 2022] Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
- [ICML 2023] Future-conditioned Unsupervised Pretraining for Decision Transformer
- [ICML 2023] Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
- [ICML 2023] Constrained Decision Transformer for Offline Safe Reinforcement Learning
- [ICML 2023] Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- [ICLR 2023] Dichotomy of Control: Separating What You Can Control from What You Cannot
- [IROS 2023] Hierarchical Decision Transformer
- [IJCAI 2023] Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution
- [TMLR 2022] A Generalist Agent
- [CoRL 2023] Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
- [UAI 2023] A Trajectory is Worth Three Sentences: Multimodal Transformer for Offline Reinforcement Learning
- [AI 2023] Transform networks for cooperative multi-agent deep reinforcement learning
- [ArXiv 2023] Self-Confirming Transformer for Locally Consistent Online Adaptation in Multi-Agent Reinforcement Learning
- [ArXiv 2023] Contextual Transformer for Offline Reinforcement Learning
- [ArXiv 2023] Skill Decision Transformer
- [ArXiv 2023] Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning
- [ArXiv 2023] Graph Decision Transformer
- [ArXiv 2023] Critic-Guided Decision Transformer for Offline Reinforcement Learning
- [ArXiv 2023] MCTransformer: Combining Transformers And Monte-Carlo Tree Search For Offline Reinforcement Learning
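Many of the Transformer-for-RL papers above follow the Decision Transformer recipe: treat a trajectory as a sequence of (return-to-go, state, action) tokens and train an autoregressive model on it. An illustrative sketch of that tokenization (assumptions of ours, not code from any listed paper):

```python
def returns_to_go(rewards):
    """Suffix sums of the reward sequence: the return still to be
    collected from each timestep onward, used as conditioning."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def interleave_tokens(rtg, states, actions):
    """Flatten a trajectory into the (R_t, s_t, a_t) token order that
    return-conditioned sequence models are trained on."""
    tokens = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("R", r), ("s", s), ("a", a)])
    return tokens
```

At test time, the first return-to-go token is set to a desired target return, and actions are decoded autoregressively.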
- [ArXiv 2022] Understanding Diffusion Models: A Unified Perspective
- [ACM Computing Surveys 2023] Diffusion Models: A Comprehensive Survey of Methods and Applications
- [Neurips 2023] Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
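The diffusion papers below share the DDPM forward (noising) process, which admits a closed form: x_t can be sampled directly from x_0 without simulating every intermediate step. A minimal illustrative sketch for a scalar (not from any specific listed paper):

```python
import math

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_i) for i = 0..t (inclusive)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def q_sample(x0, betas, t, eps):
    """Closed-form forward step of a DDPM:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where eps is a standard-normal noise sample."""
    ab = alpha_bar(betas, t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps
```

With zero noise schedule (`beta = 0`) the sample is the clean `x0`; as `alpha_bar` decays toward zero, the sample becomes pure noise, and the learned model is trained to reverse this corruption.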
- [ICLR 2023] Imitating Human Behaviour with Diffusion Models
- [CoRL 2023] Waypoint-Based Imitation Learning for Robotic Manipulation
- [ArXiv 2023] Constrained-Context Conditional Diffusion Models for Imitation Learning
- [ArXiv 2023] Goal-Conditioned Imitation Learning using Score-based Diffusion Policies
- [ArXiv 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
- [ArXiv 2023] Imitation Learning from Purified Demonstrations
- [ICLR 2024] Memory-Consistent Neural Networks for Imitation Learning
- [ArXiv 2024] Good Better Best: Self-Motivated Imitation Learning for Noisy Demonstrations
- [ICML 2022] Planning with Diffusion for Flexible Behavior Synthesis
- [ICML 2023] Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- [ICML 2023] MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
- [ICML 2023] AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
- [ICML 2023] Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning
- [ICML 2023] Hierarchical Diffusion for Offline Decision Making
- [ICLR 2023] Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
- [ICLR 2023] Is Conditional Generative Modeling all you need for Decision-Making?
- [ICLR 2023] Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
- [ICLR 2023] Sample Generations for Reinforcement Learning via Diffusion Models
- [CoRL 2023] Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
- [Neurips 2023] Efficient Diffusion Policies for Offline Reinforcement Learning
- [Neurips 2023] EDGI: Equivariant Diffusion for Planning with Embodied Agents
- [Neurips 2023] Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- [Neurips 2023] Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model
- [ArXiv 2023] DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
- [ArXiv 2023] Boosting Continuous Control with Consistency Policy
- [ArXiv 2023] Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- [ArXiv 2023] Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning
- [ArXiv 2023] IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
- [ArXiv 2023] Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning
- [ArXiv 2023] MADIFF: Offline Multi-agent Learning with Diffusion Models
- [ArXiv 2023] SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
- [ICLR 2024] Reasoning with Latent Diffusion in Offline Reinforcement Learning
- [IEEE RAL 2024] Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning