
Added handful-of-trials paper
shagunsodhani committed Aug 19, 2019
1 parent 165411b commit 1c4f643
Showing 4 changed files with 57 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-in-a-Handful-of-Trials-using-Probabilistic-Dynamics-Models)
* [Assessing Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Assessing-Generalization-in-Deep-Reinforcement-Learning)
* [Quantifying Generalization in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Quantifying-Generalization-in-Reinforcement-Learning)
* [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](https://shagunsodhani.com/papers-I-read/Set-Transformer-A-Framework-for-Attention-based-Permutation-Invariant-Neural-Networks)
@@ -3,7 +3,7 @@ layout: post
title: Assessing Generalization in Deep Reinforcement Learning
comments: True
excerpt:
- tags: ['2018', 'Deep Reinforcement Learning', 'Evaluating Generalization', Reinforcement Learning', AI, DRL, Evaluation, Generalization, RL]
+ tags: ['2018', 'Deep Reinforcement Learning', 'Evaluating Generalization', 'Reinforcement Learning', AI, DRL, Evaluation, Generalization, RL]

---

@@ -0,0 +1,54 @@
---
layout: post
title: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
comments: True
excerpt:
tags: ['2018', 'Deep Reinforcement Learning', 'Model-based', 'Neurips 2018', 'Reinforcement Learning', AI, DRL, MBRL, Neurips, RL]

---

## Introduction

* The paper proposes a new algorithm called Probabilistic Ensembles with Trajectory Sampling (PETS), which combines uncertainty-aware deep learning models (an ensemble of deep networks that encode uncertainty) with sampling-based uncertainty propagation.

* PETS improves over other probabilistic MBRL approaches by isolating epistemic uncertainty (due to limited training data) from aleatoric uncertainty (inherent in the system).

* [Link to the paper](https://arxiv.org/abs/1805.12114)

## Uncertainty-Aware Neural Network Dynamics Model

* Aleatoric uncertainty can be accounted for by learning a parameterized distribution (a probabilistic neural network) trained with the negative log-likelihood loss.

* Epistemic uncertainty can be accounted for by either having an infinite amount of data or by using ensembles.

* The paper uses a neural network to predict the mean and standard deviation of a Gaussian distribution, which defines the predictive model. This setup is referred to as the "probabilistic" model and is denoted by **P** (a minimal sketch follows this list).

* The alternate setup of the deterministic model is where a neural network is used to make a point prediction (and is denoted by **D**).

* An ensemble of probabilistic models is denoted as **PE**, while an ensemble of deterministic models is denoted as **DE**.
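
A minimal PyTorch sketch of the **P** model and its training loss (illustrative names and sizes, not the authors' implementation; the log-variance clamp is a common stabilization trick assumed here):

```python
import torch
import torch.nn as nn


class ProbabilisticDynamicsModel(nn.Module):
    """Predicts a Gaussian over the next state given (state, action)."""

    def __init__(self, state_dim, action_dim, hidden_dim=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
        )
        self.mean_head = nn.Linear(hidden_dim, state_dim)
        self.logvar_head = nn.Linear(hidden_dim, state_dim)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        mean = self.mean_head(h)
        # Clamp the log-variance for numerical stability (an assumed trick).
        logvar = self.logvar_head(h).clamp(-10.0, 4.0)
        return mean, logvar


def gaussian_nll(mean, logvar, target):
    # Negative log-likelihood of `target` under N(mean, exp(logvar)),
    # up to an additive constant.
    sq_err = (target - mean) ** 2
    return (0.5 * (sq_err * torch.exp(-logvar) + logvar)).sum(dim=-1).mean()


# "PE": an ensemble of B such models, each fit on its own bootstrap resample of
# the data; disagreement between members captures epistemic uncertainty.
ensemble = [ProbabilisticDynamicsModel(state_dim=4, action_dim=1) for _ in range(5)]
```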

## Planning and Control with Learned Dynamics

* Model Predictive Control (MPC) is used for planning.

* Given a start state and an action sequence, the probabilistic dynamics model induces a distribution over the trajectories.

* Only the first action from the optimized action sequence is executed; the optimization is then re-run at the next timestep.

* Instead of random shooting, the [Cross Entropy Method (CEM)](https://www.sciencedirect.com/science/article/pii/B9780444538598000035) is used to optimize the action sequence (a sketch follows).
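
A minimal sketch of the CEM planner (assumed hyperparameters, not the paper's; `evaluate_returns` is a hypothetical hook that rolls out the learned dynamics model over candidate action sequences and sums predicted rewards):

```python
import numpy as np


def cem_plan(evaluate_returns, horizon, action_dim,
             n_samples=500, n_elites=50, n_iters=5):
    # Start from a broad Gaussian over action sequences.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate sequences: (n_samples, horizon, action_dim).
        samples = mean + std * np.random.randn(n_samples, horizon, action_dim)
        returns = evaluate_returns(samples)  # shape: (n_samples,)
        # Keep the top-scoring "elite" sequences and refit the Gaussian to them.
        elites = samples[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    # MPC executes only the first action of the optimized sequence.
    return mean[0]
```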

## Trajectory Sampling

* Let us say there are B bootstrap models in the ensemble. Given the current state, P particles are created and each particle is propagated using one of the bootstrap models. Two variants are considered:

* TS1 - At each timestep, each particle re-samples which bootstrap model to use. In this case, particle separation cannot be attributed to the compounding effects of the bootstraps.

* TS$\infty$ - The bootstrap model (per particle) is sampled just once and is not changed after that. This setup separates aleatoric and epistemic uncertainty: aleatoric state variance is the average variance of particles of the same bootstrap, while epistemic state variance is the variance of the average of particles with the same bootstrap index.
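
A sketch contrasting the two variants, assuming `ensemble` is a list of B probabilistic models like the one sketched earlier (all names are illustrative):

```python
import torch


def propagate_particles(ensemble, state, actions, n_particles=20, variant="TSinf"):
    """Roll out P particles from `state` through an ensemble of B models."""
    num_models = len(ensemble)
    particles = state.unsqueeze(0).repeat(n_particles, 1)  # (P, state_dim)
    # TS-infinity: each particle keeps one bootstrap model for the whole rollout.
    assignment = torch.randint(num_models, (n_particles,))
    for action in actions:  # actions: (horizon, action_dim)
        if variant == "TS1":
            # TS1: each particle re-samples its bootstrap at every timestep.
            assignment = torch.randint(num_models, (n_particles,))
        next_particles = torch.empty_like(particles)
        for b in range(num_models):
            mask = assignment == b
            if mask.any():
                n = int(mask.sum())
                mean, logvar = ensemble[b](particles[mask], action.expand(n, -1))
                # Aleatoric noise: sample from the predicted Gaussian.
                next_particles[mask] = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        particles = next_particles
    # Under TS-infinity, within-bootstrap spread reflects aleatoric uncertainty
    # and across-bootstrap spread reflects epistemic uncertainty.
    return particles
```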

## Results

* The proposed approach matches the asymptotic performance of state-of-the-art model-free algorithms while using far fewer samples.

* The general performance trend is: probabilistic ensemble > probabilistic model > deterministic ensemble > deterministic model.

* Initial experiments on learning a policy by propagating gradients through the ensemble of models did not work and are left as future work.
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from 231033 to 8b756c
