
Added handful-of-trials paper
shagunsodhani committed Aug 19, 2019
1 parent 165411b commit 1c4f643
Showing 4 changed files with 57 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-in-a-Handful-of-Trials-using-Probabilistic-Dynamics-Models)
* [Assessing Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Assessing-Generalization-in-Deep-Reinforcement-Learning)
* [Quantifying Generalization in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Quantifying-Generalization-in-Reinforcement-Learning)
* [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](https://shagunsodhani.com/papers-I-read/Set-Transformer-A-Framework-for-Attention-based-Permutation-Invariant-Neural-Networks)
@@ -3,7 +3,7 @@ layout: post
title: Assessing Generalization in Deep Reinforcement Learning
comments: True
excerpt:
- tags: ['2018', 'Deep Reinforcement Learning', 'Evaluating Generalization', Reinforcement Learning', AI, DRL, Evaluation, Generalization, RL]
+ tags: ['2018', 'Deep Reinforcement Learning', 'Evaluating Generalization', 'Reinforcement Learning', AI, DRL, Evaluation, Generalization, RL]

---

@@ -0,0 +1,54 @@
---
layout: post
title: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
comments: True
excerpt:
tags: ['2018', 'Deep Reinforcement Learning', 'Model-based', 'Neurips 2018', 'Reinforcement Learning', AI, DRL, MBRL, Neurips, RL]

---

## Introduction

* The paper proposes a new algorithm called Probabilistic Ensembles with Trajectory Sampling (PETS), which combines uncertainty-aware deep learning models (an ensemble of deep networks that encode uncertainty) with sampling-based uncertainty propagation.

* PETS improves over other probabilistic MBRL approaches by isolating epistemic uncertainty (due to limited training data) from aleatoric uncertainty (inherent in the system).

* [Link to the paper](https://arxiv.org/abs/1805.12114)

## Uncertainty-Aware Neural Network Dynamics Model

* Aleatoric uncertainty can be accounted for by learning a parameterized distribution (a probabilistic neural network) trained with the negative log-likelihood loss.

* Epistemic uncertainty can be accounted for by either having an infinite amount of data or by using ensembles.

* The paper uses a neural network to predict the mean and standard deviation of a Gaussian distribution, which defines the predictive model. This setup is referred to as the "probabilistic" model and is denoted by **P** (a minimal sketch follows this list).

* The alternate setup of the deterministic model is where a neural network is used to make a point prediction (and is denoted by **D**).

* An ensemble of probabilistic models is denoted as **PE**, while an ensemble of deterministic models is denoted as **DE**.
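
A minimal PyTorch sketch of the **P** model and its training loss (illustrative names and sizes, not the authors' implementation; the log-variance clamp is a common stabilization trick assumed here):

```python
import torch
import torch.nn as nn


class ProbabilisticDynamicsModel(nn.Module):
    """Predicts a Gaussian over the next state given (state, action)."""

    def __init__(self, state_dim, action_dim, hidden_dim=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
        )
        self.mean_head = nn.Linear(hidden_dim, state_dim)
        self.logvar_head = nn.Linear(hidden_dim, state_dim)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        mean = self.mean_head(h)
        # Clamp the log-variance for numerical stability (an assumed trick).
        logvar = self.logvar_head(h).clamp(-10.0, 4.0)
        return mean, logvar


def gaussian_nll(mean, logvar, target):
    # Negative log-likelihood of `target` under N(mean, exp(logvar)),
    # up to an additive constant.
    sq_err = (target - mean) ** 2
    return (0.5 * (sq_err * torch.exp(-logvar) + logvar)).sum(dim=-1).mean()


# "PE": an ensemble of B such models, each fit on its own bootstrap resample of
# the data; disagreement between members captures epistemic uncertainty.
ensemble = [ProbabilisticDynamicsModel(state_dim=4, action_dim=1) for _ in range(5)]
```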

## Planning and Control with Learned Dynamics

* Model Predictive Control (MPC) is used for planning.

* Given a start state and an action sequence, the probabilistic dynamics model induces a distribution over the trajectories.

* Only the first action from the optimized action sequence is executed; the optimization is then re-run at the next timestep.

* Instead of random shooting, the [Cross Entropy Method (CEM)](https://www.sciencedirect.com/science/article/pii/B9780444538598000035) is used to optimize the action sequence (a sketch follows).
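
A minimal sketch of the CEM planner (assumed hyperparameters, not the paper's; `evaluate_returns` is a hypothetical hook that rolls out the learned dynamics model over candidate action sequences and sums predicted rewards):

```python
import numpy as np


def cem_plan(evaluate_returns, horizon, action_dim,
             n_samples=500, n_elites=50, n_iters=5):
    # Start from a broad Gaussian over action sequences.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate sequences: (n_samples, horizon, action_dim).
        samples = mean + std * np.random.randn(n_samples, horizon, action_dim)
        returns = evaluate_returns(samples)  # shape: (n_samples,)
        # Keep the top-scoring "elite" sequences and refit the Gaussian to them.
        elites = samples[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    # MPC executes only the first action of the optimized sequence.
    return mean[0]
```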

## Trajectory Sampling

* Let us say there are B bootstrap models in the ensemble. Given the current state, P particles are created and each particle is propagated using one of the bootstrap models. Two variants are considered:

* TS1 - At each timestep, each particle re-samples which bootstrap model to use. In this case, particle separation cannot be attributed to the compounding effects of the bootstraps.

* TS$\infty$ - The bootstrap model (per particle) is sampled just once and is not changed after that. This setup separates aleatoric and epistemic uncertainty: aleatoric state variance is the average variance of particles of the same bootstrap, while epistemic state variance is the variance of the average of particles with the same bootstrap index.
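
A sketch contrasting the two variants, assuming `ensemble` is a list of B probabilistic models like the one sketched earlier (all names are illustrative):

```python
import torch


def propagate_particles(ensemble, state, actions, n_particles=20, variant="TSinf"):
    """Roll out P particles from `state` through an ensemble of B models."""
    num_models = len(ensemble)
    particles = state.unsqueeze(0).repeat(n_particles, 1)  # (P, state_dim)
    # TS-infinity: each particle keeps one bootstrap model for the whole rollout.
    assignment = torch.randint(num_models, (n_particles,))
    for action in actions:  # actions: (horizon, action_dim)
        if variant == "TS1":
            # TS1: each particle re-samples its bootstrap at every timestep.
            assignment = torch.randint(num_models, (n_particles,))
        next_particles = torch.empty_like(particles)
        for b in range(num_models):
            mask = assignment == b
            if mask.any():
                n = int(mask.sum())
                mean, logvar = ensemble[b](particles[mask], action.expand(n, -1))
                # Aleatoric noise: sample from the predicted Gaussian.
                next_particles[mask] = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        particles = next_particles
    # Under TS-infinity, within-bootstrap spread reflects aleatoric uncertainty
    # and across-bootstrap spread reflects epistemic uncertainty.
    return particles
```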

## Results

* The proposed approach matches the asymptotic performance of state-of-the-art model-free algorithms while using far fewer samples.

* The general performance trend is: probabilistic ensemble > probabilistic model > deterministic ensemble > deterministic model.

* Initial experiments on learning a policy by propagating gradients through the ensemble of models did not work and are left as future work.
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from 231033 to 8b756c
