From 4404c3ea573b845c5a36f045b202acd6c3695394 Mon Sep 17 00:00:00 2001
From: Shagun Sodhani
Date: Sun, 30 Jun 2019 07:52:21 -0400
Subject: [PATCH] Add multiple model-based RL paper

---
 README.md                                     |  1 +
 ...iple Model-Based Reinforcement Learning.md | 42 +++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100755 site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md

diff --git a/README.md b/README.md
index e7c82c54..d82fb306 100755
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 * [Meta-Reinforcement Learning of Structured Exploration Strategies](https://shagunsodhani.com/papers-I-read/Meta-Reinforcement-Learning-of-Structured-Exploration-Strategies)
 * [Relational Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Relational-Reinforcement-Learning)
 * [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
+* [Multiple Model-Based Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Multiple-Model-Based-Reinforcement-Learning)
 * [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
 * [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
 * [GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks](https://shagunsodhani.com/papers-I-read/GNN-Explainer-A-Tool-for-Post-hoc-Explanation-of-Graph-Neural-Networks)
diff --git a/site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md b/site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md
new file mode 100755
index 00000000..26700cd4
--- /dev/null
+++ b/site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md
@@ -0,0 +1,42 @@
+---
+layout: post
+title: Multiple Model-Based Reinforcement Learning
+comments: True
+excerpt:
+tags: ['2002', 'Model-Based', 'Neural Computation', 'Neural Computation 2002', 'Reinforcement Learning', AI, RL]
+---
+
+
+* The paper presents some general ideas and mechanisms for multiple model-based RL. Even though the task and model architecture may not be very relevant now, I find the general ideas and mechanisms quite useful. As such, I am focusing only on the high-level ideas and not the implementation details.
+
+* The main idea behind Multiple Model-based RL (MMRL) is to decompose a complex task into multiple domains in space and time so that the environment dynamics within each domain are predictable.
+
+* [Link to the paper](https://www.mitpressjournals.org/doi/abs/10.1162/089976602753712972)
+
+* MMRL proposes an RL architecture composed of multiple modules, each with its own state prediction model and RL controller.
+
+* The prediction error of each state prediction model defines the "responsibility signal" for that module.
+
+* This responsibility signal is used to:
+
+  * Weigh the state prediction outputs, i.e., the predicted state is the weighted sum of the individual state predictions (weighted by the responsibility signal).
+
+  * Weigh the parameter updates of the environment models as well as the RL controllers.
+
+  * Weigh the action outputs, i.e., the predicted action is the weighted sum of the individual actions. (A sketch of this weighting scheme appears after these notes.)
+
+* The framework is amenable to incorporating prior knowledge about which module should be selected.
+
+* In the modular decomposition of a task, the modules should not switch too frequently; some degree of spatial and temporal continuity is also desirable.
+
+* Temporal continuity can be accounted for by feeding the previous responsibility signal as an input at the current timestep.
+
+* Spatial continuity can be ensured by imposing a spatial prior, such as a Gaussian spatial prior.
+
+* Though model-free methods could be used for learning the RL controllers, model-based methods may be more relevant, given that the modules are learning state-prediction models anyway.
+
+* Exploration can be ensured by using a stochastic version of greedy action selection (sketched below).
+
+* One failure mode for such modular architectures is when a single module tries to perform well across all the tasks. The modules themselves should be relatively simple (e.g., linear models) so that they can learn quickly and generalize well.
+
+* A non-stationary hunting task in a grid world and a non-linear, non-stationary control task (swinging up a pendulum) provide the proof of concept for the proposed methods.
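+
+* As a concrete illustration of the responsibility-signal mechanism, here is a minimal sketch of how the signals might be computed from per-module prediction errors and used to mix the module outputs. The function names, the Gaussian form of the likelihood, and the temporal-smoothing exponent `alpha` are my assumptions for illustration, not details from the paper.
+
+```python
+import numpy as np
+
+def responsibility_signals(pred_errors, prev_lambda, sigma=1.0, alpha=0.9):
+    """Soft-assign credit to modules from their state-prediction errors.
+
+    pred_errors: per-module prediction-error magnitudes, shape (n_modules,)
+    prev_lambda: responsibility signals from the previous timestep
+    sigma: assumed scale of the prediction noise (Gaussian likelihood)
+    alpha: assumed weight on the previous signals (temporal continuity)
+    """
+    # A module that predicts the next state well gets a high likelihood.
+    likelihood = np.exp(-pred_errors ** 2 / (2 * sigma ** 2))
+    # Temporal continuity: bias credit toward previously responsible modules.
+    unnormalized = (prev_lambda ** alpha) * likelihood
+    return unnormalized / unnormalized.sum()
+
+# Mix the per-module outputs with the responsibility signals.
+lam = responsibility_signals(np.array([0.2, 1.5, 0.9]), prev_lambda=np.full(3, 1 / 3))
+state_preds = np.array([[0.1, 0.0], [0.4, 0.2], [0.3, 0.1]])  # per-module next-state predictions
+actions = np.array([[1.0], [-0.5], [0.2]])                    # per-module action outputs
+predicted_state = lam @ state_preds  # weighted sum of state predictions
+action = lam @ actions               # weighted sum of actions
+# The same weights would also scale each module's parameter updates, so that
+# only the modules responsible for a transition learn from it.
+```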
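+
+* The "stochastic version of greedy action selection" could, for instance, be a softmax (Boltzmann) rule over the composite action values. The following is a sketch under that assumption; the inverse temperature `beta` and the discrete-action setting are not from the paper.
+
+```python
+import numpy as np
+
+def stochastic_greedy_action(q_values, beta=5.0, rng=None):
+    """Sample an action from a softmax over action values.
+
+    beta: assumed inverse temperature; larger values approach greedy selection.
+    """
+    if rng is None:
+        rng = np.random.default_rng()
+    prefs = np.exp(beta * (q_values - q_values.max()))  # subtract max for numerical stability
+    return rng.choice(len(q_values), p=prefs / prefs.sum())
+
+a = stochastic_greedy_action(np.array([0.1, 0.5, 0.4]))  # usually picks action 1
+```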