From 4404c3ea573b845c5a36f045b202acd6c3695394 Mon Sep 17 00:00:00 2001
From: Shagun Sodhani
Date: Sun, 30 Jun 2019 07:52:21 -0400
Subject: [PATCH] Add multiple model-based RL paper

---
 README.md                                     |  1 +
 ...iple Model-Based Reinforcement Learning.md | 42 +++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100755 site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md

diff --git a/README.md b/README.md
index e7c82c54..d82fb306 100755
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 * [Meta-Reinforcement Learning of Structured Exploration Strategies](https://shagunsodhani.com/papers-I-read/Meta-Reinforcement-Learning-of-Structured-Exploration-Strategies)
 * [Relational Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Relational-Reinforcement-Learning)
 * [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
+* [Multiple Model-Based Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Multiple-Model-Based-Reinforcement-Learning)
 * [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
 * [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
 * [GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks](https://shagunsodhani.com/papers-I-read/GNN-Explainer-A-Tool-for-Post-hoc-Explanation-of-Graph-Neural-Networks)
diff --git a/site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md b/site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md
new file mode 100755
index 00000000..26700cd4
--- /dev/null
+++ b/site/_posts/2019-05-14-Multiple Model-Based Reinforcement Learning.md
@@ -0,0 +1,42 @@
+---
+layout: post
+title: Multiple Model-Based Reinforcement Learning
+comments: True
+excerpt:
+tags: ['2002', 'Model-Based', 'Neural Computation', 'Neural Computation 2002', 'Reinforcement Learning', AI, RL]
+---
+
+
+* The paper presents some general ideas and mechanisms for multiple model-based RL. Even though the task and model architecture may not be very relevant now, I find the general ideas and mechanisms quite useful. As such, I am focusing only on the high-level ideas and not the implementation details.
+
+* The main idea behind Multiple Model-based RL (MMRL) is to decompose a complex task into multiple domains in space and time so that the environment dynamics within each domain are predictable.
+
+* [Link to the paper](https://www.mitpressjournals.org/doi/abs/10.1162/089976602753712972)
+
+* MMRL proposes an RL architecture composed of multiple modules, each with its own state prediction model and RL controller.
+
+* The prediction error of each state prediction model defines the "responsibility signal" for that module.
+
+* This responsibility signal is used to:
+
+  * Weigh the state prediction outputs, i.e., the predicted state is the weighted sum of the individual state predictions (weighted by the responsibility signal).
+
+  * Weigh the parameter updates of the environment models as well as the RL controllers.
+
+  * Weigh the action outputs, i.e., the predicted action is the weighted sum of the individual actions. (A sketch of this weighting scheme appears after these notes.)
+
+* The framework is amenable to incorporating prior knowledge about which module should be selected.
+
+* In the modular decomposition of a task, the modules should not switch too frequently; some degree of spatial and temporal continuity is also desirable.
+
+* Temporal continuity can be accounted for by feeding the previous responsibility signal as an input at the current timestep.
+
+* Spatial continuity can be ensured by imposing a spatial prior, such as a Gaussian spatial prior.
+
+* Though model-free methods could be used for learning the RL controllers, model-based methods may be more relevant, given that the modules are learning state-prediction models anyway.
+
+* Exploration can be ensured by using a stochastic version of greedy action selection (sketched below).
+
+* One failure mode for such modular architectures is when a single module tries to perform well across all the tasks. The modules themselves should be relatively simple (e.g., linear models) so that they can learn quickly and generalize well.
+
+* A non-stationary hunting task in a grid world and a non-linear, non-stationary control task (swinging up a pendulum) provide the proof of concept for the proposed methods.
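+
+* As a concrete illustration of the responsibility-signal mechanism, here is a minimal sketch of how the signals might be computed from per-module prediction errors and used to mix the module outputs. The function names, the Gaussian form of the likelihood, and the temporal-smoothing exponent `alpha` are my assumptions for illustration, not details from the paper.
+
+```python
+import numpy as np
+
+def responsibility_signals(pred_errors, prev_lambda, sigma=1.0, alpha=0.9):
+    """Soft-assign credit to modules from their state-prediction errors.
+
+    pred_errors: per-module prediction-error magnitudes, shape (n_modules,)
+    prev_lambda: responsibility signals from the previous timestep
+    sigma: assumed scale of the prediction noise (Gaussian likelihood)
+    alpha: assumed weight on the previous signals (temporal continuity)
+    """
+    # A module that predicts the next state well gets a high likelihood.
+    likelihood = np.exp(-pred_errors ** 2 / (2 * sigma ** 2))
+    # Temporal continuity: bias credit toward previously responsible modules.
+    unnormalized = (prev_lambda ** alpha) * likelihood
+    return unnormalized / unnormalized.sum()
+
+# Mix the per-module outputs with the responsibility signals.
+lam = responsibility_signals(np.array([0.2, 1.5, 0.9]), prev_lambda=np.full(3, 1 / 3))
+state_preds = np.array([[0.1, 0.0], [0.4, 0.2], [0.3, 0.1]])  # per-module next-state predictions
+actions = np.array([[1.0], [-0.5], [0.2]])                    # per-module action outputs
+predicted_state = lam @ state_preds  # weighted sum of state predictions
+action = lam @ actions               # weighted sum of actions
+# The same weights would also scale each module's parameter updates, so that
+# only the modules responsible for a transition learn from it.
+```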
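+
+* The "stochastic version of greedy action selection" could, for instance, be a softmax (Boltzmann) rule over the composite action values. The following is a sketch under that assumption; the inverse temperature `beta` and the discrete-action setting are not from the paper.
+
+```python
+import numpy as np
+
+def stochastic_greedy_action(q_values, beta=5.0, rng=None):
+    """Sample an action from a softmax over action values.
+
+    beta: assumed inverse temperature; larger values approach greedy selection.
+    """
+    if rng is None:
+        rng = np.random.default_rng()
+    prefs = np.exp(beta * (q_values - q_values.max()))  # subtract max for numerical stability
+    return rng.choice(len(q_values), p=prefs / prefs.sum())
+
+a = stochastic_greedy_action(np.array([0.1, 0.5, 0.4]))  # usually picks action 1
+```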