Commit 4404c3e: Add multiple model-based RL paper
shagunsodhani committed Jun 30, 2019 (parent d664e29)
2 changed files, 43 additions, 0 deletions.

README.md (1 addition):
* [Meta-Reinforcement Learning of Structured Exploration Strategies](https://shagunsodhani.com/papers-I-read/Meta-Reinforcement-Learning-of-Structured-Exploration-Strategies)
* [Relational Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Relational-Reinforcement-Learning)
* [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
* [Multiple Model-Based Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Multiple-Model-Based-Reinforcement-Learning)
* [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
* [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
* [GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks](https://shagunsodhani.com/papers-I-read/GNN-Explainer-A-Tool-for-Post-hoc-Explanation-of-Graph-Neural-Networks)
New file (+42 lines):
---
layout: post
title: Multiple Model-Based Reinforcement Learning
comments: True
excerpt:
tags: ['2002', 'Model-Based', 'Neural Computation', 'Neural Computation 2002', 'Reinforcement Learning', AI, RL]
---


* The paper presents some general ideas and mechanisms for multiple model-based RL. Even though the task and model architecture may not be very relevant now, I find the general idea and the mechanisms to be quite useful. As such, I am focusing only on high-level ideas and not the implementation details themselves.

* The main idea behind Multiple Model-based RL (MMRL) is to decompose complex tasks into multiple domains in space and time so that the environment dynamics within each domain is predictable.

* [Link to the paper](https://www.mitpressjournals.org/doi/abs/10.1162/089976602753712972)

* MMRL proposes an RL architecture composed of multiple modules, each with its own state prediction model and RL controller.

* The prediction error of each module's state prediction model defines that module's "responsibility signal".

* This responsibility signal is used to:

  * Weight the state prediction output, i.e., the overall predicted state is the weighted sum of the individual modules' predictions (weighted by the responsibility signal).

  * Weight the parameter updates of the environment models as well as the RL controllers.

  * Weight the action output, i.e., the overall action is the weighted sum of the individual modules' actions.
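The responsibility-weighting mechanism above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the softmax-of-negative-squared-errors form follows the general MMRL idea, but `sigma` and all the numbers are hypothetical.

```python
import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    # Softmax of negative squared prediction errors: modules whose state
    # predictions were closer to the observed state get more responsibility.
    logits = -np.square(pred_errors) / (2 * sigma ** 2)
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical numbers: three modules predict the next (scalar) state;
# the observed next state is 1.0.
state_preds = np.array([0.9, 1.5, 3.0])
lam = responsibilities(state_preds - 1.0)

combined_state = lam @ state_preds      # responsibility-weighted prediction
module_actions = np.array([0.2, -0.1, 0.5])
combined_action = lam @ module_actions  # responsibility-weighted action
```

The same weights `lam` would also scale each module's gradient updates, so the module that best explains the current transition learns the most from it.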

* The framework is amenable to incorporating prior knowledge about which module should be selected.

* In the modular decomposition of a task, the active module should not switch too frequently, and some form of spatial and temporal continuity is also desirable.

* Temporal continuity can be encouraged by using the previous timestep's responsibility signal as an input (a prior) at the current timestep.

* Spatial continuity can be ensured by imposing a spatial prior, e.g., a Gaussian spatial prior over the state space.
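One way to combine both continuity mechanisms with the prediction-error likelihood is as a product of factors. This is only a sketch: the parameter names (`tau`, `rho`) and the exact functional forms are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def responsibilities_with_priors(pred_errors, prev_resp, module_centers,
                                 state, sigma=1.0, tau=2.0, rho=1.0):
    # Three factors combine multiplicatively:
    #  - likelihood: how well each module predicted the last transition,
    #  - temporal prior: previous responsibility, softened by tau,
    #  - spatial prior: Gaussian preference for modules "centered" near the state.
    likelihood = np.exp(-np.square(pred_errors) / (2 * sigma ** 2))
    temporal_prior = prev_resp ** (1.0 / tau)
    spatial_prior = np.exp(-np.square(state - module_centers) / (2 * rho ** 2))
    w = likelihood * temporal_prior * spatial_prior
    return w / w.sum()

prev = np.array([0.7, 0.2, 0.1])            # module 0 was most responsible last step
lam = responsibilities_with_priors(
    pred_errors=np.array([0.1, 0.1, 0.1]),  # equal prediction errors
    prev_resp=prev,
    module_centers=np.array([0.0, 1.0, 2.0]),
    state=0.1,
)
# With equal likelihoods, both priors pull responsibility toward module 0.
```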

* Though model-free methods could be used to learn the RL controllers, model-based methods may be a more natural fit, given that the modules are already learning state-prediction models.

* Exploration can be ensured by using a stochastic version of greedy action selection.
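One common stochastic relaxation of greedy selection is Boltzmann (softmax) exploration over action values; a sketch of that idea (the paper does not prescribe this exact form, and the temperature value here is illustrative):

```python
import numpy as np

def boltzmann_action(q_values, temperature=0.1, rng=None):
    # Sample an action from a softmax over Q-values: as temperature -> 0
    # this approaches greedy selection; higher temperatures explore more.
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(probs), p=probs)
```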

* One failure mode for such modular architectures is that a single module tries to perform well across all the domains. The modules themselves should be relatively simple (e.g., linear models), so that they learn quickly and generalize well.

* A non-stationary hunting task in a grid world and the non-linear, non-stationary control task of swinging up a pendulum provide the proof of concept for the proposed methods.
