---
layout: post
title: Relational Reinforcement Learning
comments: True
excerpt:
tags: ['2018', 'Deep Reinforcement Learning', 'ICLR 2019', 'Reinforcement Learning', 'Relational Learning', AI, ICLR, RL, RRL]

---

## Introduction

* The Relational Reinforcement Learning (RRL) paradigm uses relational representations of the state (and action) space and of the policy, to bring the generalization capability of relational learning into reinforcement learning.

* The paper demonstrates the effectiveness of RRL - in terms of generalization, sample efficiency, and interpretability - on Box-World and StarCraft II minigames.

* [Link to the paper](https://arxiv.org/abs/1806.01830).

## Architecture

* The main idea is to use neural network models that operate on structured representations and perform relational reasoning via iterated, message-passing-style computations.

* Non-local computation using a shared function (applied to pairwise interactions between entities) provides a better inductive bias.

* A multi-head dot-product attention mechanism (one or more attention blocks) is used to model the pairwise interactions, as sketched after this list.

* Iterating the computation can capture higher-order interactions between entities.

* Entity extraction is based on the assumption that entities are things located at particular points in space.

* A CNN parses the pixel-space observation into *k* feature maps of size *n x n*. The *(x, y)* coordinates are concatenated to each *k*-dimensional pixel feature vector to indicate the pixel's position in the map (see the second sketch below).

* The resulting *n<sup>2</sup> x k* matrix acts as the entity matrix.

* An actor-critic architecture (using the distributed agent IMPALA) is used.

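A minimal PyTorch sketch of one such attention block is below. This is an illustration, not the paper's code: the class name `RelationalBlock`, the residual/LayerNorm arrangement, and the use of `nn.MultiheadAttention` are my assumptions about one reasonable instantiation.

```python
import torch
import torch.nn as nn

class RelationalBlock(nn.Module):
    """One relational block (hypothetical instantiation): multi-head
    dot-product attention over all entity pairs, followed by a shared
    per-entity MLP. Stacking or iterating blocks lets information
    propagate further, capturing higher-order interactions."""

    def __init__(self, dim, heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, entities):             # entities: (batch, n*n, dim)
        attended, _ = self.attn(entities, entities, entities)
        entities = self.norm1(entities + attended)       # residual + norm
        return self.norm2(entities + self.mlp(entities))
```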
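Entity extraction can be sketched the same way. The layer count, kernel sizes, and the `[-1, 1]` coordinate range are again assumptions; only the overall shape flow (pixels to *k* feature maps, coordinates appended, map flattened into the entity matrix) comes from the summary above.

```python
class EntityExtractor(nn.Module):
    """CNN parses pixels into k feature maps of size n x n; the (x, y)
    coordinates of each cell are concatenated to its k-dim feature
    vector, and the map is flattened into the entity matrix."""

    def __init__(self, in_channels=3, k=24):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, k, kernel_size=2), nn.ReLU(),
            nn.Conv2d(k, k, kernel_size=2), nn.ReLU())

    def forward(self, pixels):                # pixels: (batch, C, H, W)
        fmap = self.cnn(pixels)               # (batch, k, n, n)
        b, k, n, _ = fmap.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, n),
                                torch.linspace(-1, 1, n), indexing="ij")
        coords = torch.stack([xs, ys]).expand(b, 2, n, n).to(fmap.device)
        fmap = torch.cat([fmap, coords], dim=1)          # (batch, k+2, n, n)
        return fmap.flatten(2).transpose(1, 2)           # (batch, n*n, k+2)

# Shape check on a toy 14x14 observation (two 2x2 convs give n = 12):
entities = EntityExtractor()(torch.rand(1, 3, 14, 14))   # (1, 144, 26)
out = RelationalBlock(dim=26)(entities)                   # (1, 144, 26)
```

Note that with the two explicit coordinate channels the matrix here is *n<sup>2</sup> x (k+2)* rather than the *n<sup>2</sup> x k* quoted above; this is a bookkeeping difference in how the coordinates are counted.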
## Environment

### Box-World

* A 12 x 12-pixel room with keys and boxes placed randomly.

* The agent can move in four directions.

* The task is to collect gems by unlocking boxes (which may contain keys that unlock further boxes).

* Each level has a unique sequence in which the boxes must be opened, and opening a wrong box can make the level unsolvable.

* The difficulty of a level can be controlled via: (i) the number of boxes in the path to the goal, (ii) the number of distractor branches, and (iii) the length of the distractor branches. The sketch after this list illustrates this dependency structure.

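A toy sketch of how such a level's key-lock dependency structure could be generated is below. This is not DeepMind's Box-World implementation: the function name, the colour-id scheme, and the returned representation are made up for illustration, and rendering into the pixel room is omitted.

```python
import random

def generate_level(solution_length, num_distractors, distractor_length):
    """Toy generator for Box-World's key-lock dependency structure.

    Returns (boxes, first_key). Each box is a (lock_colour, content)
    pair, where content is the key colour found inside, or "gem" for
    the final box on the solution path. Opening a distractor box
    consumes the key in hand and makes the level unsolvable.
    """
    colours = iter(range(10**6))          # fresh colour ids on demand
    first_key = next(colours)             # loose key the agent starts with

    # Solution path: a chain of boxes, each unlocked by the key from
    # the previous one; the last box holds the gem.
    boxes, key, solution_keys = [], first_key, [first_key]
    for step in range(solution_length):
        content = "gem" if step == solution_length - 1 else next(colours)
        boxes.append((key, content))
        if content != "gem":
            solution_keys.append(content)
        key = content

    # Distractor branches: openable with a key from the solution path,
    # but their contents lead nowhere.
    for _ in range(num_distractors):
        branch_key = random.choice(solution_keys)
        for _ in range(distractor_length):
            dead_end = next(colours)
            boxes.append((branch_key, dead_end))
            branch_key = dead_end

    random.shuffle(boxes)                 # placement in the room is random
    return boxes, first_key

boxes, key_in_hand = generate_level(solution_length=4,
                                    num_distractors=2,
                                    distractor_length=3)
```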
### StarCraft II minigames

* Nine minigames, designed as specific scenarios within the StarCraft II game, are used.

## Results

### Box-World

* The RRL agent solves over 98% of the levels, while the baseline RL agent solves less than 95%.

* Visualising the attention scores (see the sketch after this list) indicates that:

    * keys attend to the locks they can open,

    * all objects attend to the agent's location, and

    * the agent and the gem attend to each other (and to themselves).

* Generalization is tested in two ways:

    * performance on levels that require opening a longer sequence of boxes than seen during training, and

    * performance on levels that require key-lock combinations not seen during training.

* In both scenarios, the RRL agent significantly outperforms the RL agent.

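A sketch of how per-head attention maps could be pulled out for this kind of inspection, reusing the hypothetical `RelationalBlock` from the Architecture section. The entity index and the random inputs are placeholders; the paper inspects a trained agent.

```python
import matplotlib.pyplot as plt
import torch

block = RelationalBlock(dim=26)           # hypothetical block from above
entities = torch.rand(1, 144, 26)         # stand-in for a real entity matrix

with torch.no_grad():
    # average_attn_weights=False keeps one (n*n x n*n) map per head.
    _, weights = block.attn(entities, entities, entities,
                            need_weights=True, average_attn_weights=False)

entity_idx = 0                            # e.g. the entity at the agent's cell
plt.imshow(weights[0, 0, entity_idx].reshape(12, 12))   # head 0
plt.colorbar()
plt.title("Where one entity attends (head 0)")
plt.show()
```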
### StarCraft II minigames

* The RRL agent achieves results better than or equal to the RL agent's in all but one minigame.

* To test generalization, an agent trained to control two marines was transferred to a task that requires controlling five marines. These results are not conclusive, given the high variability.