diff --git a/README.md b/README.md
index 9287ee92..1b8cc22d 100755
--- a/README.md
+++ b/README.md
@@ -8,6 +8,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 * [Hamiltonian Neural Networks](https://shagunsodhani.com/papers-I-read/Hamiltonian-Neural-Networks)
 * [Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations](https://shagunsodhani.com/papers-I-read/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations)
 * [Meta-Reinforcement Learning of Structured Exploration Strategies](https://shagunsodhani.com/papers-I-read/Meta-Reinforcement-Learning-of-Structured-Exploration-Strategies)
+* [Relational Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Relational-Reinforcement-Learning)
 * [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
 * [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
 * [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
diff --git a/site/_posts/2019-06-01-Relational Reinforcement Learning.md b/site/_posts/2019-06-01-Relational Reinforcement Learning.md
new file mode 100755
index 00000000..32bfb929
--- /dev/null
+++ b/site/_posts/2019-06-01-Relational Reinforcement Learning.md
@@ -0,0 +1,80 @@
+---
+layout: post
+title: Relational Reinforcement Learning
+comments: True
+excerpt:
+tags: ['2018', 'Deep Reinforcement Learning', 'ICLR 2019', 'Reinforcement Learning', 'Relational Learning', AI, ICLR, RL, RRL]
+
+---
+
+## Introduction
+
+* The Relational Reinforcement Learning (RRL) paradigm uses relational representations of the state (and action) space and of the policy to bring the generalization capability of relational learning to reinforcement learning.
+
+* The paper shows the effectiveness of RRL (in terms of generalization, sample efficiency and interpretability) on Box-World and StarCraft II minigames.
+
+* [Link to the paper](https://arxiv.org/abs/1806.01830).
+
+## Architecture
+
+* The main idea is to use neural network models that operate on structured representations and perform relational reasoning via iterated, message-passing-style methods.
+
+* Non-local computations with a shared function (applied to pairwise interactions between entities) provide a better inductive bias.
+
+* A multi-head dot-product attention mechanism models the pairwise interactions (using one or more attention blocks).
+
+* Iterating these computations captures higher-order interactions between entities.
+
+* Entity extraction assumes that entities are things located at a particular point in space.
+
+* A CNN parses the pixel-space observation into *k* feature maps of size *n x n*. The *(x, y)* coordinates are concatenated to each *k*-dimensional pixel feature vector to indicate the pixel's position in the map.
+
+* The resulting *n² x k* matrix acts as the entity matrix.
+
+* An actor-critic architecture is used, based on the distributed IMPALA agent.
+
+## Environment
+
+### Box-World
+
+* A 12 x 12-pixel room with keys and boxes placed randomly.
+
+* The agent can move in 4 directions.
+
+* The task is to collect gems by unlocking boxes (which may contain keys to unlock other boxes).
+
+* Each level has a unique sequence in which the boxes must be opened; opening the wrong box can make the level unsolvable.
+
+* The difficulty of a level is controlled by (i) the number of boxes on the path to the goal, (ii) the number of distractor branches, and (iii) the length of the distractor branches.
+
+### StarCraft II minigames
+
+* Nine mini-games, each designed as a specific scenario in StarCraft II, are used.
+
+## Results
+
+### Box-World
+
+* RRL agents solve over 98% of the levels, while the RL agent solves fewer than 95% of them.
+
+* Visualising the attention scores indicates that:
+
+    * keys attend to the locks they can unlock.
+
+    * all objects attend to the agent's location.
+
+    * the agent and the gem attend to each other (and to themselves).
+
+* Generalization capacity is tested in two ways:
+
+    * Performance on levels that require opening a longer sequence of boxes than seen during training.
+
+    * Performance on levels that require key-lock combinations not seen during training.
+
+* In both scenarios, the RRL agent significantly outperforms the RL agent.
+
+### StarCraft
+
+* The RRL agent achieves results better than or equal to those of the RL agent in all but one game.
+
+* To test generalization, the agent trained to control two marines was transferred to a task that requires controlling five marines. The results are not conclusive given the high variability.
\ No newline at end of file
diff --git a/site/_site b/site/_site
index 826701a0..95206620 160000
--- a/site/_site
+++ b/site/_site
@@ -1 +1 @@
-Subproject commit 826701a05e5d3b066ee509705b9033ef5b4914e7
+Subproject commit 9520662076f312d93ba4dbb69fdb2e5307d95dcc
diff --git a/site/index.html b/site/index.html
index 4e962715..6f9df239 100755
--- a/site/index.html
+++ b/site/index.html
@@ -15,8 +15,8 @@