From 165411b00c4717323704a820879bd9236c5c7743 Mon Sep 17 00:00:00 2001
From: Shagun Sodhani
Date: Sun, 4 Aug 2019 21:34:58 -0400
Subject: [PATCH] Added quantifying generalization paper

---
 README.md                                     |  1 +
 ...eneralization in Reinforcement Learning.md | 79 +++++++++++++++++++
 site/_site                                    |  2 +-
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100755 site/_posts/2019-07-25-Quantifying Generalization in Reinforcement Learning.md

diff --git a/README.md b/README.md
index 88179dbc..7ae6e87c 100755
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 ## List of papers
 
 * [Assessing Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Assessing-Generalization-in-Deep-Reinforcement-Learning)
+* [Quantifying Generalization in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Quantifying-Generalization-in-Reinforcement-Learning)
 * [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](https://shagunsodhani.com/papers-I-read/Set-Transformer-A-Framework-for-Attention-based-Permutation-Invariant-Neural-Networks)
 * [Measuring abstract reasoning in neural networks](https://shagunsodhani.com/papers-I-read/Measuring-Abstract-Reasoning-in-Neural-Networks)
 * [Hamiltonian Neural Networks](https://shagunsodhani.com/papers-I-read/Hamiltonian-Neural-Networks)
diff --git a/site/_posts/2019-07-25-Quantifying Generalization in Reinforcement Learning.md b/site/_posts/2019-07-25-Quantifying Generalization in Reinforcement Learning.md
new file mode 100755
index 00000000..7b0d27e5
--- /dev/null
+++ b/site/_posts/2019-07-25-Quantifying Generalization in Reinforcement Learning.md
@@ -0,0 +1,79 @@
+---
+layout: post
+title: Quantifying Generalization in Reinforcement Learning
+comments: True
+excerpt:
+tags: ['2018', 'Deep Reinforcement Learning', 'ICML 2019', 'Evaluating Generalization', 'Reinforcement Learning', AI, DRL, Environment, Evaluation, ICML, Generalization, RL]
+
+---
+
+## Introduction
+
+* The paper introduces a new, procedurally generated environment called CoinRun that is designed to benchmark the generalization capabilities of RL algorithms.
+
+* The paper reports that deep convolutional architectures and techniques like L2 regularization, batch normalization, etc. (which were proposed in the context of generalization in supervised learning) are also useful for RL.
+
+* [Link to the paper](https://arxiv.org/abs/1812.02341)
+
+## CoinRun Environment
+
+* CoinRun is made of multiple levels.
+
+* In each level, the agent spawns on the far left side and needs to collect a single coin that lies on the far right side.
+
+* There are many obstacles in between, and colliding with an obstacle leads to the agent's death.
+
+* Each episode extends for a maximum of 1000 steps.
+
+* CoinRun is designed such that given sufficient training time and levels, a near-optimal policy can be learned for all the levels.
+
+## Generalization
+
+* Generalization can be measured by training an agent on a given set of training tasks and evaluating it on an unseen set of test tasks.
+
+* 9 agents are trained to play CoinRun, on different training sets (each with a different number of levels).
+
+* The first 8 agents are trained on sets of size 100 to 16000 levels while the last agent is trained on an unbounded set of levels.
+
+* Training a model on an unbounded set of levels provides a good proxy for the train-to-test generalization performance.
+
+## Evaluating Architectures
+
+* Two convolutional architectures (of different sizes) are compared:
+
+    * Nature-CNN: The CNN architecture used in the [Deep Q Network](https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf). This is the smaller of the two models.
+
+    * IMPALA-CNN: The CNN architecture used in the [IMPALA architecture](https://arxiv.org/abs/1802.01561).
+
+* The IMPALA-CNN agent always outperforms the Nature-CNN agent, indicating that the larger architecture has more capacity for generalization. But increasing the network size beyond a limit gives diminishing returns.
+
+## Evaluating Regularization
+
+* While both L2 regularization and Dropout help to improve generalization, L2 regularization is more impactful.
+
+* A domain randomization/data augmentation approach is tested where rectangular regions of different sizes are masked and assigned a random color. This approach seems to improve performance.
+
+* Batch Normalization helps to improve performance as well.
+
+* Environment stochasticity is introduced by using sticky actions while policy stochasticity is introduced by controlling the entropy bonus. Both these forms of stochasticity boost performance.
+
+* While combining different regularization methods helps, the gains are only marginally better than using just one regularization approach. This suggests that these different approaches induce similar generalization properties.
+
+## Additional Environments
+
+* Two additional environments are also considered to verify the high degree of overfitting observed in the CoinRun environment:
+
+    * CoinRun-Platforms:
+
+        * Unlike CoinRun, each episode can have multiple coins and the time limit is increased to 1000 steps.
+
+        * Levels are larger as well, so the agent might need to retrace its steps.
+
+    * RandomMazes:
+
+        * Partially observed environment with square mazes of dimensions 3x3 to 25x25.
+
+        * Time limit of 500 steps.
+
+
+* Overfitting is observed for both these environments as well.
\ No newline at end of file
diff --git a/site/_site b/site/_site
index 62cbd5ed..231033f1 160000
--- a/site/_site
+++ b/site/_site
@@ -1 +1 @@
-Subproject commit 62cbd5ed8c5c4ed1e0c698fbbd9ccab666d3d00b
+Subproject commit 231033f120164330acbf3a435f1b7cbbc0f91893