Add observational overfitting paper
shagunsodhani committed Feb 14, 2020
1 parent ab99b85 commit 465d4a5
Showing 7 changed files with 96 additions and 6 deletions.
@@ -3,7 +3,7 @@ layout: post
title: Get To The Point - Summarization with Pointer-Generator Networks
comments: True
excerpt: The paper proposes a hybrid Pointer-Generator network along with the use of a coverage vector for the task of article summarization.
-tags: ['2017', 'ACL2017', 'Abstract Summarization', 'Pointer Network', ACL, AI, NLP, SOTA, Summarization]
+tags: ['2017', 'ACL 2017', 'Abstract Summarization', 'Pointer Network', ACL, AI, NLP, SOTA, Summarization]
---

## Introduction
@@ -3,7 +3,7 @@ layout: post
title: Learning to Count Objects in Natural Images for Visual Question Answering
comments: True
excerpt: The paper proposes to overcome challenges related to count-based questions in VQA task by using the attention maps (and not the aggregated feature vectors) as input to a separate **count** module.
-tags: ['2018', 'Count Based VQA', 'ICLR2018', AI, CV, ICLR, NLP, VQA, SOTA]
+tags: ['2018', 'Count Based VQA', 'ICLR 2018', AI, CV, ICLR, NLP, VQA, SOTA]
---

## Introduction
@@ -3,7 +3,7 @@ layout: post
title: Quantifying Generalization in Reinforcement Learning
comments: True
excerpt:
-tags: ['2018', 'Deep Reinforcement Learning', 'ICML 2019', Evaluating Generalization', Reinforcement Learning', AI, DRL, Environment, Evaluation, ICML, Generalization, RL]
+tags: ['2018', 'Deep Reinforcement Learning', 'ICML 2019', 'Evaluating Generalization', 'Reinforcement Learning', AI, DRL, Environment, Evaluation, ICML, Generalization, RL]

---

2 changes: 1 addition & 1 deletion site/_posts/2019-09-05-How to train your MAML.md
@@ -3,7 +3,7 @@ layout: post
title: How to train your MAML
comments: True
excerpt:
-tags: ['2018', 'Empirical Advice', ICLR 2019', 'Meta Learning', AI, ICLR, MAML]
+tags: ['2018', 'Empirical Advice', 'ICLR 2019', 'Meta Learning', AI, ICLR, MAML]

---

@@ -3,7 +3,7 @@ layout: post
title: Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
comments: True
excerpt:
-tags: ['2019', 'EMNLP 2019', 'Procedural Text', 'Relation Learning', 'Relational Learning', AI, Dataset, ENMLP, Graph, NLP, Reasoning]
+tags: ['2019', 'EMNLP 2019', 'Procedural Text', 'Relation Learning', 'Relational Learning', AI, Dataset, EMNLP, Graph, NLP, Reasoning]

---

@@ -3,7 +3,7 @@ layout: post
title: Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
comments: True
excerpt:
-tags: ['2017', 'Distributed Computing', 'Distributed SGD', 'Empirical Advice', Synchronous SGD', AI, ImageNet]
+tags: ['2017', 'Distributed Computing', 'Distributed SGD', 'Empirical Advice', 'Synchronous SGD', AI, ImageNet]

---

@@ -0,0 +1,90 @@
---
layout: post
title: Observational Overfitting in Reinforcement Learning
comments: True
excerpt:
tags: ['2019', 'ICLR 2020', 'Deep Reinforcement Learning', 'Evaluating Generalization', 'Markov Decision Process', 'Reinforcement Learning', AI, DRL, Evaluation, Generalization, ICLR, MDP, RL]

---


## Introduction

* The paper studies *observational overfitting*: the phenomenon where an agent overfits to different observation spaces even though the underlying MDP remains fixed.

* Unlike in other works, the "background information" (in the pixel space) is correlated with the progress of the agent (and is not just noise).

* [Link to the paper](https://arxiv.org/abs/1912.02975)

## Setup

* Base MDP $M = (S, A, R, T)$ where $S$ is the state space, $A$ is the action space, $R$ is the reward function, and $T$ is the transition dynamics.

* $M$ is parameterized using $\theta$. In practice, this means introducing an observation function $\phi_{\theta}$, i.e., $M_{\theta} = (M, \phi_{\theta})$.

* A distribution over $\theta$ defines a distribution over the MDPs.

* The learning agent has access to the pixel space observations and not the state space observations.

* The generalization gap is defined as $J_{\theta}(\pi) - J_{\theta^{train}}(\pi)$, where $\pi$ is the learning agent, $\theta$ denotes the full distribution over observation functions, $\theta^{train}$ denotes the distribution over the observation functions corresponding to the training environments, and $J_{\theta}(\pi)$ is the average reward the agent obtains over environments sampled from $M_{\theta}$.

* $\phi_{\theta}$ combines two features - a generalizable one (invariant across $\theta$) and a non-generalizable one (dependent on $\theta$) - i.e., $\phi_{\theta}(s) = concat(f(s), g_{\theta}(s))$, where $f$ is the invariant function and $g_{\theta}$ is the non-generalizable function (see the sketch after this list).

* The problem is set up such that "explicit regularization" can easily solve it. The focus is on understanding the effect of "implicit regularization".
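
A minimal numpy sketch of this setup (all names, dimensions, and the toy reward are illustrative assumptions, not taken from the paper): it builds $\phi_{\theta}(s) = concat(f(s), g_{\theta}(s))$ with a shared $f$ and per-environment $g_{\theta}$, and estimates the generalization gap by comparing average returns over training and held-out observation functions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_STATE, D_NOISE = 4, 8  # illustrative sizes, not the paper's

def f(s):
    # Generalizable feature: invariant across environments (identity here).
    return s

def make_g(theta_seed):
    # Non-generalizable feature: depends on the environment parameter theta.
    W_theta = np.random.default_rng(theta_seed).normal(size=(D_NOISE, D_STATE))
    return lambda s: W_theta @ s

def phi(s, g_theta):
    # Observation function: concat of the invariant and theta-dependent parts.
    return np.concatenate([f(s), g_theta(s)])

def avg_return(policy, thetas, episodes=10):
    # Stand-in for J_theta(pi): average (toy) reward over environments sampled
    # from M_theta; a real rollout would use the MDP's dynamics and rewards.
    total = 0.0
    for seed in thetas:
        g_theta = make_g(seed)
        for _ in range(episodes):
            s = rng.normal(size=D_STATE)
            a = policy(phi(s, g_theta))
            total += -np.sum((a - s) ** 2)  # toy reward: track the true state
    return total / (len(thetas) * episodes)

K = rng.normal(size=(D_STATE, D_STATE + D_NOISE)) * 0.1  # toy linear policy
policy = lambda o: K @ o

train_thetas = [1, 2]          # observation functions seen during training
all_thetas = list(range(100))  # stand-in for the full distribution over theta
gap = avg_return(policy, all_thetas) - avg_return(policy, train_thetas)
print("estimated generalization gap:", gap)
```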

## Experiments

### Overparameterized LQR

* LQR is used as a proxy for deep RL architectures because of advantages such as enabling exact gradient descent.

* The functions are parameterized as follows:

* $f(s) = W_c s$

* $g_{\theta}(s) = W_{\theta} s$

* Observation at time $t$, $o_t$, is given as $o_t = [W_c; W_{\theta}] s_t$, i.e., the concatenation of $W_c s_t$ and $W_{\theta} s_t$ (a code sketch follows this list).

* Action at time $t$ is given as $a_t = K o_{t}$ where $K$ is the policy matrix.

* Dimensionality:

* state $s$: $d_{state} = 100$
* $f(s)$: $d_{state} = 100$
* $g_{\theta}(s)$: $d_{noise} = 1000$
* observation $o$: $d_{state} + d_{noise} = 1100$

* When training on just one environment, multiple solutions exist, and overfitting occurs.

* Increasing $d_{noise}$ increases the generalization gap.

* Overparameterizing the network decreases the generalization gap and also reduces the norm of the policy.
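
A rough numpy sketch of the observation and policy structure described above; the dimensions are shrunk from the paper's 100/1000, the weight scales are arbitrary, and the LQR dynamics, cost, and training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_noise = 10, 50  # the paper uses 100 and 1000; shrunk for illustration
d_obs = d_state + d_noise

# Shared (generalizable) projection W_c and a per-environment (non-generalizable) W_theta.
W_c = rng.normal(size=(d_state, d_state))
W_theta = rng.normal(size=(d_noise, d_state))

def observe(s):
    # o_t = [W_c; W_theta] s_t, i.e. concat(f(s), g_theta(s)).
    return np.concatenate([W_c @ s, W_theta @ s])

# Linear policy a_t = K o_t. "Overparameterizing" here means factoring K into a
# product of matrices (K = K2 K1) instead of learning a single d_state x d_obs matrix.
d_hidden = 64
K1 = rng.normal(size=(d_hidden, d_obs)) * 0.01
K2 = rng.normal(size=(d_state, d_hidden)) * 0.01

def policy(o):
    return K2 @ (K1 @ o)

s = rng.normal(size=d_state)
a = policy(observe(s))
print(a.shape)  # (d_state,) in this toy setup
```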

### Projected Gym Environments

* The base MDP is a Gym environment (a wrapper-style sketch follows this list).

* $M_{\theta}$ is generated as before.

* Increasing both width and depth for basic MLPs improves generalization.

* Generalization also depends on the choice of activation function, residual layers, etc.
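
One way to realize such an $M_{\theta}$ is a Gym observation wrapper that replaces the true state with $concat(W_c s, W_{\theta} s)$. A hedged sketch follows; the wrapper name, projection sizes, and the use of CartPole are assumptions for illustration, not the paper's code.

```python
import gym
import numpy as np

class ProjectedObs(gym.ObservationWrapper):
    """Replace the true state s with concat(W_c @ s, W_theta @ s)."""

    def __init__(self, env, d_noise=16, theta_seed=0):
        super().__init__(env)
        d_state = env.observation_space.shape[0]
        # W_c is shared across environments; W_theta differs per environment.
        self.W_c = np.random.default_rng(42).normal(size=(d_state, d_state))
        self.W_theta = np.random.default_rng(theta_seed).normal(size=(d_noise, d_state))
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(d_state + d_noise,), dtype=np.float32
        )

    def observation(self, obs):
        return np.concatenate([self.W_c @ obs, self.W_theta @ obs]).astype(np.float32)

# Training environments share W_c but differ in W_theta (different theta_seed).
train_envs = [ProjectedObs(gym.make("CartPole-v1"), theta_seed=i) for i in range(3)]
print(train_envs[0].observation_space.shape)  # (4 + 16,) for CartPole
```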

### Deconvolutional Projections

* In the Gym environment, the actual state is projected to a larger vector and reshaped into an 84x84 tensor (image); a sketch of such a projection follows this list.

* The image from $f$ is concatenated with the image from $g$. This setup is referred to as the Gym-Deconv.

* The relative performance of NatureCNN, IMPALA, and IMPALA-Large (on both CoinRun and Gym-Deconv) follows the order of their parameter counts.

* In an ablation, the policy is given access to only $g_{\theta}(s)$, which makes it impossible for the model to generalize. In this test of memorization capacity, implicit regularization seems to reduce the memorization effect.
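
A hedged PyTorch sketch of a deconvolutional projection from a low-dimensional state to an 84x84 image, matching the description above in shape only; the layer sizes, kernel choices, and the state dimension are assumptions.

```python
import torch
import torch.nn as nn

class DeconvProjection(nn.Module):
    """Project a low-dimensional state to an 84x84 single-channel image."""

    def __init__(self, d_state):
        super().__init__()
        self.fc = nn.Linear(d_state, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=6, stride=3),  # 7 -> 24
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=6, stride=3),  # 24 -> 75
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=10, stride=1),  # 75 -> 84
        )

    def forward(self, s):
        x = self.fc(s).view(-1, 64, 7, 7)
        return self.deconv(x)

d_state = 17                    # illustrative state dimension
f_net = DeconvProjection(d_state)  # shared, "generalizable" projection
g_net = DeconvProjection(d_state)  # re-initialized per environment theta
s = torch.randn(1, d_state)
obs = torch.cat([f_net(s), g_net(s)], dim=1)  # concatenate along the channel axis
print(obs.shape)  # torch.Size([1, 2, 84, 84])
```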

### Overparameterization in CoinRun

* The pixel-space observation in CoinRun is downsized from 64x64 to 32x32 and flattened into a vector (see the sketch after this list).

* In CoinRun, the dynamics change per level, and the noisy "irrelevant" features change location across the 1D input, making this setup more challenging than the previous ones.

* Overparameterization improves generalization in this scenario as well.
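
A minimal sketch of that preprocessing step, assuming a 64x64 RGB frame; the 2x average pooling used for downsizing is an assumption standing in for whatever resizing the paper used.

```python
import numpy as np

def preprocess(frame):
    """Downsize a 64x64xC CoinRun frame to 32x32 and flatten it to a vector."""
    assert frame.shape[:2] == (64, 64)
    # 2x2 average pooling: group pixels into 2x2 blocks and average each block.
    pooled = frame.reshape(32, 2, 32, 2, -1).mean(axis=(1, 3))   # 32x32xC
    return pooled.reshape(-1).astype(np.float32)                 # length 32*32*C

frame = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
flat = preprocess(frame)
print(flat.shape)  # (3072,) for an RGB frame
```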
