Add MAML++ paper
shagunsodhani committed Sep 29, 2019
1 parent 1d2dcde commit 3bb1ccf
Showing 3 changed files with 65 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [How to train your MAML](https://shagunsodhani.com/papers-I-read/How-to-train-your-MAML)
* [PHYRE - A New Benchmark for Physical Reasoning](https://shagunsodhani.com/papers-I-read/PHYRE-A-New-Benchmark-for-Physical-Reasoning)
* [Large Memory Layers with Product Keys](https://shagunsodhani.com/papers-I-read/Large-Memory-Layers-with-Product-Keys)
* [Abductive Commonsense Reasoning](https://shagunsodhani.com/papers-I-read/Abductive-Commonsense-Reasoning)
63 changes: 63 additions & 0 deletions site/_posts/2019-09-05-How to train your MAML.md
@@ -0,0 +1,63 @@
---
layout: post
title: How to train your MAML
comments: True
excerpt:
tags: ['2018', 'Empirical Advice', 'ICLR 2019', 'Meta Learning', AI, ICLR, MAML]

---

## Introduction

* The paper proposes MAML++ - a modification of the MAML algorithm that stabilizes training, improves generalization performance, and reduces computational overhead.

* [Link to the paper](https://arxiv.org/abs/1810.09502)

## Notes

### Unstable Training

* Training the outer loop requires unfolding the inner loop multiple times.

* In the absence of skip connections, the gradient is multiplied by the same parameters multiple times.

* The large depth of the unrolled network, combined with the absence of skip connections, can lead to exploding or vanishing gradients.

* The paper proposes to stabilize gradient propagation with a multi-step loss: the outer loop minimizes a weighted sum of the target-set losses computed by the base network after every inner-loop step on the support set, instead of only the loss after the final step.

* Over the course of training, it is important to anneal (decrease) the contribution of the earlier steps and increase the contribution of the later steps.
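
The weighting of the per-step target losses can be sketched as follows. This is only an illustration: the linear schedule, the function names `step_loss_weights` and `multi_step_loss`, the `floor` parameter, and the loss values are assumptions made for this example, not the paper's exact schedule.

```python
def step_loss_weights(num_steps, epoch, anneal_epochs, floor=0.0):
    """Return one weight per inner-loop step; the weights always sum to 1.

    Assumed schedule (not taken from the paper): linear annealing,
    with `floor` no larger than 1 / num_steps.
    """
    uniform = 1.0 / num_steps
    # Fraction of the annealing schedule completed, clipped to [0, 1].
    progress = min(max(epoch / anneal_epochs, 0.0), 1.0)
    # Earlier steps move linearly from the uniform weight down to `floor`.
    early = uniform + progress * (floor - uniform)
    weights = [early] * (num_steps - 1)
    # The final step absorbs whatever weight the earlier steps gave up.
    weights.append(1.0 - sum(weights))
    return weights


def multi_step_loss(target_losses, weights):
    """Weighted sum of the target-set losses measured after each inner step."""
    return sum(w * loss for w, loss in zip(weights, target_losses))


# Made-up target-set losses after inner steps 1..5.
losses = [4.0, 3.0, 2.0, 1.0, 0.5]
early_in_training = multi_step_loss(losses, step_loss_weights(5, epoch=0, anneal_epochs=10))
late_in_training = multi_step_loss(losses, step_loss_weights(5, epoch=10, anneal_epochs=10))
```

At epoch 0 every step contributes equally, so the outer loss receives gradient signal from all inner steps. Once the schedule completes, the objective reduces to the loss after the final step, which is the quantity the original MAML optimizes.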

### Second Order derivatives are expensive to compute

* While the first-order MAML is faster, the resulting model may not have as good a generalization error as the second-order MAML.

* The paper proposes derivative-order annealing: first-order gradients are used for the first 50 epochs, and second-order gradients are used from then on.

* Training with derivative-order annealing appears to be more stable than training with second-order derivatives only.
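
To make the difference between the two gradient orders concrete, here is a toy one-parameter sketch with quadratic losses and a single inner step. The function name `meta_gradient`, the loss shapes, the constants, and the zero-based epoch counting are all assumptions chosen for illustration; only the 50-epoch switch comes from the notes above.

```python
def meta_gradient(theta, epoch, alpha=0.1, a=2.0, c_support=1.0, c_target=3.0,
                  first_order_epochs=50):
    """Gradient of the target loss w.r.t. the initial parameter `theta`.

    Toy setup (assumed): support loss 0.5 * a * (theta - c_support)**2,
    target loss 0.5 * (adapted - c_target)**2, one inner gradient step.
    """
    # Inner-loop step on the support loss.
    adapted = theta - alpha * a * (theta - c_support)
    # Gradient of the target loss with respect to the adapted parameter.
    grad_adapted = adapted - c_target
    if epoch < first_order_epochs:
        # First-order approximation: treat d(adapted)/d(theta) as 1.
        return grad_adapted
    # Second-order: d(adapted)/d(theta) = 1 - alpha * a, where `a` is the
    # Hessian (second derivative) of the support loss.
    return grad_adapted * (1.0 - alpha * a)


first_order = meta_gradient(0.0, epoch=0)    # cheap approximation
second_order = meta_gradient(0.0, epoch=50)  # backpropagates through the inner step
```

In an autograd framework the same switch usually comes down to whether the graph of the inner-loop gradient is retained for the outer backward pass, for example via the `create_graph` argument of PyTorch's `torch.autograd.grad`.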


### Batch Normalization

* In MAML, the statistics of the current batch are used for normalization instead of accumulating the running statistics.

* The paper proposes to collect running statistics separately for each inner-loop step, which can improve convergence speed, stability, and generalization performance.

* In MAML, the batch normalization biases are not updated in the inner loop, which can adversely impact performance.

* The paper proposes to learn a set of biases (per step) within the inner loop update.
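
A rough single-feature sketch of batch normalization with per-step statistics and per-step biases is shown below. The class name `PerStepBatchNorm`, the momentum convention, and the per-step scale `gamma` are assumptions of this sketch (the notes above only mention per-step biases); a real layer would operate on tensors and learn `gamma` and `beta` by gradient descent.

```python
class PerStepBatchNorm:
    """Batch norm for one scalar feature, with separate state per inner step."""

    def __init__(self, num_steps, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # One set of running statistics per inner-loop step.
        self.running_mean = [0.0] * num_steps
        self.running_var = [1.0] * num_steps
        # One scale and one bias per inner-loop step (learnable in practice).
        self.gamma = [1.0] * num_steps
        self.beta = [0.0] * num_steps

    def __call__(self, xs, step, training=True):
        if training:
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            m = self.momentum
            # Update only the statistics that belong to this inner step.
            self.running_mean[step] = (1 - m) * self.running_mean[step] + m * mean
            self.running_var[step] = (1 - m) * self.running_var[step] + m * var
        else:
            mean, var = self.running_mean[step], self.running_var[step]
        scale = self.gamma[step] / (var + self.eps) ** 0.5
        return [scale * (x - mean) + self.beta[step] for x in xs]


bn = PerStepBatchNorm(num_steps=2)
normalized = bn([1.0, 3.0], step=0)  # touches only the step-0 statistics
```

Keeping separate statistics per step reflects the fact that the distribution of activations changes as the parameters are adapted from one inner step to the next.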

### Fixed Learning Rate

* MAML uses a single learning rate across all the steps and all the parameters. This one learning rate has to be tuned as a hyperparameter so that it works well for every layer and every step.

* An alternative would be to learn a separate learning rate per parameter, but this can be impractical as it doubles the number of parameters to be learned.

* The paper proposes to learn a learning rate and direction for each layer in the network, for each step it takes in the inner loop.

* The paper also proposes to anneal the learning rate of the outer loop (using cosine annealing), as this helps to achieve better generalization.
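
Both ideas can be sketched in a few lines. The layer names, the learning-rate values, and the helper names `inner_update` and `cosine_annealed_lr` are hypothetical; the cosine formula is the standard schedule without restarts, which is assumed here rather than taken from the paper's code.

```python
import math


def inner_update(params, grads, step_lrs, step):
    """One inner-loop step using a separate learning rate per layer and step.

    `step_lrs[name][step]` is a learned scalar; letting it be negative
    allows the update direction to be learned as well.
    """
    return {
        name: [p - step_lrs[name][step] * g
               for p, g in zip(params[name], grads[name])]
        for name in params
    }


def cosine_annealed_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Outer-loop learning rate following a cosine curve from lr_max to lr_min."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Hypothetical two-layer model with two inner-loop steps.
params = {"conv1": [1.0, 2.0], "fc": [0.5]}
grads = {"conv1": [0.5, 0.5], "fc": [1.0]}
step_lrs = {"conv1": [0.1, 0.01], "fc": [0.2, -0.05]}

after_step_0 = inner_update(params, grads, step_lrs, step=0)
outer_lr_midway = cosine_annealed_lr(epoch=50, total_epochs=100, lr_max=0.001)
```

With one scalar per layer and per step, the number of extra parameters is the number of layers times the number of inner steps, far fewer than one learning rate per weight.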

## Results

* With these modifications, MAML++ outperforms MAML on both the Omniglot and MiniImagenet datasets.

* The biggest benefit comes from learning the per-layer, per-step learning rates and from using per-step batch normalization.
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from efc07d to e504b4
