Add Mixup paper
shagunsodhani committed May 10, 2020
1 parent 7bf220e commit 783ffd7
Showing 2 changed files with 53 additions and 1 deletion.
52 changes: 52 additions & 0 deletions site/_posts/2020-02-27-mixup Beyond Empirical Risk Minimization.md
@@ -0,0 +1,52 @@
---
layout: post
title: mixup - Beyond Empirical Risk Minimization
comments: True
excerpt:
tags: ['2017', 'Adversarial Robustness', 'Data Augmentation', 'ICLR 2018', 'AI', 'ERM', 'ICLR', 'Robustness']

---

## Introduction

* The paper proposes a simple and dataset-agnostic data augmentation mechanism called *mixup*.

* [Link to the paper](https://arxiv.org/abs/1710.09412)

* Consider two training examples, $(x_1, y_1)$ and $(x_2, y_2)$, where $x_1$ and $x_2$ are the datapoints and $y_1$ and $y_2$ are the labels.

* New training examples of the form $(\lambda \times x_1 + (1-\lambda) \times x_2, \lambda \times y_1 + (1-\lambda) \times y_2)$ are constructed by linearly interpolating both the datapoints and the labels, where $\lambda \in [0, 1]$ (a short sketch appears at the end of this section).

* $\lambda$ is sampled from a Beta distribution $Beta(\alpha, \alpha)$ where $\alpha \in (0, \infty)$.

* Setting $\lambda$ to 0 or 1 recovers one of the original training examples, eliminating the effect of *mixup*.

* *Mixup* encourages the neural network to favor linear behavior between the training examples.
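
As a concrete illustration, below is a minimal NumPy sketch of how a single *mixup* example could be constructed; the function name `mixup_pair` and the default `alpha` value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Build one mixup example from two labelled examples (x1, y1) and (x2, y2).

    x1, x2 : input arrays (e.g. images) of the same shape.
    y1, y2 : one-hot label vectors of the same shape.
    alpha  : parameter of the Beta(alpha, alpha) distribution.
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # lambda ~ Beta(alpha, alpha), lies in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # linear interpolation of the inputs
    y = lam * y1 + (1.0 - lam) * y2       # linear interpolation of the labels
    return x, y
```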


## Experiments

* **Supervised Learning**

* ImageNet for ResNet-50, ResNet-101 and ResNeXt-101.

* CIFAR-10/CIFAR-100 for PreAct ResNet-18, WideResNet-28-10 and DenseNet.

* Google Speech Commands dataset for LeNet and VGG.

* In all these setups, adding *mixup* improves the performance of the model.

* *Mixup* makes the model more robust to noisy labels. Moreover, *mixup* + dropout improves over *mixup* alone. This hints that *mixup*'s benefits are complementary to those of dropout.

* *Mixup* makes the network more robust to adversarial examples in both white-box and black-box settings (ImageNet + ResNet-101).

* *Mixup* also stabilizes the training of GANs by acting as a regularizer for the gradient of the discriminator.


## Observations

* Taking convex combinations of three or more examples (with weights sampled from a Dirichlet distribution) does not provide gains over the two-example case.

* In the authors' implementation, *mixup* is applied between images of the same mini-batch (after shuffling); a short sketch of this trick follows after this list.

* Interpolating only between inputs with the same labels did not lead to the same kind of gains as *mixup*.
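
This single-batch trick can be illustrated with a minimal NumPy sketch; the function name `mixup_batch`, the default `alpha`, and the choice of drawing one $\lambda$ per batch are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix a batch with a shuffled copy of itself.

    x : inputs of shape (batch_size, ...).
    y : one-hot labels of shape (batch_size, num_classes).
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # one lambda shared across the batch
    perm = rng.permutation(len(x))            # pair each example with a shuffled partner
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed
```
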
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from 18213a to 08f588
