---
layout: post
title: mixup - Beyond Empirical Risk Minimization
comments: True
excerpt:
tags: ['2017', 'Adversarial Robustness', 'Data Augmentation', 'ICLR 2018', 'AI', 'ERM', 'ICLR', 'Robustness']
---

## Introduction

* The paper proposes a simple and dataset-agnostic data augmentation mechanism called *mixup*.

* [Link to the paper]()

* Consider two training examples, $(x_1, y_1)$ and $(x_2, y_2)$, where $x_1$ and $x_2$ are the datapoints and $y_1$ and $y_2$ are the labels.

* New training examples of the form $(\lambda \times x_1 + (1-\lambda) \times x_2, \lambda \times y_1 + (1-\lambda) \times y_2)$ are constructed by linearly interpolating both the datapoints and the labels (see the sketch after this list). Here $\lambda \in [0, 1]$.

* $\lambda$ is sampled from a Beta distribution $\text{Beta}(\alpha, \alpha)$, where $\alpha \in (0, \infty)$.

* Setting $\lambda$ to 0 or 1 reduces the mixed example to one of the original examples, eliminating the effect of *mixup*; as $\alpha \to 0$, the sampled $\lambda$ concentrates at 0 and 1, so training reduces to standard ERM.

* *Mixup* encourages the neural network to favor linear behavior between training examples.

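A minimal sketch of the pairwise construction above, written in PyTorch; the helper name `mixup_pair` and the default `alpha=0.2` are illustrative choices rather than values taken from the paper's code.

```python
import torch

def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    """Mix two training examples (x1, y1) and (x2, y2) into a single example.

    y1 and y2 are assumed to be one-hot (or soft) label vectors so that they
    can be interpolated directly; alpha parameterizes Beta(alpha, alpha).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()  # lambda in [0, 1]
    x = lam * x1 + (1 - lam) * x2  # interpolate the inputs
    y = lam * y1 + (1 - lam) * y2  # interpolate the labels
    return x, y
```

Smaller values of $\alpha$ concentrate $\lambda$ near 0 and 1, so most mixed examples stay close to one of the originals, while $\alpha = 1$ makes $\lambda$ uniform on $[0, 1]$.
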
## Experiments

* **Supervised Learning**

  * ImageNet for ResNet-50, ResNet-101 and ResNeXt-101.

  * CIFAR10/CIFAR100 for PreAct ResNet-18, WideResNet-28-10 and DenseNet.

  * Google commands dataset for LeNet and VGG.

  * In all these setups, adding *mixup* improves the performance of the model.

* *Mixup* makes the model more robust to noisy labels. Moreover, *mixup* + dropout improves over *mixup* alone. This hints that *mixup*'s benefits are complementary to those of dropout.

* *Mixup* makes the network more robust to adversarial examples in both white-box and black-box settings (ImageNet + ResNet-101).

* *Mixup* also stabilizes the training of GANs by acting as a regularizer for the gradient of the discriminator (a small sketch of this objective follows the list).

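A minimal sketch of how the discriminator side of this regularizer might look, assuming a binary discriminator `d` that outputs one logit per sample, a batch of real samples `real`, and a batch of generated samples `fake`; the helper name `mixup_d_loss` is illustrative and this is one way to realize the idea, not the authors' code.

```python
import torch
import torch.nn.functional as F

def mixup_d_loss(d, real, fake, alpha=0.2):
    """Discriminator loss on an interpolation of real and generated samples.

    Instead of classifying purely real vs. purely fake batches, the
    discriminator predicts the soft target lam for lam*real + (1 - lam)*fake,
    which smooths its decision surface (illustrative sketch).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed = lam * real + (1 - lam) * fake.detach()       # generator held fixed here
    target = torch.full((real.size(0), 1), lam.item())   # soft label = lam
    return F.binary_cross_entropy_with_logits(d(mixed), target)
```
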
## Observations

* Taking a convex combination of three or more examples (with weights sampled from a Dirichlet distribution) does not provide gains over mixing just two examples, while increasing the computational cost.

* In the authors' implementation, *mixup* is applied between images of the same batch (after shuffling), as shown in the sketch after this list.

* Interpolating only between inputs with the same label did not lead to the same kind of gains as *mixup*.
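
A minimal sketch of the single-batch implementation mentioned above, assuming integer class labels and a standard classification model; the helper name `mixup_batch_loss` and the default `alpha=0.2` are illustrative.

```python
import torch
import torch.nn.functional as F

def mixup_batch_loss(model, x, y, alpha=0.2):
    """Mix each example with a shuffled copy of the same batch.

    x : batch of inputs, shape (N, ...).
    y : integer class labels, shape (N,).
    The target mixing is applied to the loss instead of the labels:
    lam * loss(pred, y) + (1 - lam) * loss(pred, y[perm]).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))          # shuffle within the batch
    x_mixed = lam * x + (1 - lam) * x[perm]   # interpolate the inputs
    logits = model(x_mixed)
    return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[perm])
```

Because the cross-entropy loss is linear in the target distribution, this is equivalent to constructing mixed one-hot targets while avoiding materializing them.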