Fractional Max Pooling

INTRODUCTION

  • Link To Paper
  • Here is the implementation in Theano.
  • Convolutional networks have evolved over time: much research effort has gone into designing different kinds of activation layers, different sizes of convolution layers, and ways to reduce overfitting such as Dropout and Batch Normalization.
  • However, very little thought has been put into updating the traditional MaxPooling layer.
  • Pooling layers are basic building blocks of a CNN.
  • A pooling layer reduces the spatial dimensions of the data: applying a MaxPool layer to an Nin x Nin matrix shrinks it to Nout x Nout, where the reduction factor is α = Nin / Nout (e.g., 2 x 2 max pooling maps 32 x 32 to 16 x 16, so α = 2).

Traditional MaxPool Layer

  • Traditionally, a 2 x 2 MaxPool layer has been used for spatial pooling, as in the sketch below.
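As a baseline, here is a minimal NumPy sketch of the traditional non-overlapping 2 x 2 MaxPool; the function name max_pool_2x2 and the example values are mine, for illustration only.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2 x 2 max pooling over an Nin x Nin array.

    Assumes Nin is even; the output is (Nin/2) x (Nin/2), i.e. alpha = 2.
    """
    n_out = x.shape[0] // 2
    # View the input as Nout x Nout blocks of shape 2 x 2, then take
    # the maximum inside each block.
    blocks = x.reshape(n_out, 2, n_out, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))  # 4 x 4 input -> 2 x 2 output: [[5, 7], [13, 15]]
```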

Advantages

  • Fast; reduces the size of the hidden layers quickly.
  • Encodes a degree of invariance with respect to translations and elastic distortions.

Disadvantages

  • The disjoint nature of the pooling operation can reduce generalization.
  • MaxPooling reduces spatial size so quickly that building a deep network requires stacking many convolution layers between pooling layers.

Alternatives Proposed Before

Fractional Max Pooling

  • Reduces the spatial size of the image by a factor of α, where 1 < α < 2.
  • Introduces randomness, as in stochastic pooling.
  • Allows overlapping pooling regions.

How to design it?

  • Input: Nin x Nin, Output: Nout x Nout, reduction factor α = Nin / Nout.
  • The general idea is to divide the Nin x Nin square into Nout^2 pooling regions P_{i,j}.
  • Output_{i,j} = max of Input_{k,l} over (k,l) ∈ P_{i,j}.
  • To do this, generate two increasing sequences (a_i) and (b_i), 0 <= i <= Nout, starting at 1, ending at Nin, with increments of 1 or 2.
  • Now we can generate two kinds of pooling regions:
    • Disjoint: P_{i,j} = [a_{i-1}, a_i - 1] x [b_{j-1}, b_j - 1]
    • Overlapping: P_{i,j} = [a_{i-1}, a_i] x [b_{j-1}, b_j]
  • The integer sequences can be generated in two different ways:
    • random: the increments are a random permutation of the appropriate number of 1s and 2s
    • pseudorandom: a_i = ceiling(α(i + u)), with α ∈ (1, 2) and u ∈ (0, 1)
  • Each time a CNN with FMP is applied, during training or testing, different integer sequences can be generated; averaging the outputs of several test-time passes then yields an inexpensive ensemble. A sketch of the whole recipe follows this list.
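Below is a minimal NumPy sketch of this recipe. It is an illustration under the definitions above, not the paper's reference code (see the linked Theano implementation); the names random_sequence, pseudorandom_sequence, and fmp are my own.

```python
import math
import numpy as np

def random_sequence(n_in, n_out, rng):
    """Random variant: a_0 = 1, a_Nout = Nin, with the Nout increments
    (summing to Nin - 1) a shuffled mix of 1s and 2s."""
    n_twos = n_in - 1 - n_out            # number of 2-steps
    assert 0 <= n_twos <= n_out, "need Nout <= Nin - 1 <= 2 * Nout"
    incs = np.array([2] * n_twos + [1] * (n_out - n_twos))
    rng.shuffle(incs)
    return np.concatenate(([1], 1 + np.cumsum(incs)))

def pseudorandom_sequence(n_in, n_out, rng):
    """Pseudorandom variant: a_i = ceil(alpha * (i + u)), u ~ U(0, 1).
    Clipping into [1, Nin] is one simple boundary-handling choice (an
    assumption of this sketch); pair it with overlapping regions, since
    clipping can make the last *disjoint* region empty."""
    alpha = n_in / n_out                 # 1 < alpha < 2
    u = rng.uniform(0.0, 1.0)
    a = np.array([math.ceil(alpha * (i + u)) for i in range(n_out + 1)])
    return np.clip(a, 1, n_in)

def fmp(x, n_out, overlapping=True, seq_fn=random_sequence, rng=None):
    """One fractional max-pooling layer on a square Nin x Nin array."""
    rng = rng or np.random.default_rng()
    n_in = x.shape[0]
    a = seq_fn(n_in, n_out, rng)         # row boundaries a_0 .. a_Nout
    b = seq_fn(n_in, n_out, rng)         # column boundaries b_0 .. b_Nout
    out = np.empty((n_out, n_out), dtype=x.dtype)
    for i in range(1, n_out + 1):
        for j in range(1, n_out + 1):
            # Overlapping: P = [a_{i-1}, a_i]     x [b_{j-1}, b_j]
            # Disjoint:    P = [a_{i-1}, a_i - 1] x [b_{j-1}, b_j - 1]
            r_end = a[i] if overlapping else a[i] - 1
            c_end = b[j] if overlapping else b[j] - 1
            # 1-based inclusive bounds -> 0-based half-open slices.
            out[i - 1, j - 1] = x[a[i - 1] - 1:r_end, b[j - 1] - 1:c_end].max()
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((25, 25))
y = fmp(x, n_out=18, rng=rng)            # alpha = 25 / 18 ~ 1.39
print(y.shape)                           # (18, 18)
```

Because the pooling regions are resampled on every call, averaging the outputs of several forward passes at test time directly implements the ensembling described above.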

Which limitations of traditional MP does it overcome?

  • Allows disjoint as well as overlapping pooling regions.
  • Includes randomness, as in stochastic pooling.
  • The reduction factor is relaxed to α ∈ (1, 2), so spatial size shrinks slowly and a deep network can be built without stacking many convolution layers between each pooling layer.

Key Observations

  • Random Fractional Max Pooling may underfit when combined with Dropout.
  • The improvement over traditional MP is substantial.
  • Overlapping FMP performs better than disjoint FMP.

Notable Results

Further possible improvement

  • The distortions introduced by FMP decompose independently along the x and y directions. Could pooling regions different from those given by the equations above be explored?