MNIST

This project is a two-layer neural network that learns to recognise handwritten digits from the famous MNIST dataset, supplied as a .csv file. It uses no TensorFlow, PyTorch or similar frameworks, solely math.



Settings

The script includes a set of tweakable constants (example values are sketched after the list):

  • ITERATIONS: How many training iterations the neural network will run.
  • DISPLAY_REG: How often the script displays the network's current accuracy.
  • IMG_SIZE: The number of pixels in each input image.
  • DATASET_PARTITION: Where the dataset is split for cross-validation.
  • TEST_PREDICTIONS: How many sample predictions the network makes after training completes.
  • DATASET_FILE: Input path for the data.
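
As a rough illustration, the constants might look like the following. The values are placeholders chosen for this sketch, not the script's actual defaults, and the file name is hypothetical:

```python
# Illustrative values only -- not the project's real defaults.
ITERATIONS = 500            # training loop iterations
DISPLAY_REG = 50            # print accuracy every 50 iterations
IMG_SIZE = 784              # 28 x 28 MNIST pixels per image
DATASET_PARTITION = 1000    # rows reserved for the cross-validation split
TEST_PREDICTIONS = 10       # sample predictions shown after training
DATASET_FILE = "train.csv"  # path to the MNIST .csv (placeholder)
```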

Forward Propagation

During forward propagation, the neural network takes images and learns to turn them into predictions (a code sketch follows the list):

  • A0: This is the input layer (layer 0) of the neural network. It simply receives the IMG_SIZE pixels of the image, one pixel per node.

  • Z1: Unactivated first layer. Z1 is obtained by applying the weights of the connections from the prior layer (W1) and a bias (b1) to the input layer (A0). Or, Z1 = W1 * A0 + b1.

  • A1: First layer. A1 is obtained by putting Z1 through an activation function. The activation function I use is Exponential Linear Unit, or ELU.

  • Z2: Unactivated second layer. Z2 is obtained by applying the weights of the connections from the prior layer (W2) and a bias (b2) to the first layer (A1). Or, Z2 = W2 * A1 + b2.

  • A2: Second and final layer. A2 is obtained by passing Z2 through an activation function. This time we use softmax, which assigns a probability to each node in this output layer.
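
A minimal sketch of this forward pass in NumPy, assuming the weights and biases are already initialised (the function and variable names are illustrative, not necessarily the script's own):

```python
import numpy as np

def elu(Z, a=1.0):
    # Exponential Linear Unit: Z where Z > 0, a * (e^Z - 1) otherwise
    return np.where(Z > 0, Z, a * (np.exp(Z) - 1))

def softmax(Z):
    # Subtract the column-wise max for numerical stability
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, A0):
    Z1 = W1 @ A0 + b1   # unactivated first layer
    A1 = elu(Z1)        # first layer
    Z2 = W2 @ A1 + b2   # unactivated second layer
    A2 = softmax(Z2)    # output probabilities
    return Z1, A1, Z2, A2
```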

Backward Propagation

Backward propagation is how the algorithm improves as it learns. It takes the prediction, measures how far it deviated from the image's label, and works backwards through the network (a code sketch follows the list):

  • dZ2: A measure of the error in the second layer. It's obtained by taking the predictions and subtracting the labels from them. For that, we one-hot encode the label as Y. dZ2 = A2 - Y

  • dW2: The derivative of the loss function with respect to the weights in layer 2. dW2 = 1/m * dZ2 * A1.T. (Where .T is transposition of a matrix or vector)

  • db2: The derivative of the loss function with respect to the biases in layer 2, i.e. the average of the error in the second layer. db2 = 1/m * Σ dZ2.

  • dZ1: A measure of the error in the first layer. This formula essentially performs forward propagation in reverse. dZ1 = W2.T * dZ2 * g'(Z1), where g'() is the derivative of the activation function (ELU).

  • dW1: The derivative of the loss function with respect to the weights in layer 1. dW1 = 1/m * dZ1 * X.T, where X is the input A0.

  • db1: The derivative of the loss function with respect to the biases in layer 1. db1 = 1/m * Σ dZ1.
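
A minimal sketch of these gradients in NumPy, continuing the illustrative names from the forward-pass sketch above (m is the number of training examples; the one_hot helper is assumed here to turn each label into a column of the matrix Y):

```python
def elu_deriv(Z, a=1.0):
    # Derivative of ELU: 1 where Z > 0, a * e^Z otherwise
    return np.where(Z > 0, 1.0, a * np.exp(Z))

def one_hot(labels, num_classes=10):
    # One column per example, with a 1 in the row matching the label
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1
    return Y

def backward_prop(Z1, A1, A2, W2, X, labels):
    m = labels.size
    Y = one_hot(labels)
    dZ2 = A2 - Y                          # error in the output layer
    dW2 = (1 / m) * dZ2 @ A1.T            # gradient of the layer-2 weights
    db2 = (1 / m) * dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * elu_deriv(Z1)    # error pushed back through layer 1
    dW1 = (1 / m) * dZ1 @ X.T             # gradient of the layer-1 weights
    db1 = (1 / m) * dZ1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```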

Update Parameters

After successful forward & backward propagation, the algorithm updates the parameters using a hyperparameter α in this fashion:

  • W1 := W1 - αdW1
  • b1 := b1 - αdb1
  • W2 := W2 - αdW2
  • b2 := b2 - αdb2

α, being a hyperparameter, isn't set by gradient descent but by the end user. α can be interpreted as the learning rate. A sketch of this update step follows below.
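
A minimal sketch of the update step, again with illustrative names (alpha is the user-chosen learning rate; the default value here is a placeholder):

```python
def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha=0.1):
    # Plain gradient descent: step each parameter against its gradient
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2
```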

After that, the algorithm loops back to forward propagation.

Example