This project is a two-layer neural network that learns to recognise handwritten digits by being fed the famous MNIST dataset as a .csv file. It uses no TensorFlow, PyTorch or other ML frameworks, solely math.
The script includes a set of tweakable constants (an illustrative sketch follows the list):
- ITERATIONS: How many iterations the neural network will loop.
- DISPLAY_REG: How often the script displays the neural network's current accuracy.
- IMG_SIZE: The number of pixels in the input image.
- DATASET_PARTITION: Where the data is split for cross-validation.
- TEST_PREDICTIONS: How many predictions the neural network will be tested on after training completes.
- DATASET_FILE: Input path for the data.
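A minimal sketch of how these constants might look in the script; the values and filename below are purely illustrative, not the script's actual defaults:

```python
ITERATIONS = 500            # gradient-descent iterations (illustrative value)
DISPLAY_REG = 10            # report accuracy every 10 iterations
IMG_SIZE = 784              # 28 x 28 pixels per MNIST image
DATASET_PARTITION = 1000    # rows held out for cross-validation
TEST_PREDICTIONS = 4        # sample predictions shown after training
DATASET_FILE = "mnist_train.csv"  # path to the MNIST .csv (hypothetical filename)
```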
During forward propagation, the neural network takes images and learns to create predictions out of them (a code sketch follows the list):
- A0: The input layer (layer 0) of the neural network. It simply receives the image's IMG_SIZE pixels, one per node.
- Z1: Unactivated first layer. Z1 is obtained by applying the weights of the connections from the prior layer (W1) and a bias (b1) to the input layer (A0). Or, Z1 = W1 * A0 + b1.
- A1: First layer. A1 is obtained by putting Z1 through an activation function. The activation function I use is the Exponential Linear Unit, or ELU.
- Z2: Unactivated second layer. Z2 is obtained by applying the weights of the connections from the prior layer (W2) and a bias (b2) to the prior layer (A1). Or, Z2 = W2 * A1 + b2.
- A2: Second and final layer. A2 is obtained by passing Z2 through an activation function. This time we're using softmax, which assigns a probability to each node in this output layer.
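The forward pass can be sketched in NumPy roughly as follows. The helper names (elu, softmax, forward_prop) and the one-image-per-column layout are assumptions for illustration, not necessarily the script's actual code:

```python
import numpy as np

def elu(z, a=1.0):
    # ELU: z for z > 0, a * (e^z - 1) otherwise (clipped to avoid overflow in exp)
    return np.where(z > 0, z, a * (np.exp(np.minimum(z, 0)) - 1))

def softmax(z):
    # Subtract the column-wise max for numerical stability
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, A0):
    Z1 = W1 @ A0 + b1    # unactivated first layer
    A1 = elu(Z1)         # first layer (ELU activation)
    Z2 = W2 @ A1 + b2    # unactivated second layer
    A2 = softmax(Z2)     # output layer: one probability per digit
    return Z1, A1, Z2, A2
```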
Backward propagation is the method by which the algorithm improves as it learns. This is done by taking the prediction, measuring how much it deviated from the image's label, and working backwards (a code sketch follows the list):
- dZ2: A measure of the error in the second layer. It's obtained by taking the predictions and subtracting the labels from them. For that, we one-hot encode the label as Y. dZ2 = A2 - Y
- dW2: The derivative of the loss function with respect to the weights in layer 2. dW2 = 1/m * dZ2 * A1.T, where m is the number of training examples and .T is the transpose of a matrix or vector.
- db2: The derivative of the loss with respect to the biases in layer 2, i.e. the average of the error. db2 = 1/m * Σ dZ2.
- dZ1: A measure of the error in the first layer. This formula essentially performs forward propagation in reverse. dZ1 = W2.T * dZ2 * g'(Z1), where g'() is the derivative of the activation function (ELU) applied element-wise.
- dW1: The derivative of the loss function with respect to the weights in layer 1. dW1 = 1/m * dZ1 * X.T, where X is the input layer A0.
- db1: Likewise for the biases in layer 1: db1 = 1/m * Σ dZ1.
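A matching backward-pass sketch, with the same caveats: elu_deriv, one_hot and backward_prop are hypothetical helper names, and the data is assumed to be laid out one example per column:

```python
def elu_deriv(z, a=1.0):
    # g'(z): 1 for z > 0, a * e^z otherwise (clipped to avoid overflow in exp)
    return np.where(z > 0, 1.0, a * np.exp(np.minimum(z, 0)))

def one_hot(labels, num_classes=10):
    # Y has shape (num_classes, m): a 1 in the row matching each example's label
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

def backward_prop(Z1, A1, A2, W2, A0, labels):
    m = labels.size                           # number of training examples
    Y = one_hot(labels)
    dZ2 = A2 - Y                              # error in the output layer
    dW2 = (1 / m) * dZ2 @ A1.T
    db2 = (1 / m) * dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * elu_deriv(Z1)        # push the error back through ELU
    dW1 = (1 / m) * dZ1 @ A0.T
    db1 = (1 / m) * dZ1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```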
After successful forward & backward propagation, the algorithm updates its weights and biases using a hyperparameter α, in this fashion (see the sketch at the end of this section):
- W1 := W1 - αdW1
- b1 := b1 - αdb1
- W2 := W2 - αdW2
- b2 := b2 - αdb2
α, being a hyperparameter, isn't set by gradient descent, but by the end-user. α can be interpreted as the learning rate.
After that, the algorithm loops back to forward propagation.
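Tying these pieces together, a rough sketch of the update step and the outer training loop, reusing the hypothetical functions from the sketches above:

```python
def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # Step each parameter against its gradient, scaled by the learning rate α
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2

def gradient_descent(A0, labels, W1, b1, W2, b2, alpha):
    # Outer loop: forward pass, backward pass, parameter update, repeat.
    # A0 and labels would come from the loaded MNIST .csv.
    for i in range(ITERATIONS):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, A0)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, A2, W2, A0, labels)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
    return W1, b1, W2, b2
```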