MNIST


This project is a two-layer neural network that learns to recognise handwritten digits by being fed the famous MNIST dataset as a .csv file. It uses no TensorFlow, PyTorch, or similar frameworks, only the underlying math.



Settings

The script includes a set of tweakable constants (a sketch of possible values follows the list):

  • ITERATIONS: The number of training iterations the neural network will run.
  • DISPLAY_REG: How often the script displays the neural network's current accuracy.
  • IMG_SIZE: The number of pixels in each input image.
  • DATASET_PARTITION: Where the data is split between training and validation sets for cross-validation.
  • TEST_PREDICTIONS: How many sample predictions the neural network makes once training is complete.
  • DATASET_FILE: Input path for the data.
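A minimal sketch of what these constants might look like. The names come from the list above, but every value shown is illustrative (apart from the 28 x 28 MNIST image size) and not the repository's defaults:

    # Hypothetical example values; tune these to your needs.
    ITERATIONS = 500          # number of gradient-descent iterations
    DISPLAY_REG = 50          # print accuracy every 50 iterations
    IMG_SIZE = 784            # 28 x 28 pixels per MNIST image
    DATASET_PARTITION = 1000  # rows held out for cross-validation
    TEST_PREDICTIONS = 10     # sample predictions shown after training
    DATASET_FILE = "data/mnist_train.csv"  # path to the MNIST .csv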

Forward Propagation

During forward propagation, the neural network takes images and produces predictions from them (a sketch follows the list):

  • A0: The input layer (layer 0) of the neural network. It simply receives the image's IMG_SIZE pixel values, one per node.

  • Z1: Unactivated first layer. Z1 is obtained by applying the weights of the connections between the input layer and the first layer (W1) and a bias (b1) to the input layer (A0). Or, Z1 = W1 * A0 + b1.

  • A1: First layer. A1 is obtained by passing Z1 through an activation function. The activation function I use is the Exponential Linear Unit, or ELU.

  • Z2: Unactivated second layer. Z2 is obtained by applying the weights of the connections between the first and second layers (W2) and a bias (b2) to the prior layer (A1). Or, Z2 = W2 * A1 + b2.

  • A2: Second and final layer. A2 is obtained by passing Z2 through an activation function. This time it's softmax, which assigns a probability to each node in this output layer.
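A minimal sketch of the forward pass described above, assuming NumPy (the repository describes itself as using only math, so the exact helpers here are illustrative, not the repository's implementation):

    import numpy as np

    def elu(Z, alpha=1.0):
        # Exponential Linear Unit: Z where Z > 0, alpha * (e^Z - 1) elsewhere
        return np.where(Z > 0, Z, alpha * (np.exp(np.minimum(Z, 0)) - 1))

    def softmax(Z):
        # Subtract the column-wise max for numerical stability
        e = np.exp(Z - Z.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)

    def forward_prop(W1, b1, W2, b2, A0):
        # A0: (IMG_SIZE, m) matrix of pixel values, one column per image
        Z1 = W1.dot(A0) + b1   # unactivated first layer
        A1 = elu(Z1)           # first layer (ELU activation)
        Z2 = W2.dot(A1) + b2   # unactivated second layer
        A2 = softmax(Z2)       # output layer: one probability per digit
        return Z1, A1, Z2, A2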

Backward Propagation

Backward propagation is how the algorithm improves as it learns. It takes the prediction, measures how far it deviated from the image's label, and works backwards through the network (a sketch follows the list):

  • dZ2: A measure of the error in the second layer. It's obtained by subtracting the labels from the predictions. For that, we one-hot encode the label as Y. dZ2 = A2 - Y

  • dW2: The derivative of the loss function with respect to the weights in layer 2. dW2 = 1/m * dZ2 * A1.T. (Where .T is the transpose of a matrix or vector and m is the number of training examples.)

  • db2: The derivative of the loss with respect to the biases in layer 2, i.e. the average error in the output layer. db2 = 1/m * Σ dZ2.

  • dZ1: A measure of the error in the first layer. This formula essentially performs forward propagation in reverse. dZ1 = W2.T * dZ2 * g'(Z1), where g'() is the derivative of the activation function.

  • dW1: The derivative of the loss function with respect to the weights in layer 1. dW1 = 1/m * dZ1 * X.T, where X is the input (A0).

  • db1: The derivative of the loss with respect to the biases in layer 1. db1 = 1/m * Σ dZ1.
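A minimal NumPy sketch of these gradients. The one-hot encoding helper and the ELU derivative are illustrative implementations under the same assumptions as the forward-pass sketch, not necessarily the repository's:

    def one_hot(Y, num_classes=10):
        # Turn labels of shape (m,) into a (num_classes, m) matrix of 0s and 1s
        encoded = np.zeros((num_classes, Y.size))
        encoded[Y, np.arange(Y.size)] = 1
        return encoded

    def elu_deriv(Z, alpha=1.0):
        # Derivative of ELU: 1 where Z > 0, alpha * e^Z elsewhere
        return np.where(Z > 0, 1.0, alpha * np.exp(np.minimum(Z, 0)))

    def backward_prop(Z1, A1, A2, W2, A0, Y):
        m = Y.size                                   # number of training examples
        dZ2 = A2 - one_hot(Y)                        # error in the output layer
        dW2 = (1 / m) * dZ2.dot(A1.T)                # gradient of the layer-2 weights
        db2 = (1 / m) * dZ2.sum(axis=1, keepdims=True)
        dZ1 = W2.T.dot(dZ2) * elu_deriv(Z1)          # propagate the error backwards
        dW1 = (1 / m) * dZ1.dot(A0.T)                # gradient of the layer-1 weights
        db1 = (1 / m) * dZ1.sum(axis=1, keepdims=True)
        return dW1, db1, dW2, db2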

Update Parameters

After a successful forward & backward pass, the algorithm updates its parameters using a hyperparameter α, in this fashion:

  • W1 := W1 - αdW1
  • b1 := b1 - αdb1
  • W2 := W2 - αdW2
  • b2 := b2 - αdb2

α, being a hyperparameter, isn't set by gradient descent but by the end-user. It can be interpreted as the learning rate.

After that, the algorithm loops back to forward propagation (a sketch of the update step and the full training loop follows).
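A minimal sketch of the update step and the training loop that ties the three phases together. It assumes the forward_prop and backward_prop helpers from the sketches above and the DISPLAY_REG constant; ALPHA and the loop structure are illustrative, not the repository's exact code:

    ALPHA = 0.10  # hypothetical learning rate

    def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
        # Gradient-descent step: move each parameter against its gradient
        W1 -= alpha * dW1
        b1 -= alpha * db1
        W2 -= alpha * dW2
        b2 -= alpha * db2
        return W1, b1, W2, b2

    def gradient_descent(A0, Y, W1, b1, W2, b2, iterations, alpha):
        for i in range(iterations):
            Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, A0)
            dW1, db1, dW2, db2 = backward_prop(Z1, A1, A2, W2, A0, Y)
            W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
            if i % DISPLAY_REG == 0:
                # Accuracy: fraction of columns whose argmax matches the label
                accuracy = (A2.argmax(axis=0) == Y).mean()
                print(f"iteration {i}: accuracy {accuracy:.3f}")
        return W1, b1, W2, b2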

Example
