MNIST

This project is a two-layer neural network that learns to recognise handwritten digits from the famous MNIST dataset, supplied as a .csv file. It uses no TensorFlow, PyTorch or similar frameworks, solely math.



Settings

The script includes a set of tweakable constants (example values are sketched after the list):

  • ITERATIONS: How many training iterations the neural network will run.
  • DISPLAY_REG: How often the script displays the network's current accuracy.
  • IMG_SIZE: The number of pixels in each input image.
  • DATASET_PARTITION: Where the dataset is split for cross-validation.
  • TEST_PREDICTIONS: How many sample predictions the network makes after training completes.
  • DATASET_FILE: Input path for the data.
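
As a rough illustration, the constants might look like the following. The values are placeholders chosen for this sketch, not the script's actual defaults, and the file name is hypothetical:

```python
# Illustrative values only -- not the project's real defaults.
ITERATIONS = 500            # training loop iterations
DISPLAY_REG = 50            # print accuracy every 50 iterations
IMG_SIZE = 784              # 28 x 28 MNIST pixels per image
DATASET_PARTITION = 1000    # rows reserved for the cross-validation split
TEST_PREDICTIONS = 10       # sample predictions shown after training
DATASET_FILE = "train.csv"  # path to the MNIST .csv (placeholder)
```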

Forward Propagation

During forward propagation, the neural network takes images and learns to turn them into predictions (a code sketch follows the list):

  • A0: This is the input layer (layer 0) of the neural network. It simply receives the IMG_SIZE pixels of the image, one pixel per node.

  • Z1: Unactivated first layer. Z1 is obtained by applying the weights of the connections from the prior layer (W1) and a bias (b1) to the input layer (A0). Or, Z1 = W1 * A0 + b1.

  • A1: First layer. A1 is obtained by putting Z1 through an activation function. The activation function I use is Exponential Linear Unit, or ELU.

  • Z2: Unactivated second layer. Z2 is obtained by applying the weights of the connections from the prior layer (W2) and a bias (b2) to the first layer (A1). Or, Z2 = W2 * A1 + b2.

  • A2: Second and final layer. A2 is obtained by passing Z2 through an activation function. This time we use softmax, which assigns a probability to each node in this output layer.
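
A minimal sketch of this forward pass in NumPy, assuming the weights and biases are already initialised (the function and variable names are illustrative, not necessarily the script's own):

```python
import numpy as np

def elu(Z, a=1.0):
    # Exponential Linear Unit: Z where Z > 0, a * (e^Z - 1) otherwise
    return np.where(Z > 0, Z, a * (np.exp(Z) - 1))

def softmax(Z):
    # Subtract the column-wise max for numerical stability
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, A0):
    Z1 = W1 @ A0 + b1   # unactivated first layer
    A1 = elu(Z1)        # first layer
    Z2 = W2 @ A1 + b2   # unactivated second layer
    A2 = softmax(Z2)    # output probabilities
    return Z1, A1, Z2, A2
```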

Backward Propagation

Backward propagation is how the algorithm improves as it learns. It takes the prediction, measures how far it deviated from the image's label, and works backwards through the network (a code sketch follows the list):

  • dZ2: A measure of the error in the second layer. It's obtained by taking the predictions and subtracting the labels from them. For that, we one-hot encode the label as Y. dZ2 = A2 - Y

  • dW2: The derivative of the loss function with respect to the weights in layer 2. dW2 = 1/m * dZ2 * A1.T. (Where .T is transposition of a matrix or vector)

  • db2: The derivative of the loss function with respect to the biases in layer 2, i.e. the average of the error in the second layer. db2 = 1/m * Σ dZ2.

  • dZ1: A measure of the error in the first layer. This formula essentially performs forward propagation in reverse. dZ1 = W2.T * dZ2 * g'(Z1), where g'() is the derivative of the activation function (ELU).

  • dW1: The derivative of the loss function with respect to the weights in layer 1. dW1 = 1/m * dZ1 * X.T, where X is the input A0.

  • db1: The derivative of the loss function with respect to the biases in layer 1. db1 = 1/m * Σ dZ1.
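
A minimal sketch of these gradients in NumPy, continuing the illustrative names from the forward-pass sketch above (m is the number of training examples; the one_hot helper is assumed here to turn each label into a column of the matrix Y):

```python
def elu_deriv(Z, a=1.0):
    # Derivative of ELU: 1 where Z > 0, a * e^Z otherwise
    return np.where(Z > 0, 1.0, a * np.exp(Z))

def one_hot(labels, num_classes=10):
    # One column per example, with a 1 in the row matching the label
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1
    return Y

def backward_prop(Z1, A1, A2, W2, X, labels):
    m = labels.size
    Y = one_hot(labels)
    dZ2 = A2 - Y                          # error in the output layer
    dW2 = (1 / m) * dZ2 @ A1.T            # gradient of the layer-2 weights
    db2 = (1 / m) * dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * elu_deriv(Z1)    # error pushed back through layer 1
    dW1 = (1 / m) * dZ1 @ X.T             # gradient of the layer-1 weights
    db1 = (1 / m) * dZ1.sum(axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```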

Update Parameters

After successful forward & backward propagation, the algorithm updates the parameters using a hyperparameter α in this fashion:

  • W1 := W1 - αdW1
  • b1 := b1 - αdb1
  • W2 := W2 - αdW2
  • b2 := b2 - αdb2

α, being a hyperparameter, isn't set by gradient descent but by the end user. α can be interpreted as the learning rate. A sketch of this update step follows below.
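
A minimal sketch of the update step, again with illustrative names (alpha is the user-chosen learning rate; the default value here is a placeholder):

```python
def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha=0.1):
    # Plain gradient descent: step each parameter against its gradient
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2
```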

After that, the algorithm loops back to forward propagation.

Example