Trained an LSTM model for action recognition on a video dataset. The project is implemented in TensorFlow.
The dataset contains 588 videos, each 4 to 6 seconds long. Each video consists of 50 frames of 2048 pixels each. The videos were divided into 5 categories:
- Cricket Bowling: Consists of bowling videos from cricket matches.
- Cricket Shot: Consists of videos of batsmen hitting balls in cricket games.
- Pizza Tossing: Consists of videos of people tossing pizzas in the air.
- Playing Cello: Consists of videos of people playing cello.
- Playing Sitar: Consists of videos of people playing sitar.
All videos were split into frames, each frame was passed through a Convolutional Neural Network (VGG16), and the extracted features were stored as sequential data in an npz file. The data was then divided into a training set and a validation set.
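
The storage and split step described above can be sketched with NumPy. This is only an illustration under stated assumptions: the array keys `features` and `labels`, the 20-video stand-in data, and the 80/20 split ratio are hypothetical, since the project description does not give them.

```python
import io
import numpy as np

# Hypothetical stand-in for the real data: 20 videos instead of 588,
# each with 50 frames and 2048 values per frame, and 5 class labels.
rng = np.random.default_rng(0)
num_videos, num_frames, feat_dim, num_classes = 20, 50, 2048, 5
features = rng.standard_normal((num_videos, num_frames, feat_dim)).astype(np.float32)
labels = rng.integers(0, num_classes, size=num_videos)

# Store both arrays in one npz archive. An in-memory buffer is used here;
# passing a file path to np.savez / np.load works the same way.
buffer = io.BytesIO()
np.savez(buffer, features=features, labels=labels)
buffer.seek(0)

# Load the archive back and read the arrays by key (key names are assumed).
data = np.load(buffer)
X, y = data["features"], data["labels"]

# Shuffle the video indices, then hold out a fraction for validation.
val_fraction = 0.2  # assumed; the original split ratio is not stated
order = rng.permutation(len(X))
num_val = int(len(X) * val_fraction)
val_idx, train_idx = order[:num_val], order[num_val:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
```

Splitting by video index rather than by frame keeps all 50 frames of a video in the same set.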
The model consists of a basic LSTM cell with 128 units, unrolled over the frame sequence by a dynamic RNN layer. The whole model was implemented in TensorFlow.
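
To show what such a layer computes, here is a minimal NumPy sketch of an LSTM forward pass over a batch of 50-frame sequences. It is not the project's TensorFlow code: the gate ordering, the random weights, and the choice to classify from the final hidden state are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, b, num_units):
    """Run one LSTM layer over a batch of sequences.

    x: (batch, time, input_dim)
    W: (input_dim + num_units, 4 * num_units), b: (4 * num_units,)
    Returns the final hidden state, shape (batch, num_units).
    """
    batch, time_steps, _ = x.shape
    h = np.zeros((batch, num_units), dtype=x.dtype)  # hidden state
    c = np.zeros((batch, num_units), dtype=x.dtype)  # cell state
    for t in range(time_steps):
        # One matrix product yields the pre-activations of all four gates.
        z = np.concatenate([x[:, t, :], h], axis=1) @ W + b
        i, g, f, o = np.split(z, 4, axis=1)  # gate order is a convention chosen here
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Shapes taken from the description: 50 frames, 2048 values per frame,
# 128 LSTM units, 5 classes, batches of 32. Weights are random placeholders.
rng = np.random.default_rng(0)
batch, frames, feat_dim, units, classes = 32, 50, 2048, 128, 5
x = rng.standard_normal((batch, frames, feat_dim)).astype(np.float32)
W = (0.01 * rng.standard_normal((feat_dim + units, 4 * units))).astype(np.float32)
b = np.zeros(4 * units, dtype=np.float32)

h = lstm_forward(x, W, b, units)

# Assumed classifier head: a linear layer on the final hidden state.
W_out = (0.01 * rng.standard_normal((units, classes))).astype(np.float32)
logits = h @ W_out
```

The loop over `t` is the part a dynamic RNN layer handles in TensorFlow: it applies the same cell, with the same weights, to each frame in turn.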
Training was done on batches of size 32. The Adam optimizer was used with learning rate = 1e-4, beta1 = 0.9, beta2 = 0.999, and epsilon = 1e-8. Softmax classification was used with cross-entropy loss. Training reached approximately 90% accuracy on the validation set.
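
The loss and optimizer settings above can be illustrated with a small NumPy sketch. It uses the textbook form of the Adam update with the hyperparameters listed, applied to a toy linear classifier on random data. The helper names `softmax_cross_entropy` and `adam_step` and the toy data are hypothetical, and TensorFlow's own Adam implementation may differ in small details such as how epsilon is applied.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy loss and its gradient with respect to the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    loss = -log_probs[np.arange(n), labels].mean()
    grad = np.exp(log_probs)            # softmax probabilities
    grad[np.arange(n), labels] -= 1.0   # subtract the one-hot targets
    return loss, grad / n

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy data: one batch of 32 random 128-dimensional vectors with 5 classes.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((32, 128))
labels = rng.integers(0, 5, size=32)
W = np.zeros((128, 5))
m, v = np.zeros_like(W), np.zeros_like(W)

initial_loss, _ = softmax_cross_entropy(hidden @ W, labels)
for t in range(1, 201):
    loss, dlogits = softmax_cross_entropy(hidden @ W, labels)
    W, m, v = adam_step(W, hidden.T @ dlogits, m, v, t)
final_loss, _ = softmax_cross_entropy(hidden @ W, labels)
```

With all-zero weights every class gets probability 1/5, so the starting loss equals ln 5; the loss then falls as Adam fits the batch.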