This project predicts a caption for a given image using Deep Learning methods; the Keras library is used to create the model architecture and to train the model on the dataset.
- Data Cleaning
- Data preprocessing (images & captions)
- Preparing data for training using a Python generator function (see the generator sketch after this list)
- Creating word embeddings of the vocabulary
- Creating the model architecture (see the model sketch after this list)
- Training the model on the dataset
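A Python generator is typically used here so that the full set of one-hot encoded training pairs never has to sit in memory at once. Below is a minimal sketch of such a generator; the names `captions` (a dict mapping image ids to lists of cleaned caption strings), `image_features` (pre-extracted CNN feature vectors), `wordtoix`, `max_length`, `vocab_size`, and `batch_size` are illustrative assumptions, not the repository's actual identifiers.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def data_generator(captions, image_features, wordtoix, max_length, vocab_size, batch_size):
    """Yield ([image_feature, partial_caption], next_word) batches indefinitely."""
    X1, X2, y = [], [], []
    n = 0
    while True:
        for image_id, caption_list in captions.items():
            n += 1
            feature = image_features[image_id]
            for caption in caption_list:
                seq = [wordtoix[w] for w in caption.split() if w in wordtoix]
                # each prefix of the caption becomes one training sample
                for i in range(1, len(seq)):
                    in_seq = pad_sequences([seq[:i]], maxlen=max_length)[0]
                    out_word = to_categorical([seq[i]], num_classes=vocab_size)[0]
                    X1.append(feature)
                    X2.append(in_seq)
                    y.append(out_word)
            if n == batch_size:
                yield ([np.array(X1), np.array(X2)], np.array(y))
                X1, X2, y = [], [], []
                n = 0
```

The model itself is commonly built as a "merge" architecture: a dense projection of the CNN image features is combined with an LSTM encoding of the partial caption, and a softmax layer predicts the next word. A minimal sketch is shown below; the 2048-dimensional feature size (e.g. InceptionV3 bottleneck features obtained via transfer learning), the 200-dimensional embeddings, and the 256-unit layers are assumptions for illustration, not necessarily the exact configuration used in this project.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

def define_model(vocab_size, max_length, embedding_dim=200, feature_dim=2048):
    # image-feature branch (features pre-extracted with a CNN)
    inputs1 = Input(shape=(feature_dim,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation='relu')(fe1)

    # partial-caption branch: word embeddings fed to an LSTM
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, embedding_dim, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)

    # decoder: merge both branches and predict the next word
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation='relu')(decoder1)
    outputs = Dense(vocab_size, activation='softmax')(decoder2)

    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model
```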
Familiarity with basic Deep Learning concepts is assumed: Multi-Layered Perceptrons, CNNs, RNNs, Transfer Learning, Gradient Descent, Backpropagation, Overfitting, Probability, Text Processing, Python syntax and data structures, the Keras library, etc.
- python 3.8.2
- keras 2.4.3
- tensorflow 2.3.0
- pandas 1.0.3
- numpy 1.18.4
- matplotlib 3.2.1
- pydot 1.2.3
The dataset for this project has been collected from Kaggle.
The model has been trained on both the Flickr8k and Flickr30k datasets. However, to train on a larger dataset, MS COCO, which contains 180,000 images, could be used.
- Assistive vision: images could be captured in real time, and the predicted caption describing each image could be converted into a voice message with the help of an appropriate API. This voice message could help guide the blind or describe the scene to them.
- Images similar to a given picture could be found by searching for the predicted caption in an appropriate search engine.
- Accuracy and precision still have to be measured; the BLEU score could be used to evaluate the results (see the evaluation sketch after this list).
- The model could be deployed in an appropriate web app.
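As a starting point for evaluation, the BLEU score can be computed with NLTK's `corpus_bleu`. A minimal sketch is shown below, assuming the tokenized reference captions and the tokenized predicted captions have already been collected; the helper name and data layout are illustrative.

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(references, hypotheses):
    """references: one list of tokenized reference captions per image,
    hypotheses: one tokenized predicted caption per image."""
    print('BLEU-1: %.4f' % corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
    print('BLEU-2: %.4f' % corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))
    print('BLEU-3: %.4f' % corpus_bleu(references, hypotheses, weights=(0.33, 0.33, 0.33, 0)))
    print('BLEU-4: %.4f' % corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))
```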