-> The model is capable of running on low-end GPUs such as the GTX 1050 with 4GB of GPU memory and 8GB of RAM
-> The learning curve of the model is steep
-> The model was based on images and captions scraped from e-commerce websites
The picture above summarises the RNN model that was used.
The features of each image were extracted with the VGG16 model: its final softmax layer was removed, and the resulting output was connected to an RNN layer.
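The data flow described above can be sketched in plain Python. This is only a toy illustration under stated assumptions, not the project's actual code. With the softmax layer removed, VGG16 outputs the 4096-dimensional activations of its last fully connected layer, and that vector is what reaches the RNN. How the features enter the RNN is not stated here, so the sketch assumes they set the initial hidden state of a simple (Elman) RNN. The vocabulary, the layer sizes, the random untrained weights, and every function name are hypothetical.

```python
import math
import random

FEATURE_DIM = 4096  # VGG16 output size once the softmax layer is removed
HIDDEN_DIM = 8      # hypothetical RNN size, kept tiny for readability
EMBED_DIM = 8       # hypothetical word-embedding size
VOCAB = ["<start>", "red", "cotton", "shirt", "<end>"]  # hypothetical vocabulary


def make_matrix(rows, cols, rng, scale):
    """Return a rows x cols matrix of small random (untrained) weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_xh, w_hh, x, h_prev):
    """One simple RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""
    from_input = matvec(w_xh, x)
    from_state = matvec(w_hh, h_prev)
    return [math.tanh(a + b) for a, b in zip(from_input, from_state)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(image_features, word_ids, seed=0):
    """Feed image features and a partial caption through the toy RNN."""
    rng = random.Random(seed)
    w_img = make_matrix(HIDDEN_DIM, FEATURE_DIM, rng, 0.01)  # features -> hidden
    embed = make_matrix(len(VOCAB), EMBED_DIM, rng, 0.1)     # one row per word
    w_xh = make_matrix(HIDDEN_DIM, EMBED_DIM, rng, 0.1)
    w_hh = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng, 0.1)
    w_out = make_matrix(len(VOCAB), HIDDEN_DIM, rng, 0.1)

    # Assumption: the image features initialise the hidden state.
    h = [math.tanh(v) for v in matvec(w_img, image_features)]
    for word_id in word_ids:
        h = rnn_step(w_xh, w_hh, embed[word_id], h)
    return softmax(matvec(w_out, h))


# Stand-in for a real VGG16 feature vector of one image.
features = [0.5] * FEATURE_DIM
probs = next_word_probs(features, [VOCAB.index("<start>"), VOCAB.index("red")])
```

Here `probs` holds one probability per vocabulary word for the next caption token. Because the weights are random, the values carry no meaning; only the shapes and the order of operations are illustrated.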