Image Captioning Using CNN Long Short-Term Memory Network
9 Pages. Posted: 17 Apr 2020
Date Written: April 15, 2020
Abstract
In recent years, various deep learning models have been used to derive textual information from images. Deep learning has achieved success in the development and training of large (many-layer) neural networks, which offer flexibility and yield useful, reliable solutions. Real-world image captioning can be approached with Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Deep Neural Networks (DNNs), and Long Short-Term Memory networks (LSTMs). Among these models, CNNs and LSTMs perform much better than plain DNNs, and the results show that CNN and LSTM models are more flexible and pragmatic in mapping features. CNNs reduce frequency variations and are therefore well suited to models with a spatial representation; they successfully handle computer vision tasks such as image classification and object detection. LSTMs are suited to temporal modeling, that is, tasks involving sequences over which predictions must be made. LSTMs are widely used in natural language processing because they do not overwrite their entire state at each step; instead, they alter it slightly, selectively forgetting and remembering information. The combined CNN Long Short-Term Memory Network (CNN-LSTM) architecture is well suited to sequence prediction problems with spatial inputs such as images. We observe that the merge-model CNN-LSTM demonstrates that machines can "see" very well, as the images are captioned correctly.
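To make the merge architecture described above concrete, the following is a minimal NumPy sketch of a single decoding step: a stand-in vector plays the role of the CNN image features, a hand-rolled LSTM cell summarizes the partial caption, and the two representations are merged only at the output layer that predicts the next word. All sizes, weight initializations, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMBED, HIDDEN, FEAT = 10, 8, 16, 32

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LSTMCell:
    """One LSTM step: all four gates computed from [h_prev, x]."""
    def __init__(self, in_dim, hid_dim):
        self.W = rng.normal(0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid = hid_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([h, x]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c_new = f * c + i * g          # cell state is altered slightly, not replaced
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def caption_step(img_feat, word_ids, embed, cell, W_merge, b_merge, W_out, b_out):
    """Merge model: image features join the text summary only at the decoder."""
    h = np.zeros(cell.hid)
    c = np.zeros(cell.hid)
    for w in word_ids:                 # LSTM encodes the caption generated so far
        h, c = cell.step(embed[w], h, c)
    merged = np.tanh(W_merge @ np.concatenate([img_feat, h]) + b_merge)
    return softmax(W_out @ merged + b_out)   # distribution over the next word

embed = rng.normal(0, 0.1, (VOCAB, EMBED))
cell = LSTMCell(EMBED, HIDDEN)
W_merge = rng.normal(0, 0.1, (HIDDEN, FEAT + HIDDEN))
b_merge = np.zeros(HIDDEN)
W_out = rng.normal(0, 0.1, (VOCAB, HIDDEN))
b_out = np.zeros(VOCAB)

img_feat = rng.normal(0, 1, FEAT)      # stand-in for a CNN feature vector
probs = caption_step(img_feat, [1, 4, 2], embed, cell,
                     W_merge, b_merge, W_out, b_out)
```

In a real system the image vector would come from a pretrained CNN and the next word would be sampled or chosen greedily, then fed back in, repeating until an end-of-sequence token is produced.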
Keywords: Image captioning; Natural language processing; Neural network; Convolutional neural network (CNN); Recurrent neural network (RNN); Long short-term memory (LSTM)