Image Captioning Using CNN Long Short-Term Memory Network
9 Pages. Posted: 17 Apr 2020
Date Written: April 15, 2020
Abstract
In recent years, various deep learning models have been used to derive textual information from images. Deep learning has achieved success in the development and training of large (many-layer) neural networks, which offer flexibility and yield useful, reliable solutions. Real-world image captioning can be approached with Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Deep Neural Networks (DNNs), and Long Short-Term Memory networks (LSTMs). Among these models, CNNs and LSTMs perform much better than plain DNNs, and the results show that CNN and LSTM models are more flexible and pragmatic in mapping features. CNNs reduce frequency variations and are therefore well suited to models with a spatial representation; they successfully handle computer vision tasks such as image classification and object detection. LSTMs are suited to temporal modeling, that is, tasks involving sequences over which predictions must be made. LSTMs are widely used in natural language processing because they do not overwrite their entire state at each step; instead, they alter it slightly, selectively forgetting and remembering information. The combined CNN Long Short-Term Memory Network (CNN-LSTM) architecture is well suited to sequence prediction problems with spatial inputs such as images. We observe that the merge-model CNN-LSTM demonstrates that machines can "see" very well, as the images are captioned correctly.
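To make the merge architecture described above concrete, the following is a minimal NumPy sketch of a single decoding step: a stand-in vector plays the role of the CNN image features, a hand-rolled LSTM cell summarizes the partial caption, and the two representations are merged only at the output layer that predicts the next word. All sizes, weight initializations, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMBED, HIDDEN, FEAT = 10, 8, 16, 32

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LSTMCell:
    """One LSTM step: all four gates computed from [h_prev, x]."""
    def __init__(self, in_dim, hid_dim):
        self.W = rng.normal(0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid = hid_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([h, x]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c_new = f * c + i * g          # cell state is altered slightly, not replaced
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def caption_step(img_feat, word_ids, embed, cell, W_merge, b_merge, W_out, b_out):
    """Merge model: image features join the text summary only at the decoder."""
    h = np.zeros(cell.hid)
    c = np.zeros(cell.hid)
    for w in word_ids:                 # LSTM encodes the caption generated so far
        h, c = cell.step(embed[w], h, c)
    merged = np.tanh(W_merge @ np.concatenate([img_feat, h]) + b_merge)
    return softmax(W_out @ merged + b_out)   # distribution over the next word

embed = rng.normal(0, 0.1, (VOCAB, EMBED))
cell = LSTMCell(EMBED, HIDDEN)
W_merge = rng.normal(0, 0.1, (HIDDEN, FEAT + HIDDEN))
b_merge = np.zeros(HIDDEN)
W_out = rng.normal(0, 0.1, (VOCAB, HIDDEN))
b_out = np.zeros(VOCAB)

img_feat = rng.normal(0, 1, FEAT)      # stand-in for a CNN feature vector
probs = caption_step(img_feat, [1, 4, 2], embed, cell,
                     W_merge, b_merge, W_out, b_out)
```

In a real system the image vector would come from a pretrained CNN and the next word would be sampled or chosen greedily, then fed back in, repeating until an end-of-sequence token is produced.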
Keywords: Image captioning; Natural language processing; Neural network; Convolutional neural network (CNN); Recurrent neural network (RNN); Long short-term memory (LSTM)