Image captioning using bidirectional LSTM neural network

Abstract Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains...

Bibliographic Details
Main Authors:	Farnaz Hoseini, Anaram Yaghoobi Notash
Format:	Article
Language:	English
Published:	Springer 2025-05-01
Series:	Discover Artificial Intelligence
Subjects:	LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network
Online Access:	https://doi.org/10.1007/s44163-025-00315-8

collection	DOAJ
description	Abstract Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a Bidirectional LSTM (BiLSTM) neural network with 14 layers for automatic captioning, trained on the ImageNet dataset using the MobileNet architecture. To manage the high computational demands, MobileNet is employed to optimize performance. The dataset, including Flickr and MSCOCO with over 1.04 GB of data, was selected for its complexity in image captioning. Implemented in Python with TensorFlow and Cross on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model demonstrates a competitive accuracy of 75.90%, highlighting improvements in both performance and computational efficiency.
id	doaj-art-5d57cdd8a73e412daa6fe654ff829840
institution	DOAJ
issn	2731-0809
spelling	doaj-art-5d57cdd8a73e412daa6fe654ff829840; 2025-08-20T03:22:03Z; eng; Springer; Discover Artificial Intelligence; 2731-0809; 2025-05-01; 51122; 10.1007/s44163-025-00315-8; Image captioning using bidirectional LSTM neural network; Farnaz Hoseini (0); Anaram Yaghoobi Notash (1); Department of Computer Engineering, National University of Skills (NUS); Shariati Hospital, Tehran University of Medical Science (TUMS); https://doi.org/10.1007/s44163-025-00315-8; LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network
