Image captioning using bidirectional LSTM neural network

Abstract Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains...

Bibliographic Details
Main Authors: Farnaz Hoseini, Anaram Yaghoobi Notash
Format: Article
Language: English
Published: Springer 2025-05-01
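Series: Discover Artificial Intelligence
Subjects: LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network
Online Access: https://doi.org/10.1007/s44163-025-00315-8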
_version_ 1849688300053331968
author Farnaz Hoseini
Anaram Yaghoobi Notash
collection DOAJ
description Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a Bidirectional LSTM (BiLSTM) neural network with 14 layers for automatic captioning, trained on the ImageNet dataset using the MobileNet architecture. To manage the high computational demands, MobileNet is employed to optimize performance. The dataset, including Flickr and MSCOCO with over 1.04 GB of data, was selected for its complexity in image captioning. Implemented in Python with TensorFlow and Keras on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model demonstrates a competitive accuracy of 75.90%, highlighting improvements in both performance and computational efficiency.
format Article
id doaj-art-5d57cdd8a73e412daa6fe654ff829840
institution DOAJ
issn 2731-0809
language English
publishDate 2025-05-01
publisher Springer
record_format Article
series Discover Artificial Intelligence
spelling doaj-art-5d57cdd8a73e412daa6fe654ff829840 | 2025-08-20T03:22:03Z | eng | Springer | Discover Artificial Intelligence | 2731-0809 | 2025-05-01 | 51122 | 10.1007/s44163-025-00315-8 | Image captioning using bidirectional LSTM neural network | Farnaz Hoseini (0), Anaram Yaghoobi Notash (1) | (0) Department of Computer Engineering, National University of Skills (NUS); (1) Shariati Hospital, Tehran University of Medical Science (TUMS) | Abstract: Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a Bidirectional LSTM (BiLSTM) neural network with 14 layers for automatic captioning, trained on the ImageNet dataset using the MobileNet architecture. To manage the high computational demands, MobileNet is employed to optimize performance. The dataset, including Flickr and MSCOCO with over 1.04 GB of data, was selected for its complexity in image captioning. Implemented in Python with TensorFlow and Keras on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model demonstrates a competitive accuracy of 75.90%, highlighting improvements in both performance and computational efficiency. | https://doi.org/10.1007/s44163-025-00315-8 | LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network
title Image captioning using bidirectional LSTM neural network
topic LSTM neural network
ImageNet
MobileNet
Deep learning
Captioning of images
Bidirectional LSTM neural network
url https://doi.org/10.1007/s44163-025-00315-8
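
Note on the evaluation metrics named in the description: the record lists Precision, Accuracy, Recall, and F-score but does not say how they were computed for captions. The sketch below shows the standard definitions of these four metrics, applied under one possible assumption: a word-level comparison of a predicted caption against a single reference caption over a small vocabulary. The function names, the vocabulary, and the example captions are hypothetical and are not taken from the article.

```python
def confusion_counts(predicted, reference, vocabulary):
    """Count word-level agreement between two captions.

    Assumption (not from the article): each vocabulary word is treated as
    one binary decision, "present in the caption" or "absent".
    """
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                     # words in both captions
    fp = len(pred - ref)                     # predicted but not in reference
    fn = len(ref - pred)                     # in reference but not predicted
    tn = len(set(vocabulary) - pred - ref)   # correctly left out of both
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Standard precision, recall, F-score (F1), and accuracy."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Hypothetical toy data, for illustration only.
vocabulary = ["a", "dog", "cat", "runs", "sits", "on", "grass", "sofa"]
predicted = "a dog runs on grass".split()
reference = "a dog sits on grass".split()

counts = confusion_counts(predicted, reference, vocabulary)
result = scores(*counts)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Caption quality is often scored with other measures as well; this sketch only covers the four classification-style metrics that the description names.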