Image captioning using bidirectional LSTM neural network


Bibliographic Details
Main Authors: Farnaz Hoseini, Anaram Yaghoobi Notash
Format: Article
Language:English
Published: Springer 2025-05-01
Series:Discover Artificial Intelligence
Subjects:
Online Access:https://doi.org/10.1007/s44163-025-00315-8
Description
Summary: Abstract Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a Bidirectional LSTM (BiLSTM) neural network with 14 layers for automatic captioning, trained on the ImageNet dataset using the MobileNet architecture. To manage the high computational demands, MobileNet is employed to optimize performance. The datasets, Flickr and MSCOCO, totaling over 1.04 GB, were selected for their complexity in image captioning. Implemented in Python with TensorFlow and Keras on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model demonstrates a competitive accuracy of 75.90%, highlighting improvements in both performance and computational efficiency.
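The core mechanism the summary describes, a bidirectional LSTM, processes a feature sequence in both directions and concatenates the two hidden states at each step, so every position carries both left and right context. The sketch below illustrates that mechanism in plain NumPy; the dimensions, toy parameters, and helper names (`lstm_cell`, `bilstm`) are illustrative assumptions, not the paper's 14-layer architecture or its MobileNet features.

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step: four gates computed from input x and previous hidden h."""
    z = W @ x + U @ h + b                      # (4H,) stacked pre-activations
    H = h.shape[0]
    i = 1 / (1 + np.exp(-z[:H]))               # input gate
    f = 1 / (1 + np.exp(-z[H:2 * H]))          # forget gate
    o = 1 / (1 + np.exp(-z[2 * H:3 * H]))      # output gate
    g = np.tanh(z[3 * H:])                     # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def bilstm(seq, W_f, U_f, b_f, W_b, U_b, b_b, hidden):
    """Run one LSTM left-to-right and another right-to-left; concatenate states."""
    T = len(seq)
    h_fwd = np.zeros((T, hidden))
    h_bwd = np.zeros((T, hidden))
    h, c = np.zeros(hidden), np.zeros(hidden)
    for t in range(T):                         # forward pass
        h, c = lstm_cell(seq[t], h, c, W_f, U_f, b_f)
        h_fwd[t] = h
    h, c = np.zeros(hidden), np.zeros(hidden)
    for t in reversed(range(T)):               # backward pass
        h, c = lstm_cell(seq[t], h, c, W_b, U_b, b_b)
        h_bwd[t] = h
    return np.concatenate([h_fwd, h_bwd], axis=1)  # (T, 2 * hidden)

# Toy run: 5 steps of 8-dim features (standing in for image-region features)
rng = np.random.default_rng(0)
D, H, T = 8, 4, 5
seq = rng.normal(size=(T, D))
params = lambda: (rng.normal(scale=0.1, size=(4 * H, D)),   # input weights
                  rng.normal(scale=0.1, size=(4 * H, H)),   # recurrent weights
                  np.zeros(4 * H))                          # biases
out = bilstm(seq, *params(), *params(), hidden=H)
print(out.shape)  # (5, 8): each step carries forward + backward context
```

In a captioning pipeline such as the one the article evaluates, the input sequence would typically be CNN-derived features (here, MobileNet) rather than random vectors, and the concatenated states would feed a vocabulary-sized output layer.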
ISSN:2731-0809