Image captioning using bidirectional LSTM neural network
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Discover Artificial Intelligence |
| Online Access: | https://doi.org/10.1007/s44163-025-00315-8 |
| Summary: | Abstract Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a 14-layer Bidirectional LSTM (BiLSTM) neural network for automatic captioning; the MobileNet architecture, trained on the ImageNet dataset, is employed as a lightweight feature extractor to manage the high computational demands. The Flickr and MSCOCO datasets, comprising over 1.04 GB of data, were selected for their complexity in image captioning. Implemented in Python with TensorFlow and Keras on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model achieves a competitive accuracy of 75.90%, demonstrating improvements in both performance and computational efficiency. |
| ISSN: | 2731-0809 |
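The core idea in the abstract is the bidirectional LSTM: the caption sequence is processed both forward and backward, and the two hidden states are concatenated at each step. The following is a minimal NumPy sketch of that recurrence only, not the authors' 14-layer model; all dimensions, parameter shapes, and the random initialization are illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,) hold the
    stacked input/forget/output/candidate gate parameters."""
    z = W @ x + U @ h + b                 # stacked gate pre-activations
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))      # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))   # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H])) # output gate
    g = np.tanh(z[3*H:])                  # candidate cell state
    c = f * c + i * g                     # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

def bilstm(seq, params_fwd, params_bwd):
    """Run the sequence in both directions and concatenate the
    forward and backward hidden states at each time step."""
    H = params_fwd[1].shape[1]
    h_f, c_f = np.zeros(H), np.zeros(H)
    h_b, c_b = np.zeros(H), np.zeros(H)
    fwd, bwd = [], []
    for x in seq:                         # forward pass over the sequence
        h_f, c_f = lstm_step(x, h_f, c_f, *params_fwd)
        fwd.append(h_f)
    for x in reversed(seq):               # backward pass over the sequence
        h_b, c_b = lstm_step(x, h_b, c_b, *params_bwd)
        bwd.append(h_b)
    bwd.reverse()                         # realign backward states with time
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Illustrative usage with random parameters (D = input dim, H = hidden dim).
rng = np.random.default_rng(0)
D, H, T = 8, 4, 5
def init():
    return (0.1 * rng.standard_normal((4 * H, D)),
            0.1 * rng.standard_normal((4 * H, H)),
            np.zeros(4 * H))
outputs = bilstm([rng.standard_normal(D) for _ in range(T)], init(), init())
# each output concatenates forward and backward states, so it has size 2*H
```

In the full captioning pipeline the abstract describes, the inputs at each step would be word embeddings conditioned on MobileNet image features, and each concatenated state would feed a softmax over the vocabulary; those pieces are omitted here to keep the sketch self-contained.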