Image captioning using bidirectional LSTM neural network
Abstract: Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains...
| Main Authors: | Farnaz Hoseini, Anaram Yaghoobi Notash |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Discover Artificial Intelligence |
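| Subjects: | LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network |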
| Online Access: | https://doi.org/10.1007/s44163-025-00315-8 |
| _version_ | 1849688300053331968 |
|---|---|
| author | Farnaz Hoseini; Anaram Yaghoobi Notash |
| collection | DOAJ |
| description | Abstract: Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a Bidirectional LSTM (BiLSTM) neural network with 14 layers for automatic captioning, trained on the ImageNet dataset using the MobileNet architecture. To manage the high computational demands, MobileNet is employed to optimize performance. The dataset, including Flickr and MSCOCO with over 1.04 GB of data, was selected for its complexity in image captioning. Implemented in Python with TensorFlow and Keras on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model demonstrates a competitive accuracy of 75.90%, highlighting improvements in both performance and computational efficiency. |
| format | Article |
| id | doaj-art-5d57cdd8a73e412daa6fe654ff829840 |
| institution | DOAJ |
| issn | 2731-0809 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Artificial Intelligence |
| spelling | doaj-art-5d57cdd8a73e412daa6fe654ff829840; 2025-08-20T03:22:03Z; eng; Springer; Discover Artificial Intelligence; 2731-0809; 2025-05-01; 51122; 10.1007/s44163-025-00315-8; Image captioning using bidirectional LSTM neural network; Farnaz Hoseini [0]; Anaram Yaghoobi Notash [1]; [0] Department of Computer Engineering, National University of Skills (NUS); [1] Shariati Hospital, Tehran University of Medical Science (TUMS); Abstract: Automatic image captioning is a crucial task in image processing and machine vision, where images are segmented into regions, and captions are assigned based on shared attributes. Given the vast number of images available online, choosing an effective method for accurate captioning remains challenging. This study presents a Bidirectional LSTM (BiLSTM) neural network with 14 layers for automatic captioning, trained on the ImageNet dataset using the MobileNet architecture. To manage the high computational demands, MobileNet is employed to optimize performance. The dataset, including Flickr and MSCOCO with over 1.04 GB of data, was selected for its complexity in image captioning. Implemented in Python with TensorFlow and Keras on Google Colab, the proposed model is compared to five other LSTM-based models. Performance is evaluated using Precision, Accuracy, Recall, F-score, and Loss Function metrics. Despite hardware limitations, the proposed BiLSTM model demonstrates a competitive accuracy of 75.90%, highlighting improvements in both performance and computational efficiency.; https://doi.org/10.1007/s44163-025-00315-8; LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network |
| title | Image captioning using bidirectional LSTM neural network |
| topic | LSTM neural network; ImageNet; MobileNet; Deep learning; Captioning of images; Bidirectional LSTM neural network |
| url | https://doi.org/10.1007/s44163-025-00315-8 |