Image captioning in Bengali language using visual attention.

Bibliographic Details
Main Authors: Adiba Masud, Md Biplob Hosen, Md Habibullah, Mehrin Anannya, M Shamim Kaiser
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0309364
collection DOAJ
description Automatically generating image captions poses one of the most challenging applications within artificial intelligence due to its integration of computer vision and natural language processing algorithms. This task becomes notably more formidable when dealing with a language as intricate as Bengali and the overall scarcity of Bengali-captioned image databases. In this investigation, a meticulously human-annotated dataset of Bengali captions has been curated specifically for the encompassing collection of pictures. Simultaneously, an innovative end-to-end architecture has been introduced to craft pertinent image descriptions in the Bengali language, leveraging an attention-driven decoder. Initially, the amalgamation of images' spatial and temporal attributes is facilitated by Gated Recurrent Units, constituting the input features. These features are subsequently fed into the attention layer alongside embedded caption features. The attention mechanism scrutinizes the interrelation between visual and linguistic representations, encompassing both categories of representations. Later, a comprehensive recursive unit comprising two layers employs the amalgamated attention traits to construct coherent sentences. Utilizing our furnished dataset, this model undergoes training, culminating in achievements of a 43% BLEU-4 score, a 39% METEOR score, and a 47% ROUGE score. Compared to all preceding endeavors in Bengali image captioning, these outcomes signify the pinnacle of current attainable standards.
id doaj-art-3a67bca91d624c88bac5ef4cc867c798
institution DOAJ
issn 1932-6203
spelling doaj-art-3a67bca91d624c88bac5ef4cc867c798 | 2025-08-20T02:56:03Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20 | 2 | e0309364 | 10.1371/journal.pone.0309364