Image captioning in Bengali language using visual attention.

Bibliographic Details
Main Authors:	Adiba Masud, Md Biplob Hosen, Md Habibullah, Mehrin Anannya, M Shamim Kaiser
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0309364

author	Adiba Masud, Md Biplob Hosen, Md Habibullah, Mehrin Anannya, M Shamim Kaiser
collection	DOAJ
description	Automatically generating image captions poses one of the most challenging applications within artificial intelligence due to its integration of computer vision and natural language processing algorithms. This task becomes notably more formidable when dealing with a language as intricate as Bengali and the overall scarcity of Bengali-captioned image databases. In this investigation, a meticulously human-annotated dataset of Bengali captions has been curated specifically for the encompassing collection of pictures. Simultaneously, an innovative end-to-end architecture has been introduced to craft pertinent image descriptions in the Bengali language, leveraging an attention-driven decoder. Initially, the amalgamation of images' spatial and temporal attributes is facilitated by Gated Recurrent Units, constituting the input features. These features are subsequently fed into the attention layer alongside embedded caption features. The attention mechanism scrutinizes the interrelation between visual and linguistic representations, encompassing both categories of representations. Later, a comprehensive recursive unit comprising two layers employs the amalgamated attention traits to construct coherent sentences. Utilizing our furnished dataset, this model undergoes training, culminating in achievements of a 43% BLEU-4 score, a 39% METEOR score, and a 47% ROUGE score. Compared to all preceding endeavors in Bengali image captioning, these outcomes signify the pinnacle of current attainable standards.
format	Article
id	doaj-art-3a67bca91d624c88bac5ef4cc867c798
institution	DOAJ
issn	1932-6203
language	English
publishDate	2025-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-3a67bca91d624c88bac5ef4cc867c798; 2025-08-20T02:56:03Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 2; e0309364; 10.1371/journal.pone.0309364; Image captioning in Bengali language using visual attention.; Adiba Masud; Md Biplob Hosen; Md Habibullah; Mehrin Anannya; M Shamim Kaiser; https://doi.org/10.1371/journal.pone.0309364
title	Image captioning in Bengali language using visual attention.
url	https://doi.org/10.1371/journal.pone.0309364
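
The description in this record says the attention layer relates visual features to embedded caption features before the decoder builds a sentence. As a rough illustration of that single step only, the sketch below computes attention weights over a few image-region vectors and the resulting context vector. The scoring function (scaled dot product), the function names softmax and attend, and the toy numbers are all assumptions made for this example; the record does not state which attention variant or dimensions the authors used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, region_features):
    """Return (weights, context) for one decoding step.

    query: hypothetical caption/decoder vector of length d
    region_features: list of image-region vectors, each of length d
    """
    d = len(query)
    # Assumed scoring function: scaled dot product between query and region.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
        for feat in region_features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(d)
    ]
    return weights, context


# Toy input: three image regions with 2-dimensional features (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, regions)
```

In this toy input the first and third regions align equally with the query, so they receive equal weight and the second region receives less. In a captioning model, the context vector would then be passed to the recurrent decoder for the next word.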

