Model Semantic Attention (SemAtt) With Hybrid Learning Separable Neural Network and Long Short-Term Memory to Generate Caption


Saved in:
Bibliographic Details
Main Authors: Agus Nursikuwagus, Rinaldi Munir, Masayu L. Khodra, Deshinta Arrova Dewi
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10719978/
collection DOAJ
description Image captioning is an active research topic at the intersection of computer vision and natural language processing. One task in the geological field is to produce descriptions of images of geological rocks: a geologist writes a textual description of an image's content so that it can be used later. One objective of this research is to interpret the object by traversing the image structures in depth, focusing on shapes, colors, and structures to extract the image's features. The problem addressed is how a separable neural network (SNN) and long short-term memory (LSTM) affect whether the generated caption meets the geologist's description. In the proposed image-captioning architecture, the SNN acts as Visual Attention (VaT) and the LSTM as Semantic Attention (SemAtt). The experiments confirm that the captioning model achieves BLEU-1 = 0.908, BLEU-2 = 0.877, BLEU-3 = 0.750, and BLEU-4 = 0.510. These scores are complemented by other evaluators, METEOR and ROUGE-L, which reach 0.670 and 0.623, respectively. The model outperforms the baseline model. Based on these evaluations, we conclude that the model can generate captions for geological rock images that meet the geologist's description. Precision and recall support the model in predicting words suited to the image features.
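The abstract reports cumulative BLEU-1 through BLEU-4 scores. As a minimal, hedged illustration (not the authors' evaluation code), cumulative BLEU with a brevity penalty can be computed from scratch; the example captions below are hypothetical, not from the paper's dataset:

```python
# Hedged sketch: cumulative BLEU-1..4 with brevity penalty, implemented from
# scratch. The token lists below are illustrative, not from the paper's data.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_scores(candidate, references, max_n=4):
    """Return [BLEU-1, ..., BLEU-max_n] (cumulative, uniform weights)."""
    scores, precisions = [], []
    # brevity penalty uses the reference closest in length to the candidate
    closest = min(references, key=lambda r: abs(len(r) - len(candidate)))
    bp = 1.0 if len(candidate) >= len(closest) else math.exp(1 - len(closest) / len(candidate))
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        clip = Counter()
        for ref in references:  # clip each n-gram count by its max reference count
            for g, c in Counter(ngrams(ref, n)).items():
                clip[g] = max(clip[g], c)
        matched = sum(min(c, clip[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(matched / total)
        if min(precisions) > 0:  # geometric mean of p_1..p_n, scaled by bp
            scores.append(bp * math.exp(sum(map(math.log, precisions)) / n))
        else:
            scores.append(0.0)
    return scores

reference = "dark grey basalt with fine grained texture".split()
candidate = "dark grey basalt with fine texture".split()
print([round(s, 3) for s in bleu_scores(candidate, [reference])])
# -> [0.846, 0.757, 0.714, 0.673]
```

A perfect match against a reference yields 1.0 at every order; longer n-grams penalize missing phrase structure more heavily, which is why BLEU-4 drops fastest.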
id doaj-art-2ea75f3b48f84d36954f973b9a65de56
institution OA Journals
issn 2169-3536
spelling DOI: 10.1109/ACCESS.2024.3481499 (IEEE Access, vol. 12, 2024, pp. 154467-154481; document 10719978)
Agus Nursikuwagus (https://orcid.org/0000-0001-8435-7522), Faculty of Engineering and Computer Science, Universitas Komputer Indonesia, Bandung, Indonesia
Rinaldi Munir, School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
Masayu L. Khodra, School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
Deshinta Arrova Dewi (https://orcid.org/0000-0003-1488-7696), Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
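The record's abstract also reports METEOR (0.670) and ROUGE-L (0.623) scores. As a hedged sketch (not the authors' evaluation code), sentence-level ROUGE-L can be computed as an F-measure over the longest common subsequence; the beta weight and example tokens here are assumptions:

```python
# Hedged sketch of sentence-level ROUGE-L (LCS-based F-measure), written from
# scratch; beta=1.2 follows a common convention and is an assumption, not a
# value taken from the paper.
def lcs_len(a, b):
    # classic O(len(a)*len(b)) dynamic program for longest common subsequence
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

ref = "porphyritic andesite with visible plagioclase crystals".split()
cand = "andesite with plagioclase crystals".split()
print(round(rouge_l(cand, ref), 3))  # -> 0.772
```

Unlike BLEU's contiguous n-grams, the LCS rewards in-order word matches even when they are not adjacent, which pairs naturally with the precision/recall framing mentioned in the abstract.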
topic Separable neural network
LSTM
transformers
captioning
semantic
attention
work_keys_str_mv AT agusnursikuwagus modelsemanticattentionsemattwithhybridlearningseparableneuralnetworkandlongshorttermmemorytogeneratecaption
AT rinaldimunir modelsemanticattentionsemattwithhybridlearningseparableneuralnetworkandlongshorttermmemorytogeneratecaption
AT masayulkhodra modelsemanticattentionsemattwithhybridlearningseparableneuralnetworkandlongshorttermmemorytogeneratecaption
AT deshintaarrovadewi modelsemanticattentionsemattwithhybridlearningseparableneuralnetworkandlongshorttermmemorytogeneratecaption