Model Semantic Attention (SemAtt) With Hybrid Learning Separable Neural Network and Long Short-Term Memory to Generate Caption
Image captioning is an active research topic at the intersection of computer vision and natural language processing. One task in the geological field is to produce descriptions of images of geological rocks: a geologist writes a textual description of an image's content so it can be reused later. One objective of this research is object interpretation, i.e., traversing the image structures in depth; shapes, colors, and structures are the focus for extracting the image's features. The research question is how a separable neural network (SNN) and long short-term memory (LSTM) affect whether the generated caption matches the geologist's description. In the proposed image-captioning architecture, the SNN serves as Visual Attention (VaT) and the LSTM as Semantic Attention (SemAtt). The experiments show that the model achieves BLEU-1 = 0.908, BLEU-2 = 0.877, BLEU-3 = 0.750, and BLEU-4 = 0.510. Additional evaluators, METEOR and ROUGE-L, score 0.670 and 0.623, respectively, and the model outperforms the baseline. Based on these evaluations, we conclude that the model generates captions for geological rock images that match the geologist's description; precision and recall confirm that the predicted words fit the image features.
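The BLEU-n scores reported in the abstract are built on modified n-gram precision between a generated caption and a reference caption. A minimal, self-contained sketch of that precision calculation (illustrative only, not the authors' evaluation code; the sample captions are invented):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    # Modified n-gram precision, the core of BLEU-n:
    # clipped count of candidate n-grams that also appear in the
    # reference, divided by the total number of candidate n-grams.
    cand_ngrams = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    if not cand_ngrams:
        return 0.0
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand_counts = Counter(cand_ngrams)
    clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    return clipped / len(cand_ngrams)

# Invented example captions (not from the paper's dataset):
reference = "igneous rock with coarse grained texture".split()
candidate = "igneous rock with coarse texture".split()
print(ngram_precision(candidate, reference, 1))  # 1.0  (all unigrams match)
print(ngram_precision(candidate, reference, 2))  # 0.75 (3 of 4 bigrams match)
```

Full BLEU additionally applies a brevity penalty and combines the precisions for n = 1..4 with a geometric mean; libraries such as NLTK implement the complete metric.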
| Main Authors: | Agus Nursikuwagus, Rinaldi Munir, Masayu L. Khodra, Deshinta Arrova Dewi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Separable neural network; LSTM; transformers; captioning; semantic; attention |
| Online Access: | https://ieeexplore.ieee.org/document/10719978/ |
| _version_ | 1850201474827550720 |
|---|---|
| author | Agus Nursikuwagus; Rinaldi Munir; Masayu L. Khodra; Deshinta Arrova Dewi |
| collection | DOAJ |
| description | Image captioning is an active research topic at the intersection of computer vision and natural language processing. One task in the geological field is to produce descriptions of images of geological rocks: a geologist writes a textual description of an image's content so it can be reused later. One objective of this research is object interpretation, i.e., traversing the image structures in depth; shapes, colors, and structures are the focus for extracting the image's features. The research question is how a separable neural network (SNN) and long short-term memory (LSTM) affect whether the generated caption matches the geologist's description. In the proposed image-captioning architecture, the SNN serves as Visual Attention (VaT) and the LSTM as Semantic Attention (SemAtt). The experiments show that the model achieves BLEU-1 = 0.908, BLEU-2 = 0.877, BLEU-3 = 0.750, and BLEU-4 = 0.510. Additional evaluators, METEOR and ROUGE-L, score 0.670 and 0.623, respectively, and the model outperforms the baseline. Based on these evaluations, we conclude that the model generates captions for geological rock images that match the geologist's description; precision and recall confirm that the predicted words fit the image features. |
| format | Article |
| id | doaj-art-2ea75f3b48f84d36954f973b9a65de56 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-2ea75f3b48f84d36954f973b9a65de56; indexed 2025-08-20T02:12:01Z; English; IEEE; IEEE Access; ISSN 2169-3536; published 2024-01-01; vol. 12, pp. 154467-154481; DOI 10.1109/ACCESS.2024.3481499; IEEE document 10719978. Authors: Agus Nursikuwagus (https://orcid.org/0000-0001-8435-7522), Faculty of Engineering and Computer Science, Universitas Komputer Indonesia, Bandung, Indonesia; Rinaldi Munir, School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia; Masayu L. Khodra, School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia; Deshinta Arrova Dewi (https://orcid.org/0000-0003-1488-7696), Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia. |
| title | Model Semantic Attention (SemAtt) With Hybrid Learning Separable Neural Network and Long Short-Term Memory to Generate Caption |
| topic | Separable neural network LSTM transformers captioning semantic attention |
| url | https://ieeexplore.ieee.org/document/10719978/ |