Multi-Head Attention Based Bidirectional LSTM for Spelling Error Detection in the Indonesian Language

Bibliographic Details
Main Authors: Yanfi Yanfi, Haryono Soeparno, Reina Setiawan, Widodo Budiharto
Format: Article
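Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Bidirectional long short-term memory; Indonesian language; multi-head attention mechanism; natural language processing; spell error detection
Online Access: https://ieeexplore.ieee.org/document/10580948/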
author Yanfi Yanfi
Haryono Soeparno
Reina Setiawan
Widodo Budiharto
collection DOAJ
description Spelling errors in text can significantly hinder communication and comprehension, particularly in formal writing such as news or reports, so identifying and fixing spelling mistakes in the Indonesian language is important. Despite this, there has been little progress toward efficient systems for identifying spelling errors in Indonesian texts. The solutions currently available frequently fail to cover all error types, including nonword, real-word, and punctuation errors. This study aims to address this gap by presenting a novel algorithm to improve spelling mistake detection within the Indonesian language context. We identified gaps in the current methodologies through a systematic literature study, which informed our solution. Our proposed algorithm begins by gathering and preparing the dataset, merging correct and incorrect sentences, labeling, and preprocessing the data. It then combines Bidirectional Long Short-Term Memory (Bi-LSTM) networks, which capture the intricacies of sequential data, with Multi-Head Attention (MHA) mechanisms, which emphasize pertinent segments of input sequences, thereby improving prediction accuracy. We conducted comprehensive experiments to benchmark our model against existing models. Our model reached a peak accuracy of 92.26%, well above the baseline models, the lowest of which had an accuracy of 65.72%. This study contributes to the Natural Language Processing (NLP) field by demonstrating the efficacy of combining Bi-LSTM with MHA in fixing spelling errors in the Indonesian language.
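The data preparation steps named in the description (merging correct and incorrect sentences, labeling, preprocessing) could look roughly like the sketch below. It is illustrative only and is not the authors' code: the function names `preprocess` and `build_dataset`, the sentence-level 0/1 label scheme, the lowercasing step, and the sample sentences are all assumptions, since the record does not specify them.

```python
import random
import re


def preprocess(sentence: str) -> str:
    """Lowercase and collapse whitespace (an assumed preprocessing choice).

    Punctuation is left untouched because punctuation errors are one of
    the error types the description says should be detected.
    """
    return re.sub(r"\s+", " ", sentence.strip().lower())


def build_dataset(correct, incorrect, seed=0):
    """Merge correct and incorrect sentences into shuffled (text, label) pairs.

    Hypothetical label scheme: 0 = correct, 1 = contains a spelling error.
    """
    rows = [(preprocess(s), 0) for s in correct]
    rows += [(preprocess(s), 1) for s in incorrect]
    random.Random(seed).shuffle(rows)  # seeded so the order is reproducible
    return rows


# Made-up Indonesian sample sentences, for illustration only.
correct = ["Saya pergi ke pasar.", "Dia membaca buku."]
incorrect = ["Saya prgi ke pasar.", "Dia membaca  bukuu."]

dataset = build_dataset(correct, incorrect)
for text, label in dataset:
    print(label, text)
```

The resulting pairs would then be tokenized and passed to a sequence model such as the Bi-LSTM with multi-head attention that the description mentions; that part is not sketched here because the record gives no architectural details.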
format Article
id doaj-art-1fa2b148968e4a88b1c00f8ff7aa4127
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-1fa2b148968e4a88b1c00f8ff7aa4127
2024-12-18T00:02:49Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 188560-188571
DOI: 10.1109/ACCESS.2024.3422318
Article number: 10580948
Yanfi Yanfi (https://orcid.org/0000-0002-3610-0383), Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia
Haryono Soeparno, Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia
Reina Setiawan (https://orcid.org/0000-0002-2123-1460), Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia
Widodo Budiharto (https://orcid.org/0000-0003-2681-0901), Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
title Multi-Head Attention Based Bidirectional LSTM for Spelling Error Detection in the Indonesian Language
topic Bidirectional long short-term memory
Indonesian language
multi-head attention mechanism
natural language processing
spell error detection
url https://ieeexplore.ieee.org/document/10580948/