Multi-Head Attention Based Bidirectional LSTM for Spelling Error Detection in the Indonesian Language
Spelling errors in textual content may significantly hinder communication and comprehension, particularly in formal writing, such as news or reports. Thus, it becomes considerably more important to identify and fix spelling mistakes in the Indonesian language. Despite its significance, there has not...
| Main Authors: | Yanfi Yanfi, Haryono Soeparno, Reina Setiawan, Widodo Budiharto |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2024-01-01 |
| Series: | IEEE Access |
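| Subjects: | Bidirectional long short-term memory; Indonesian language; multi-head attention mechanism; natural language processing; spell error detection |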
| Online Access: | https://ieeexplore.ieee.org/document/10580948/ |
| _version_ | 1846118255213674496 |
|---|---|
| author | Yanfi Yanfi; Haryono Soeparno; Reina Setiawan; Widodo Budiharto |
| collection | DOAJ |
| description | Spelling errors in textual content can significantly hinder communication and comprehension, particularly in formal writing such as news or reports, so identifying and fixing spelling mistakes in the Indonesian language is important. Despite this significance, little progress has been made toward efficient systems for identifying spelling errors in Indonesian texts. Currently available solutions frequently fail to cover all types of spelling error, including nonword, real-word, and punctuation errors. This study aims to address this gap by presenting a novel algorithm to improve spelling mistake detection in the Indonesian language context. A systematic literature study identified gaps in current methodologies, which informed our solution. Our proposed algorithm begins by gathering and preparing the dataset, merging correct and incorrect sentences, labeling, and preprocessing the data. It then integrates deep learning techniques, combining Bidirectional Long Short-Term Memory (Bi-LSTM) networks, which capture the intricacies of sequential data, with Multi-Head Attention (MHA) mechanisms, which emphasize pertinent segments of input sequences, thereby improving prediction accuracy. We conducted comprehensive experiments to benchmark our model against existing models. Our model reached a peak accuracy of 92.26%, greatly exceeding the baseline models, the lowest of which achieved 65.72%. This study contributes to the Natural Language Processing (NLP) field by demonstrating the efficacy of combining Bi-LSTM with MHA in fixing spelling errors in the Indonesian language. |
| format | Article |
| id | doaj-art-1fa2b148968e4a88b1c00f8ff7aa4127 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-1fa2b148968e4a88b1c00f8ff7aa4127; 2024-12-18T00:02:49Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 188560; 188571; 10.1109/ACCESS.2024.3422318; 10580948; Multi-Head Attention Based Bidirectional LSTM for Spelling Error Detection in the Indonesian Language; Yanfi Yanfi (0; https://orcid.org/0000-0002-3610-0383); Haryono Soeparno (1); Reina Setiawan (2; https://orcid.org/0000-0002-2123-1460); Widodo Budiharto (3; https://orcid.org/0000-0003-2681-0901); Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia; Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia; Computer Science Department, BINUS Graduate Program-Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia; https://ieeexplore.ieee.org/document/10580948/; Bidirectional long short-term memory; Indonesian language; multi-head attention mechanism; natural language processing; spell error detection |
| title | Multi-Head Attention Based Bidirectional LSTM for Spelling Error Detection in the Indonesian Language |
| topic | Bidirectional long short-term memory; Indonesian language; multi-head attention mechanism; natural language processing; spell error detection |
| url | https://ieeexplore.ieee.org/document/10580948/ |
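
The description field says the proposed algorithm begins by gathering the dataset, merging correct and incorrect sentences, labeling, and preprocessing. The following is a minimal sketch of that data-preparation step only, not the authors' implementation. The label values, the function names `preprocess` and `build_dataset`, the tokenization rule, and the toy Indonesian sentences are all assumptions made for illustration; the record does not specify any of them.

```python
import random
import re
from typing import List, Tuple

# Hypothetical label encoding; the record does not state the one used in the paper.
LABEL_CORRECT = 0
LABEL_INCORRECT = 1


def preprocess(sentence: str) -> List[str]:
    """Lowercase a sentence and split it into word and punctuation tokens.

    Punctuation is kept as separate tokens because the abstract lists
    punctuation errors among the error types of interest. This rule is an
    assumption, not the paper's documented preprocessing.
    """
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def build_dataset(
    correct: List[str], incorrect: List[str], seed: int = 0
) -> List[Tuple[List[str], int]]:
    """Label both sentence sets, merge them, shuffle, and tokenize."""
    labeled = [(s, LABEL_CORRECT) for s in correct]
    labeled += [(s, LABEL_INCORRECT) for s in incorrect]
    # A fixed seed keeps the shuffled order reproducible between runs.
    random.Random(seed).shuffle(labeled)
    return [(preprocess(s), label) for s, label in labeled]


if __name__ == "__main__":
    # Toy examples: "pegi" stands in for a nonword misspelling of "pergi".
    correct_sentences = ["Saya pergi ke pasar."]
    incorrect_sentences = ["Saya pegi ke pasar."]
    for tokens, label in build_dataset(correct_sentences, incorrect_sentences):
        print(label, tokens)
```

The resulting list of token sequences and labels is the kind of input a sequence classifier such as the Bi-LSTM with multi-head attention named in the abstract would consume after mapping tokens to integer ids and padding.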