Multi-Head Attention Based Bidirectional LSTM for Spelling Error Detection in the Indonesian Language

Bibliographic Details
Main Authors: Yanfi Yanfi, Haryono Soeparno, Reina Setiawan, Widodo Budiharto
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10580948/
Description
Summary: Spelling errors can significantly hinder communication and comprehension, particularly in formal writing such as news and reports, which makes identifying and fixing spelling mistakes in Indonesian text important. Despite this, little progress has been made toward efficient systems for identifying spelling errors in Indonesian texts. Existing solutions frequently fail to cover the full range of spelling errors, including nonword, real-word, and punctuation errors. This study addresses the gap by presenting a novel algorithm for spelling error detection in Indonesian. A systematic literature review identified the gaps in current methodologies and informed the proposed solution. The algorithm begins by gathering and preparing the dataset: merging correct and incorrect sentences, labeling, and preprocessing the data. It then applies deep learning, combining a Bidirectional Long Short-Term Memory (Bi-LSTM) network, which captures the sequential structure of the text, with a Multi-Head Attention (MHA) mechanism, which emphasizes the pertinent segments of the input sequence, to improve prediction accuracy. Comprehensive experiments benchmarked the model against existing models. The proposed model reached a peak accuracy of 92.26%, well above the baseline models, whose lowest accuracy was 65.72%. The study contributes to Natural Language Processing (NLP) by demonstrating the efficacy of combining Bi-LSTM with MHA for detecting spelling errors in the Indonesian language.
ISSN: 2169-3536
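
Illustrative note: the data-preparation step named in the summary (merging correct and incorrect sentences, labeling, and preprocessing) might look roughly like the sketch below. This is not the authors' code. The function names `preprocess` and `build_dataset`, the sentence-level label scheme (0 = correct, 1 = contains an error), the lowercase and whitespace normalization, and the example sentences are all assumptions made for illustration; the record does not state how the paper labels or cleans its data.

```python
import random
import re


def preprocess(sentence: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization).

    Punctuation is deliberately kept, since punctuation errors are
    among the error types the summary says should be covered.
    """
    return re.sub(r"\s+", " ", sentence.strip().lower())


def build_dataset(correct, incorrect, seed=0):
    """Merge both sentence lists into shuffled (text, label) pairs.

    Assumed label scheme: 0 = correct sentence, 1 = contains an error.
    """
    rows = [(preprocess(s), 0) for s in correct]
    rows += [(preprocess(s), 1) for s in incorrect]
    random.Random(seed).shuffle(rows)  # fixed seed for reproducibility
    return rows


# Hypothetical example sentences, not taken from the paper's dataset.
correct = ["Saya pergi ke pasar.", "Dia membaca buku."]
incorrect = ["Saya  pergi ke pasarr.", "Dia membca buku"]

dataset = build_dataset(correct, incorrect)
for text, label in dataset:
    print(label, text)
```

The resulting pairs would then be tokenized and passed to a sequence classifier such as the Bi-LSTM with MHA described in the summary; that model is outside the scope of this sketch.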