Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network
Communication through speech can be hindered by environmental noise, prompting the need for alternative methods such as lip reading, which bypasses auditory challenges. However, the accurate interpretation of lip movements is impeded by the uniqueness of individual lip shapes, necessitating detailed...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Wiley
2024-01-01
|
| Series: | Applied Computational Intelligence and Soft Computing |
| Online Access: | http://dx.doi.org/10.1155/2024/6479124 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1849413881157386240 |
|---|---|
| author | null Aripin Abas Setiawan |
| author_facet | null Aripin Abas Setiawan |
| author_sort | null Aripin |
| collection | DOAJ |
| description | Communication through speech can be hindered by environmental noise, prompting the need for alternative methods such as lip reading, which bypasses auditory challenges. However, the accurate interpretation of lip movements is impeded by the uniqueness of individual lip shapes, necessitating detailed analysis. In addition, the development of an Indonesian dataset addresses the lack of diversity in existing datasets, predominantly in English, fostering more inclusive research. This study proposes an enhanced lip-reading system trained using the long-term recurrent convolutional network (LRCN) considering eight different types of lip shapes. MediaPipe Face Mesh precisely detects lip landmarks, enabling the LRCN model to recognize Indonesian utterances. Experimental results demonstrate the effectiveness of the approach, with the LRCN model with three convolutional layers (LRCN-3Conv) achieving 95.42% accuracy for word test data and 95.63% for phrases, outperforming the convolutional long short-term memory (Conv-LSTM) method. The proposed approach outperforms Conv-LSTM in terms of accuracy. Furthermore, the evaluation of the original MIRACL-VC1 dataset also produced a best accuracy of 90.67% on LRCN-3Conv compared to previous studies in the word-labeled class. The success is attributed to MediaPipe Face Mesh detection, which facilitates the accurate detection of the lip region. Leveraging advanced deep learning techniques and precise landmark detection, these findings promise improved communication accessibility for individuals facing auditory challenges. |
| format | Article |
| id | doaj-art-700b0f006fac4137897f123181f8dfaf |
| institution | Kabale University |
| issn | 1687-9732 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Applied Computational Intelligence and Soft Computing |
| spelling | doaj-art-700b0f006fac4137897f123181f8dfaf2025-08-20T03:34:00ZengWileyApplied Computational Intelligence and Soft Computing1687-97322024-01-01202410.1155/2024/6479124Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Networknull Aripin0Abas Setiawan1Department of Biomedical EngineeringDepartment of Computer ScienceCommunication through speech can be hindered by environmental noise, prompting the need for alternative methods such as lip reading, which bypasses auditory challenges. However, the accurate interpretation of lip movements is impeded by the uniqueness of individual lip shapes, necessitating detailed analysis. In addition, the development of an Indonesian dataset addresses the lack of diversity in existing datasets, predominantly in English, fostering more inclusive research. This study proposes an enhanced lip-reading system trained using the long-term recurrent convolutional network (LRCN) considering eight different types of lip shapes. MediaPipe Face Mesh precisely detects lip landmarks, enabling the LRCN model to recognize Indonesian utterances. Experimental results demonstrate the effectiveness of the approach, with the LRCN model with three convolutional layers (LRCN-3Conv) achieving 95.42% accuracy for word test data and 95.63% for phrases, outperforming the convolutional long short-term memory (Conv-LSTM) method. The proposed approach outperforms Conv-LSTM in terms of accuracy. Furthermore, the evaluation of the original MIRACL-VC1 dataset also produced a best accuracy of 90.67% on LRCN-3Conv compared to previous studies in the word-labeled class. The success is attributed to MediaPipe Face Mesh detection, which facilitates the accurate detection of the lip region. Leveraging advanced deep learning techniques and precise landmark detection, these findings promise improved communication accessibility for individuals facing auditory challenges.http://dx.doi.org/10.1155/2024/6479124 |
| spellingShingle | null Aripin Abas Setiawan Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network Applied Computational Intelligence and Soft Computing |
| title | Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network |
| title_full | Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network |
| title_fullStr | Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network |
| title_full_unstemmed | Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network |
| title_short | Indonesian Lip-Reading Detection and Recognition Based on Lip Shape Using Face Mesh and Long-Term Recurrent Convolutional Network |
| title_sort | indonesian lip reading detection and recognition based on lip shape using face mesh and long term recurrent convolutional network |
| url | http://dx.doi.org/10.1155/2024/6479124 |
| work_keys_str_mv | AT nullaripin indonesianlipreadingdetectionandrecognitionbasedonlipshapeusingfacemeshandlongtermrecurrentconvolutionalnetwork AT abassetiawan indonesianlipreadingdetectionandrecognitionbasedonlipshapeusingfacemeshandlongtermrecurrentconvolutionalnetwork |