Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment
Lip reading technology can significantly benefit various domains, such as enhancing communication for hearing-impaired people, assisting in noisy environments, and improving security with silent password inputs. Despite advancements in lip reading for several languages, there has been limited su...
Main Authors: | Amanullah Baloch, Mushtaq Ali, Lal Hussain, Touseef Sadiq, Badr S. Alkahtani |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Lip reading system; Urdu dataset; deep learning model; data augmentation; lips extraction |
Online Access: | https://ieeexplore.ieee.org/document/10845760/ |
_version_ | 1823857047761846272 |
---|---|
author | Amanullah Baloch; Mushtaq Ali; Lal Hussain; Touseef Sadiq; Badr S. Alkahtani |
author_facet | Amanullah Baloch; Mushtaq Ali; Lal Hussain; Touseef Sadiq; Badr S. Alkahtani |
author_sort | Amanullah Baloch |
collection | DOAJ |
description | Lip reading technology can significantly benefit various domains, such as enhancing communication for hearing-impaired people, assisting in noisy environments, and improving security with silent password inputs. Despite advancements in lip reading for several languages, there has been limited success in developing an effective model for Urdu lip reading, owing to the lack of an appropriate dataset and the challenges faced by earlier models, such as the unsuccessful adaptation of the LipNet model to Urdu. To address these issues, we introduce the ULRD dataset, employ diverse data augmentation techniques, and compare three DNN models: a Hybrid 2D-3D CNN-LSTM model, a LipNet-based 2D CNN-LSTM model, and a baseline 3D CNN-GRU model. Each model is evaluated in both controlled and uncontrolled environments, using both seen and unseen data. Results indicate that the LipNet-based 2D CNN-LSTM model achieves a high overall accuracy of 92.15% across all conditions, while the Hybrid model generalizes well, with an overall accuracy of 90.00% on unseen data, due to its enhanced spatiotemporal feature extraction capability. Additionally, the precision (0.91), recall (0.91), and F1-score (0.91) of the LipNet-based 2D CNN-LSTM model are higher than those of the competing models. The remaining findings highlight the effectiveness of different DNN architectures and the potential improvements offered by the ULRD dataset for Urdu lip reading research. |
format | Article |
id | doaj-art-35d97a9819d148ceb0965f1f4921764b |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-35d97a9819d148ceb0965f1f4921764b; 2025-02-12T00:02:55Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 24906; 24927; 10.1109/ACCESS.2025.3531640; 10845760; Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment; Amanullah Baloch 0; Mushtaq Ali 1; https://orcid.org/0000-0002-3697-9498; Lal Hussain 2; https://orcid.org/0000-0003-1103-4938; Touseef Sadiq 3; https://orcid.org/0000-0001-6603-3639; Badr S. Alkahtani 4; Department of Computer Science and Information Technology, Hazara University Mansehra, Mansehra, Pakistan; Department of Computer Science and Information Technology, Hazara University Mansehra, Mansehra, Pakistan; Department of Computer Science and IT, Neelum Campus, The University of Azad Jammu and Kashmir, Athmuqam, Pakistan; Department of Information and Communication Technology, Centre for Artificial Intelligence Research (CAIR), University of Agder, Grimstad, Norway; Department of Mathematics, King Saud University, Riyadh, Saudi Arabia; Lip reading technology can significantly benefit various domains, such as enhancing communication for hearing-impaired people, assisting in noisy environments, and improving security with silent password inputs. Despite advancements in lip reading for several languages, there has been limited success in developing an effective model for Urdu lip reading, owing to the lack of an appropriate dataset and the challenges faced by earlier models, such as the unsuccessful adaptation of the LipNet model to Urdu. To address these issues, we introduce the ULRD dataset, employ diverse data augmentation techniques, and compare three DNN models: a Hybrid 2D-3D CNN-LSTM model, a LipNet-based 2D CNN-LSTM model, and a baseline 3D CNN-GRU model. Each model is evaluated in both controlled and uncontrolled environments, using both seen and unseen data. Results indicate that the LipNet-based 2D CNN-LSTM model achieves a high overall accuracy of 92.15% across all conditions, while the Hybrid model generalizes well, with an overall accuracy of 90.00% on unseen data, due to its enhanced spatiotemporal feature extraction capability. Additionally, the precision (0.91), recall (0.91), and F1-score (0.91) of the LipNet-based 2D CNN-LSTM model are higher than those of the competing models. The remaining findings highlight the effectiveness of different DNN architectures and the potential improvements offered by the ULRD dataset for Urdu lip reading research.; https://ieeexplore.ieee.org/document/10845760/; Lip reading system; Urdu dataset; deep learning model; data augmentation; lips extraction |
spellingShingle | Amanullah Baloch; Mushtaq Ali; Lal Hussain; Touseef Sadiq; Badr S. Alkahtani; Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment; IEEE Access; Lip reading system; Urdu dataset; deep learning model; data augmentation; lips extraction |
title | Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment |
title_full | Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment |
title_fullStr | Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment |
title_full_unstemmed | Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment |
title_short | Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment |
title_sort | urdu lip reading systems for digits in controlled and uncontrolled environment |
topic | Lip reading system; Urdu dataset; deep learning model; data augmentation; lips extraction |
url | https://ieeexplore.ieee.org/document/10845760/ |
work_keys_str_mv | AT amanullahbaloch urdulipreadingsystemsfordigitsincontrolledanduncontrolledenvironment AT mushtaqali urdulipreadingsystemsfordigitsincontrolledanduncontrolledenvironment AT lalhussain urdulipreadingsystemsfordigitsincontrolledanduncontrolledenvironment AT touseefsadiq urdulipreadingsystemsfordigitsincontrolledanduncontrolledenvironment AT badrsalkahtani urdulipreadingsystemsfordigitsincontrolledanduncontrolledenvironment |