Urdu Lip Reading Systems for Digits in Controlled and Uncontrolled Environment

Bibliographic Details
Main Authors: Amanullah Baloch, Mushtaq Ali, Lal Hussain, Touseef Sadiq, Badr S. Alkahtani
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10845760/
Description
Summary: Lip reading technology can benefit several domains: it can enhance communication for people with hearing impairments, assist in noisy environments, and improve security through silent password input. Despite advances in lip reading for several languages, there has been limited success in developing an effective model for Urdu lip reading, owing to the lack of an appropriate dataset and the challenges faced by earlier models, such as the unsuccessful adaptation of the LipNet model to Urdu. To address these issues, we introduce the ULRD dataset, employ diverse data augmentation techniques, and compare three DNN models: a Hybrid 2D-3D CNN-LSTM model, a LipNet-based 2D CNN-LSTM model, and a baseline 3D CNN-GRU model. Each model is evaluated in both controlled and uncontrolled environments, using both seen and unseen data. Results indicate that the LipNet-based 2D CNN-LSTM model achieves a high overall accuracy of 92.15% across all conditions, while the Hybrid model demonstrates strong generalization, with an overall accuracy of 90.00% on unseen data, due to its enhanced spatiotemporal feature extraction capability. The precision (0.91), recall (0.91), and F1-score (0.91) of the LipNet-based 2D CNN-LSTM model are also higher than those of the competing models. The remaining findings highlight the effectiveness of different DNN architectures and the potential improvements offered by the ULRD dataset for Urdu lip reading research.
ISSN: 2169-3536
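
The summary reports precision, recall, and F1-score for a multi-class digit classifier. As a rough illustration of how such figures can be computed, the following is a minimal sketch that assumes macro averaging over digit classes; the record does not state which averaging scheme the authors used, and the function name `macro_precision_recall_f1` and the sample labels are hypothetical.

```python
from collections import Counter


def macro_precision_recall_f1(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) for class labels.

    Assumption: macro averaging (unweighted mean over classes). The
    source record does not say which averaging the paper applies.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Tally per-class true positives, false positives, false negatives.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels for two digit classes (0 and 1), for illustration only.
example_true = [0, 0, 1, 1]
example_pred = [0, 1, 1, 1]
precision, recall, f1 = macro_precision_recall_f1(example_true, example_pred)
```

With macro averaging every digit class contributes equally to the final score regardless of how many samples it has; a weighted or micro average would give different numbers on imbalanced data.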