Speaker-Independent Phoneme-Based Automatic Quranic Speech Recognition Using Deep Learning
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11080439/ |
| Summary: | An automatic speech recognition system is important to help Muslims recite the Holy Quran accurately. Most existing research ignores a wide range of potential users (reciters) by focusing on professional adult male reciters, due to the abundance of this group’s recordings and the lack of annotated data for other groups. This work bridges this gap by developing a speaker-independent system that recognizes Quranic recitations across different genders, ages, accents, and Tajweed levels. Our recognizer operates at the phoneme level to support Tajweed detection. Using a private dataset rich in non-transcribed recitations, we propose training the DeepSpeech model with transfer learning and semi-supervised learning techniques. The performance of our model is evaluated using several proposed language models and evaluation metrics, including Word Error Rate (WER) and Phoneme Error Rate (PER). The goal is to show how our model performs across diverse reciter groups. On a typical test set of unseen professional adult male recitations, our model achieves a WER of 3.11% and a PER of 6.18%. More interestingly, it achieves a WER of 25.39% and 17.93% when tested on recitations of non-professional (normal) females and children, respectively. These results are very promising and demonstrate the ability of our model to recognize recitations from various groups of normal reciters. Moreover, the latter results were obtained on the public “in-the-wild” Tarteel dataset, which we hope will be useful for comparison with future research and for building more practical recitation teaching applications. A major limitation of existing systems (including ours) remains handling diverse in-the-wild scenarios, such as when the reciter recites the verses at a very high tempo (common among those trying to memorize the Quran). |
| ISSN: | 2169-3536 |
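The abstract reports Word Error Rate (WER) and Phoneme Error Rate (PER). Both reduce to the same computation: a Levenshtein edit distance over a token sequence (words for WER, phonemes for PER), normalized by the reference length. A minimal sketch of that metric (an illustration, not the authors' evaluation code, and the sample transliteration is hypothetical):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all i tokens of the reference prefix
    for j in range(n + 1):
        d[0][j] = j  # inserting all j tokens of the hypothesis prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]

def error_rate(ref_tokens, hyp_tokens):
    """WER if tokens are words, PER if tokens are phonemes."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

# Hypothetical example: one substituted word out of four.
ref = "bismi allahi alrrahmani alrraheemi".split()
hyp = "bismi allah alrrahmani alrraheemi".split()
print(f"WER = {error_rate(ref, hyp):.2%}")  # WER = 25.00%
```

Running the same function over phoneme sequences instead of word lists yields the PER figures the abstract quotes.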