Speaker-Independent Phoneme-Based Automatic Quranic Speech Recognition Using Deep Learning
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11080439/ |
| Summary: | An automatic speech recognition system is important to help Muslims recite the Holy Quran accurately. Most existing research ignores a wide range of potential users (reciters) by focusing on professional adult male reciters, due to the abundance of this group’s recordings and the lack of annotated data for other groups. This work bridges this gap by developing a speaker-independent system that recognizes Quranic recitations across different genders, ages, accents, and Tajweed levels. Our recognizer operates at the phoneme level to support Tajweed detection. Using a private dataset rich in non-transcribed recitations, we propose training the DeepSpeech model with transfer learning and semi-supervised learning techniques. The performance of our model is evaluated using several proposed language models and evaluation metrics, including Word Error Rate (WER) and Phoneme Error Rate (PER). The goal is to show how our model performs across diverse reciter groups. On a typical test set of unseen professional adult male recitations, our model achieves a WER of 3.11% and a PER of 6.18%. More interestingly, it achieves a WER of 25.39% and 17.93% when tested on recitations of non-professional (normal) females and children, respectively. These results are very promising and demonstrate the ability of our model to recognize recitations from various groups of normal reciters. Moreover, the latter results were obtained on the public “in-the-wild” Tarteel dataset, which we hope will be useful for comparison with future research and for building more practical recitation teaching applications. A major limitation of existing systems (including ours) remains handling diverse in-the-wild scenarios, such as when the reciter recites the verses at a very high tempo (common among those trying to memorize the Quran). |
| ISSN: | 2169-3536 |
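The abstract reports Word Error Rate (WER) and Phoneme Error Rate (PER). Both reduce to the same computation: a Levenshtein edit distance over a token sequence (words for WER, phonemes for PER), normalized by the reference length. A minimal sketch of that metric (an illustration, not the authors' evaluation code, and the sample transliteration is hypothetical):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all i tokens of the reference prefix
    for j in range(n + 1):
        d[0][j] = j  # inserting all j tokens of the hypothesis prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]

def error_rate(ref_tokens, hyp_tokens):
    """WER if tokens are words, PER if tokens are phonemes."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

# Hypothetical example: one substituted word out of four.
ref = "bismi allahi alrrahmani alrraheemi".split()
hyp = "bismi allah alrrahmani alrraheemi".split()
print(f"WER = {error_rate(ref, hyp):.2%}")  # WER = 25.00%
```

Running the same function over phoneme sequences instead of word lists yields the PER figures the abstract quotes.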