Convolutional Neural Networks to Facilitate the Continuous Recognition of Arabic Speech with Independent Speakers


Bibliographic Details
Main Authors: Sally A. Sayed, Rania Ahmed Abdel Azeem Abul Seoud, Howida Y. Abdel Naby
Format: Article
Language: English
Published: Wiley 2024-01-01
Series: Journal of Electrical and Computer Engineering
Online Access: http://dx.doi.org/10.1155/2024/4976944
collection DOAJ
description Automatic speech recognition (ASR) is the field of research concerned with enabling computers to process and interpret human speech as accurately as possible. Speech is one of the simplest ways to convey a message, and speech recognition is expected to become increasingly important as the human-to-machine interface continues to evolve. Arabic, however, has features that set it apart from other languages, such as its dialects and the pronunciation of its words. Until now, continuous Arabic speech recognition for independent speakers with a limited database has received insufficient research attention. This research proposes two techniques for recognizing Arabic speech. The first combines convolutional neural network (CNN) and long short-term memory (LSTM) encoders with an attention-based decoder. The second is based on the Sphinx-4 recognizer, which includes PocketSphinx, SphinxBase, and SphinxTrain; it extracts features of various types and numbers (filter bank and mel-frequency cepstral coefficients (MFCC)) and relies on the CMU Sphinx tool, which generates a language model for different sentences spoken by different speakers. Both approaches were tested on a dataset containing 7 hours of spoken Arabic from 11 Arab countries, covering the Levant, Gulf, and African regions of the Arab world, and achieved promising results. CNN-LSTM achieved a word error rate (WER) of 3.63% using 120 filter bank features and 4.04% using 39 MFCC features, while the Sphinx-4 recognizer achieved a WER of 8.17% (an accuracy of 91.83%) using 25 MFCC features and 8 Gaussian mixtures on the same benchmark dataset.
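The description reports results as word error rate (WER) and accuracy. The sketch below shows one common way to compute WER, as the word-level edit distance between a reference transcript and a recognizer hypothesis divided by the number of reference words. It is an illustration only, not the authors' evaluation code: the function names `word_error_rate` and `word_accuracy` and the whitespace tokenization are assumptions. Treating accuracy as 1 - WER is also an assumption, although it matches the reported pair of 8.17% WER and 91.83% accuracy.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(wer: float) -> float:
    """Accuracy taken as 1 - WER (an assumption, see the note above)."""
    return 1.0 - wer


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference, in which case 1 - WER becomes negative.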
id doaj-art-e53ab645a57e494ba6acaf1555913fe8
institution OA Journals
issn 2090-0155
spelling doaj-art-e53ab645a57e494ba6acaf1555913fe8; 2025-08-20T02:21:19Z; eng; Wiley; Journal of Electrical and Computer Engineering; 2090-0155; 2024-01-01; 2024; 10.1155/2024/4976944; Sally A. Sayed (Department of Computer Science); Rania Ahmed Abdel Azeem Abul Seoud (Department of Electrical Engineering); Howida Y. Abdel Naby (Department of Computer Science)