Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition

This paper presents a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) approach for Automatic Speech Recognition (ASR) using deep learning techniques on the Aurora-2 dataset. The dataset includes both clean and multi-condition modes, encompassing four noise scenarios: subway,...

Bibliographic Details
Main Authors:	Noussaiba Djeffal, Djamel Addou, Hamza Kheddar, Sid Ahmed Selouani
Format:	Article
Language:	Arabic
Published:	Scientific and Technological Research Center for the Development of the Arabic Language 2024-12-01
Series:	Al-Lisaniyyat
Subjects:	ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR
Online Access:	https://crstdla.dz/ojs/index.php/allj/article/view/732

author	Noussaiba Djeffal, Djamel Addou, Hamza Kheddar, Sid Ahmed Selouani
collection	DOAJ
description	This paper presents a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) approach to Automatic Speech Recognition (ASR), evaluated on the Aurora-2 dataset. The dataset includes both clean and multi-condition modes, covering the clean condition and four noise scenarios (subway, babble, car, and exhibition hall), each evaluated at different signal-to-noise ratios (SNRs). The results are compared with those from the ASC-10 and ESC-10 datasets. The problem addressed is the need for robust ASR models that perform well in both clean and noisy environments. The CNN-LSTM architecture aims to improve recognition performance by combining the strengths of CNNs and LSTMs rather than relying on either alone. Experimental results show that, in clean conditions on the Aurora-2 dataset, the combined CNN-LSTM model attains an accuracy of 97.96%, surpassing the individual CNN and LSTM models, which achieved 97.21% and 96.06%, respectively. In noisy conditions, the hybrid model also outperforms the standalone models, with an accuracy of 90.72%, compared to 90.12% for CNN and 86.12% for LSTM. These findings indicate that the CNN-LSTM model handles the tested noise conditions more effectively and improves overall ASR accuracy.
format	Article
id	doaj-art-4c55892a73ac4bcf9a20a7cf922ead31
institution	DOAJ
issn	1112-4393 2588-2031
language	Arabic
publishDate	2024-12-01
publisher	Scientific and Technological Research Center for the Development of the Arabic Language
record_format	Article
series	Al-Lisaniyyat
spelling	doaj-art-4c55892a73ac4bcf9a20a7cf922ead31; 2025-08-20T02:45:33Z; ara; Scientific and Technological Research Center for the Development of the Arabic Language; Al-Lisaniyyat; ISSN 1112-4393, 2588-2031; 2024-12-01; vol. 30, no. 2; DOI 10.61850/allj.v30i2.732; Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition; Noussaiba Djeffal (Speech and Signal Processing Laboratory, University of Sciences and Technology, USTHB, Algiers); Djamel Addou (Speech and Signal Processing Laboratory, University of Sciences and Technology, USTHB, Algiers); Hamza Kheddar (LSEA Laboratory, Dept. of Electrical Engineering, University of Medea, Medea); Sid Ahmed Selouani (Research Laboratory in Human-System Interaction, University of Moncton, Shippagan Campus, Shippagan); https://crstdla.dz/ojs/index.php/allj/article/view/732; ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR
title	Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition
topic	ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR
url	https://crstdla.dz/ojs/index.php/allj/article/view/732
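The accuracies quoted in the description can be put side by side as error rates, which makes the size of each gain easier to judge. The following is a minimal sketch, not taken from the paper: it assumes the error rate is simply 100 minus the reported accuracy, and the names `ACCURACY`, `error_rate`, `relative_error_reduction`, and `summarize` are hypothetical helpers introduced here for illustration.

```python
# Accuracies (%) quoted from the record's description; nothing else comes from the paper.
ACCURACY = {
    "clean": {"CNN-LSTM": 97.96, "CNN": 97.21, "LSTM": 96.06},
    "noisy": {"CNN-LSTM": 90.72, "CNN": 90.12, "LSTM": 86.12},
}


def error_rate(accuracy_pct):
    """Error rate in percent, assumed here to be 100 minus accuracy."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline_pct, hybrid_pct):
    """Fraction of the baseline model's errors that the hybrid model removes."""
    base_err = error_rate(baseline_pct)
    return (base_err - error_rate(hybrid_pct)) / base_err


def summarize(accuracy=ACCURACY):
    """Return {condition: {baseline: (gain in points, relative error reduction)}}."""
    summary = {}
    for condition, scores in accuracy.items():
        hybrid = scores["CNN-LSTM"]
        summary[condition] = {
            name: (
                round(hybrid - acc, 2),
                round(relative_error_reduction(acc, hybrid), 3),
            )
            for name, acc in scores.items()
            if name != "CNN-LSTM"
        }
    return summary


if __name__ == "__main__":
    for condition, rows in summarize().items():
        for name, (gain, rer) in rows.items():
            print(f"{condition} vs {name}: +{gain:.2f} points, {rer:.1%} fewer errors")
```

Under that assumption, the 0.75-point gain over the standalone CNN in clean conditions corresponds to roughly 27% fewer errors, while the 0.60-point gain in noisy conditions corresponds to roughly 6% fewer errors.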


Similar Items