Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition

Bibliographic Details
Main Authors: Noussaiba Djeffal, Djamel Addou, Hamza Kheddar, Sid Ahmed Selouani
Format: Article
Language: Arabic
Published: Scientific and Technological Research Center for the Development of the Arabic Language 2024-12-01
Series: Al-Lisaniyyat
Subjects: ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR
Online Access: https://crstdla.dz/ojs/index.php/allj/article/view/732
author Noussaiba Djeffal
Djamel Addou
Hamza Kheddar
Sid Ahmed Selouani
collection DOAJ
description This paper presents a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) approach to Automatic Speech Recognition (ASR), evaluated on the Aurora-2 dataset. The dataset includes both clean and multi-condition modes, covering four noise scenarios (subway, babble, car, and exhibition hall), each evaluated at different signal-to-noise ratios (SNRs), as well as the clean condition. The results are compared with those obtained on the ASC-10 and ESC-10 datasets. The problem addressed is the need for robust ASR models that perform well in both clean and noisy environments. The CNN-LSTM architecture is intended to improve recognition performance by combining the strengths of CNNs and LSTMs rather than relying on either alone. Experimental results show that, in clean conditions on Aurora-2, the combined CNN-LSTM model attains an accuracy of 97.96%, surpassing the individual CNN and LSTM models, which achieve 97.21% and 96.06%, respectively. In noisy conditions, the hybrid model also outperforms the standalone models, with an accuracy of 90.72% compared with 90.12% for the CNN and 86.12% for the LSTM. These findings indicate that the CNN-LSTM model handles the tested noise conditions more effectively and improves overall ASR accuracy.
format Article
id doaj-art-4c55892a73ac4bcf9a20a7cf922ead31
institution DOAJ
issn 1112-4393
2588-2031
language Arabic
publishDate 2024-12-01
publisher Scientific and Technological Research Center for the Development of the Arabic Language
record_format Article
series Al-Lisaniyyat
spelling doaj-art-4c55892a73ac4bcf9a20a7cf922ead31
2025-08-20T02:45:33Z
ara
Scientific and Technological Research Center for the Development of the Arabic Language
Al-Lisaniyyat
1112-4393
2588-2031
2024-12-01
Volume 30, Issue 2
DOI: 10.61850/allj.v30i2.732
Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition
Noussaiba Djeffal (Speech and Signal Processing Laboratory, University of Sciences and Technology, USTHB, Algiers)
Djamel Addou (Speech and Signal Processing Laboratory, University of Sciences and Technology, USTHB, Algiers)
Hamza Kheddar (LSEA Laboratory, dept. Electrical engineering, University of MEDEA, Medea)
Sid Ahmed Selouani (Research Laboratory in Human-System Interaction, University of Moncton, Shippagan Campus, Shippagan)
https://crstdla.dz/ojs/index.php/allj/article/view/732
ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR
title Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition
topic ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR
url https://crstdla.dz/ojs/index.php/allj/article/view/732
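
The accuracies quoted in the description can be set side by side with a few lines of Python. This is only an illustrative sketch: the figures are copied from the description, while the function names and the relative error reduction measure are assumptions introduced here and are not taken from the article.

```python
# Accuracies (%) exactly as quoted in the record's description.
REPORTED = {
    "clean": {"CNN-LSTM": 97.96, "CNN": 97.21, "LSTM": 96.06},
    "noisy": {"CNN-LSTM": 90.72, "CNN": 90.12, "LSTM": 86.12},
}


def accuracy_margins(scores, reference="CNN-LSTM"):
    """Absolute margin (percentage points) of the reference model over each other model."""
    ref = scores[reference]
    return {
        name: round(ref - acc, 2)
        for name, acc in scores.items()
        if name != reference
    }


def relative_error_reduction(scores, reference="CNN-LSTM"):
    """Share (%) of each baseline's error rate removed by the reference model.

    Hypothetical helper: this measure is an assumption of this sketch,
    not a figure reported in the record.
    """
    ref_error = 100.0 - scores[reference]
    result = {}
    for name, acc in scores.items():
        if name == reference:
            continue
        base_error = 100.0 - acc
        result[name] = round(100.0 * (base_error - ref_error) / base_error, 1)
    return result


if __name__ == "__main__":
    for condition, scores in REPORTED.items():
        print(condition, accuracy_margins(scores), relative_error_reduction(scores))
```

Read this way, the margin over the standalone CNN is under one percentage point in both conditions, while the margin over the standalone LSTM is larger, particularly in noise.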