Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition
| Main Authors: | Noussaiba Djeffal, Djamel Addou, Hamza Kheddar, Sid Ahmed Selouani |
|---|---|
| Format: | Article |
| Language: | Arabic |
| Published: | Scientific and Technological Research Center for the Development of the Arabic Language, 2024-12-01 |
| Series: | Al-Lisaniyyat |
| Subjects: | ASR, CNN, LSTM, Clean speech, Noisy speech, CNN-LSTM, DNN, SNR |
| Online Access: | https://crstdla.dz/ojs/index.php/allj/article/view/732 |
| _version_ | 1850078422033760256 |
|---|---|
| author | Noussaiba Djeffal, Djamel Addou, Hamza Kheddar, Sid Ahmed Selouani |
| author_sort | Noussaiba Djeffal |
| collection | DOAJ |
| description | This paper presents a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) approach to Automatic Speech Recognition (ASR), evaluated on the Aurora-2 dataset. The dataset includes both clean and multi-condition modes, covering four noise scenarios (subway, babble, car, and exhibition hall), each evaluated at different signal-to-noise ratios (SNRs), as well as the clean condition. The results are compared with those from the ASC-10 and ESC-10 datasets. The problem addressed is the need for robust ASR models that perform well in both clean and noisy environments. The CNN-LSTM architecture aims to improve recognition performance by combining the strengths of CNNs and LSTMs rather than relying on either alone. Experimental results show that, in clean conditions on the Aurora-2 dataset, the combined CNN-LSTM model attains an accuracy of 97.96%, surpassing the individual CNN and LSTM models, which achieved 97.21% and 96.06%, respectively. In noisy conditions, the hybrid model also outperforms the standalone models, with an accuracy of 90.72%, compared to 90.12% for the CNN and 86.12% for the LSTM. These findings indicate that the CNN-LSTM model handles varied noise conditions more effectively and improves overall ASR accuracy. |
| format | Article |
| id | doaj-art-4c55892a73ac4bcf9a20a7cf922ead31 |
| institution | DOAJ |
| issn | 1112-4393, 2588-2031 |
| language | Arabic |
| publishDate | 2024-12-01 |
| publisher | Scientific and Technological Research Center for the Development of the Arabic Language |
| record_format | Article |
| series | Al-Lisaniyyat |
| spelling | doaj-art-4c55892a73ac4bcf9a20a7cf922ead31; 2025-08-20T02:45:33Z; ara; Scientific and Technological Research Center for the Development of the Arabic Language; Al-Lisaniyyat; 1112-4393, 2588-2031; 2024-12-01; 30; 2; 10.61850/allj.v30i2.732; Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition; Noussaiba Djeffal (0), Djamel Addou (1), Hamza Kheddar (2), Sid Ahmed Selouani (3); (0) Speech and Signal Processing Laboratory, University of Sciences and Technology, USTHB, Algiers; (1) Speech and Signal Processing Laboratory, University of Sciences and Technology, USTHB, Algiers; (2) LSEA Laboratory, dept. Electrical engineering, University of MEDEA, Medea; (3) Research Laboratory in Human-System Interaction, University of Moncton, Shippagan Campus, Shippagan; This paper presents a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) approach to Automatic Speech Recognition (ASR), evaluated on the Aurora-2 dataset. The dataset includes both clean and multi-condition modes, covering four noise scenarios (subway, babble, car, and exhibition hall), each evaluated at different signal-to-noise ratios (SNRs), as well as the clean condition. The results are compared with those from the ASC-10 and ESC-10 datasets. The problem addressed is the need for robust ASR models that perform well in both clean and noisy environments. The CNN-LSTM architecture aims to improve recognition performance by combining the strengths of CNNs and LSTMs rather than relying on either alone. Experimental results show that, in clean conditions on the Aurora-2 dataset, the combined CNN-LSTM model attains an accuracy of 97.96%, surpassing the individual CNN and LSTM models, which achieved 97.21% and 96.06%, respectively. In noisy conditions, the hybrid model also outperforms the standalone models, with an accuracy of 90.72%, compared to 90.12% for the CNN and 86.12% for the LSTM. These findings indicate that the CNN-LSTM model handles varied noise conditions more effectively and improves overall ASR accuracy.; https://crstdla.dz/ojs/index.php/allj/article/view/732; ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR |
| title | Combined CNN-LSTM for Enhancing Clean and Noisy Speech Recognition |
| title_sort | combined cnn lstm for enhancing clean and noisy speech recognition |
| topic | ASR - CNN - LSTM - Clean speech - Noisy speech - CNN-LSTM - DNN - SNR |
| url | https://crstdla.dz/ojs/index.php/allj/article/view/732 |