Comparative Study of Visual Feature for Bimodal Hindi Speech Recognition

In building speech recognition applications, robustness to different noisy background conditions is an important challenge. In this paper, a bimodal approach is proposed to improve the robustness of a Hindi speech recognition system. The importance of different types of visual features is also studie...

Bibliographic Details
Main Authors: Prashant UPADHYAYA, Omar FAROOQ, Musiur Raza ABIDI, Priyanka VARSHNEY
Format: Article
Language: English
Published: Institute of Fundamental Technological Research Polish Academy of Sciences 2015-09-01
Series: Archives of Acoustics
Subjects: Aligarh Muslim University audio visual corpus; AVASR; bimodal; DCT; DWT
Online Access: https://acoustics.ippt.pan.pl/index.php/aa/article/view/1607
author Prashant UPADHYAYA
Omar FAROOQ
Musiur Raza ABIDI
Priyanka VARSHNEY
collection DOAJ
description In building speech recognition applications, robustness to different noisy background conditions is an important challenge. In this paper, a bimodal approach is proposed to improve the robustness of a Hindi speech recognition system. The importance of different types of visual features is also studied for an audio-visual automatic speech recognition (AVASR) system under diverse noisy audio conditions. Four sets of visual features are reported, based on the Two-Dimensional Discrete Cosine Transform (2D-DCT), Principal Component Analysis (PCA), the Two-Dimensional Discrete Wavelet Transform followed by DCT (2D-DWT-DCT), and the Two-Dimensional Discrete Wavelet Transform followed by PCA (2D-DWT-PCA). The audio features are extracted using Mel Frequency Cepstral Coefficients (MFCC), with static and dynamic features. Overall, 48 features (39 audio features and 9 visual features) are used to measure the performance of the AVASR system. The performance of the AVASR system is also evaluated on noisy speech generated using the NOISEX database at signal-to-noise ratios (SNR) from 30 dB to -10 dB, using the Aligarh Muslim University Audio Visual (AMUAV) Hindi corpus. The AMUAV corpus is a high-quality audio-visual database of continuous Hindi speech, consisting of Hindi sentences spoken by different subjects.
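The pipeline outlined in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that noise is scaled to a target SNR by matching average signal powers, that the 9 visual features are the top-left 3 x 3 block of 2D-DCT coefficients of a grayscale mouth region, and that the 39 audio features follow the common layout of 13 static MFCCs plus their first and second derivatives. The record states none of these details, and the function names (mix_at_snr, dct_matrix, dct2_features) are hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is `snr_db`."""
    noise = noise[: len(speech)]          # assumes the noise clip is at least as long
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]             # frequency index
    i = np.arange(n)[None, :]             # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)            # DC row has its own normalization
    return m


def dct2_features(roi, side=3):
    """2D-DCT of a grayscale region; keep the low-frequency side x side block (assumed choice)."""
    rows, cols = roi.shape
    coeffs = dct_matrix(rows) @ roi @ dct_matrix(cols).T
    return coeffs[:side, :side].ravel()


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)       # placeholder for a clean utterance
noise = rng.standard_normal(20000)        # placeholder for a NOISEX noise clip
noisy = mix_at_snr(speech, noise, snr_db=-10)

audio_frame = rng.standard_normal(39)     # placeholder for one 39-dimensional MFCC frame
roi = rng.random((32, 32))                # placeholder for one mouth-region image
av_frame = np.concatenate([audio_frame, dct2_features(roi)])
print(av_frame.shape)                     # (48,)
```

Concatenating the two vectors frame by frame is one simple way to obtain a 48-dimensional audio-visual feature. Whether the paper fuses the streams this way is not stated in this record.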
format Article
id doaj-art-c23b904789564468b78fc74b1fe27e7f
institution DOAJ
issn 0137-5075
2300-262X
language English
publishDate 2015-09-01
publisher Institute of Fundamental Technological Research Polish Academy of Sciences
record_format Article
series Archives of Acoustics
spelling doaj-art-c23b904789564468b78fc74b1fe27e7f 2025-08-20T03:13:08Z
Volume 40, Issue 4
DOI 10.1515/aoa-2015-0061
Prashant UPADHYAYA (Aligarh Muslim University)
Omar FAROOQ (Aligarh Muslim University)
Musiur Raza ABIDI (Aligarh Muslim University)
Priyanka VARSHNEY (Mindz Technology)
title Comparative Study of Visual Feature for Bimodal Hindi Speech Recognition
topic Aligarh Muslim University audio visual corpus
AVASR
bimodal
DCT
DWT.