DLD: An Optimized Chinese Speech Recognition Model Based on Deep Learning

Speech recognition technology has played an indispensable role in realizing intelligent human-computer interaction. However, most current Chinese speech recognition systems are offered either as online services or as offline models with low accuracy and poor performance. To improve the performance of offline Chin...

Bibliographic Details
Main Authors:	Hong Lei, Yue Xiao, Yanchun Liang, Dalin Li, Heow Pueh Lee
Format:	Article
Language:	English
Published:	Wiley 2022-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2022/6927400

_version_	1832553519607971840
author	Hong Lei, Yue Xiao, Yanchun Liang, Dalin Li, Heow Pueh Lee
author_facet	Hong Lei, Yue Xiao, Yanchun Liang, Dalin Li, Heow Pueh Lee
author_sort	Hong Lei
collection	DOAJ
description	Speech recognition technology has played an indispensable role in realizing intelligent human-computer interaction. However, most current Chinese speech recognition systems are offered either as online services or as offline models with low accuracy and poor performance. To improve the performance of offline Chinese speech recognition, we propose a hybrid acoustic model that combines a deep convolutional neural network, long short-term memory, and a deep neural network (DCNN-LSTM-DNN, DLD). The model uses the DCNN to reduce frequency variation, adds a batch normalization (BN) layer after its convolutional layer to keep the data distribution stable, and then uses the LSTM to address the vanishing gradient problem. Finally, the fully connected structure of the DNN maps the input features into a separable space, which helps classification. Combining the strengths of DCNN, LSTM, and DNN in a unified architecture can therefore improve speech recognition performance. The model was tested on the open Chinese speech database THCHS-30, released by the Center for Speech and Language Technology (CSLT) of Tsinghua University. The DLD model with 3 LSTM layers and 3 DNN layers performed best, reaching a word error rate (WER) of 13.49%.
format	Article
id	doaj-art-0d77ef54025b452b90126f22ba7c8436
institution	Kabale University
issn	1099-0526
language	English
publishDate	2022-01-01
publisher	Wiley
record_format	Article
series	Complexity
spelling	doaj-art-0d77ef54025b452b90126f22ba7c8436; 2025-02-03T05:53:49Z; eng; Wiley; Complexity; 1099-0526; 2022-01-01; 2022; 10.1155/2022/6927400; DLD: An Optimized Chinese Speech Recognition Model Based on Deep Learning; Hong Lei (Faculty of Data Science); Yue Xiao (Zhuhai Laboratory of Key Laboratory for Symbol Computation and Knowledge Engineering of Ministry of Education); Yanchun Liang (Zhuhai Laboratory of Key Laboratory for Symbol Computation and Knowledge Engineering of Ministry of Education); Dalin Li (Zhuhai Laboratory of Key Laboratory for Symbol Computation and Knowledge Engineering of Ministry of Education); Heow Pueh Lee (Department of Mechanical Engineering); http://dx.doi.org/10.1155/2022/6927400
title	DLD: An Optimized Chinese Speech Recognition Model Based on Deep Learning
url	http://dx.doi.org/10.1155/2022/6927400

Similar Items