End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM

Bibliographic Details
Main Authors:	Yiming WANG, Ken CHEN, Aihaiti ABUDUSALAMU
Format:	Article
Language:	zho
Published:	Beijing Xintong Media Co., Ltd 2019-12-01
Series:	Dianxin kexue
Subjects:	end-to-end; audiovisual speech recognition; sparse bottleneck features; attention mechanism
Online Access:	http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2019290/

collection	DOAJ
description	An end-to-end audiovisual speech recognition algorithm was proposed. A sparse deep belief network (SDBN) was constructed by introducing a mixed ℓ1/2-norm and ℓ1-norm penalty into a deep belief network (DBN) with a bottleneck structure. The SDBN extracted sparse bottleneck features, reducing the dimensionality of the data features, and a bidirectional long short-term memory network (BLSTM) then modeled these features as a time series. Next, an attention mechanism automatically aligned and fused the visual information from the lips with the auditory information from the audio. Finally, the fused audiovisual information was classified by a BLSTM followed by a Softmax layer. Experiments show that the algorithm effectively recognizes both visual and auditory information and achieves a good recognition rate and good robustness compared with similar algorithms.
id	doaj-art-fb4c49696475486e80ec2c68012af815
institution	Kabale University
issn	1000-0801
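The abstract in this record describes the SDBN sparsity penalty and the attention fusion step only at a high level. As a hedged illustration, the Python sketch below shows one common way such pieces can work; it is not the authors' published method. The function names (`mixed_sparsity_penalty`, `attention_fuse`) are hypothetical, and so are the design choices: applying an ℓ1/2 + ℓ1 penalty to one layer's hidden activations, scoring with scaled dot products, using audio frames as queries over lip frames, and fusing by concatenation. DBN pre-training and the BLSTM encoders/classifier are omitted.

```python
import math
from typing import List

Vector = List[float]


def mixed_sparsity_penalty(h: Vector, lam_half: float, lam_one: float) -> float:
    # Hypothetical form: lam_half * sum(|h_j| ** 0.5) + lam_one * sum(|h_j|).
    # The record does not state the exact penalty or where it is applied;
    # here it is assumed to act on one layer's hidden activations h.
    l_half = sum(math.sqrt(abs(x)) for x in h)
    l_one = sum(abs(x) for x in h)
    return lam_half * l_half + lam_one * l_one


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(audio: List[Vector], visual: List[Vector]) -> List[Vector]:
    # Assumptions: both streams are already encoded (e.g. by BLSTMs) into
    # frame vectors of the same size d; each audio frame is a query that
    # attends over all visual (lip) frames with scaled dot-product scores.
    d = len(audio[0])
    fused = []
    for q in audio:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visual]
        weights = softmax(scores)  # soft alignment of this audio frame to the lip frames
        context = [sum(w * v[i] for w, v in zip(weights, visual)) for i in range(d)]
        fused.append(q + context)  # concatenate [audio frame ; attended visual context]
    return fused


# Toy example: 3 audio frames and 2 lip frames, both with d = 2.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[1.0, 0.0], [0.0, 1.0]]
fused = attention_fuse(audio, visual)
print(len(fused), len(fused[0]))  # 3 4
print(mixed_sparsity_penalty([0.25, -1.0, 0.0], lam_half=1.0, lam_one=0.5))  # 2.125
```

Because each audio frame gets its own weighting over the lip frames, the two streams need not have the same number of frames, which is one reason attention is a convenient way to align modalities recorded at different frame rates. In a full system the fused frames would be passed to the final BLSTM and Softmax classifier described in the abstract.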

Similar Items