End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM

Bibliographic Details
Main Authors: Yiming WANG, Ken CHEN, Aihaiti ABUDUSALAMU
Format: Article
Language: Chinese (zho)
Published: Beijing Xintong Media Co., Ltd., 2019-12-01
Series: Dianxin kexue
Subjects:
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2019290/
Summary: An end-to-end audiovisual speech recognition algorithm was proposed. In the algorithm, a sparse DBN was constructed by introducing a mixed l1/2 norm and l1 norm into a Deep Belief Network with a bottleneck structure to extract sparse bottleneck features, thereby reducing the dimensionality of the data features; a BLSTM was then used to model the features as time series. Next, an attention mechanism was used to automatically align and fuse the lip visual information with the audio auditory information. Finally, the fused audiovisual information was classified and recognized by a BLSTM with a Softmax layer attached. Experiments show that the algorithm can effectively recognize visual and auditory information and achieves a good recognition rate and robustness compared with similar algorithms.
ISSN: 1000-0801
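
The summary above describes a pipeline of sparse-DBN bottleneck feature extraction, per-modality BLSTM temporal modelling, attention-based audiovisual alignment and fusion, and a final BLSTM classifier with a Softmax layer. The sketch below is an illustrative reconstruction of that pipeline in PyTorch, not the authors' code: the class name AttentionFusionAVSR, all layer sizes, the use of nn.MultiheadAttention for the alignment step, and the plain linear projections standing in for the l1/2 + l1 regularized sparse-DBN bottleneck are assumptions introduced here for demonstration.

    # Minimal sketch of the fusion pipeline described in the abstract.
    # Assumes PyTorch; all dimensions are illustrative placeholders.
    import torch
    import torch.nn as nn


    class AttentionFusionAVSR(nn.Module):  # hypothetical name
        def __init__(self, audio_dim=39, visual_dim=50, bottleneck_dim=32,
                     hidden_dim=128, num_classes=10):
            super().__init__()
            # Placeholders for the sparse-DBN bottleneck extractors
            # (the paper's l1/2 + l1 regularized DBN is not reproduced here).
            self.audio_bottleneck = nn.Linear(audio_dim, bottleneck_dim)
            self.visual_bottleneck = nn.Linear(visual_dim, bottleneck_dim)
            # Per-modality BLSTMs modelling the bottleneck features in time.
            self.audio_blstm = nn.LSTM(bottleneck_dim, hidden_dim,
                                       batch_first=True, bidirectional=True)
            self.visual_blstm = nn.LSTM(bottleneck_dim, hidden_dim,
                                        batch_first=True, bidirectional=True)
            # Cross-modal attention: audio frames attend over visual frames.
            self.attention = nn.MultiheadAttention(2 * hidden_dim, num_heads=4,
                                                   batch_first=True)
            # Fusion BLSTM followed by a Softmax classifier.
            self.fusion_blstm = nn.LSTM(4 * hidden_dim, hidden_dim,
                                        batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, audio, visual):
            # audio: (batch, T_a, audio_dim); visual: (batch, T_v, visual_dim)
            a = torch.relu(self.audio_bottleneck(audio))
            v = torch.relu(self.visual_bottleneck(visual))
            a, _ = self.audio_blstm(a)
            v, _ = self.visual_blstm(v)
            # Align visual frames to the audio time axis via attention.
            aligned_v, _ = self.attention(query=a, key=v, value=v)
            fused = torch.cat([a, aligned_v], dim=-1)
            fused, _ = self.fusion_blstm(fused)
            # Mean-pool over time, then classify (log-softmax for NLL loss).
            logits = self.classifier(fused.mean(dim=1))
            return torch.log_softmax(logits, dim=-1)


    if __name__ == "__main__":
        model = AttentionFusionAVSR()
        audio = torch.randn(2, 120, 39)    # e.g. acoustic frames
        visual = torch.randn(2, 30, 50)    # e.g. lip-region visual features
        print(model(audio, visual).shape)  # torch.Size([2, 10])

Mean-pooling over time before the classifier is likewise a simplification made here; the published system may operate frame-wise or with a sequence-level decoder.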