End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM
An end-to-end audiovisual speech recognition algorithm is proposed. In the algorithm, a sparse DBN is constructed by introducing a mixed l<sub>1/2</sub> norm and l<sub>1</sub> norm into a Deep Belief Network with a bottleneck structure to extract the spars...
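
The mixed-norm sparsity mentioned in the summary can be illustrated with a small NumPy sketch. This record does not give the paper's exact regularizer, so the form below is an assumption: a weighted sum of an l<sub>1/2</sub>-style term, sum(|h|^(1/2)), and an l<sub>1</sub> term, sum(|h|), applied to hidden activations. The function name `mixed_sparsity_penalty` and the weights `lam_half` and `lam_one` are hypothetical.

```python
import numpy as np

def mixed_sparsity_penalty(h, lam_half=0.1, lam_one=0.1):
    """Assumed mixed penalty: lam_half * sum(|h|^(1/2)) + lam_one * sum(|h|).

    This is an illustrative form only; the paper's actual regularizer
    is not stated in this record.
    """
    a = np.abs(np.asarray(h, dtype=float))
    half_term = np.sum(np.sqrt(a))  # l_1/2-style term, favors exact zeros
    one_term = np.sum(a)            # l_1 term
    return lam_half * half_term + lam_one * one_term

# Two activation vectors with the same Euclidean norm (1.0):
sparse = np.array([1.0, 0.0, 0.0, 0.0])
dense = np.array([0.5, 0.5, 0.5, 0.5])

# The sparse vector receives the smaller penalty under this assumed form.
print(mixed_sparsity_penalty(sparse) < mixed_sparsity_penalty(dense))
```

Under this assumed form, activations concentrated in a few units are penalized less than activations of equal energy spread across many units, which is the behavior a sparse bottleneck layer relies on.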
Main Authors: | Yiming WANG, Ken CHEN, Aihaiti ABUDUSALAMU |
---|---|
Format: | Article |
Language: | zho |
Published: | Beijing Xintong Media Co., Ltd, 2019-12-01 |
Series: | Dianxin kexue |
Subjects: | end-to-end; audiovisual speech recognition; sparse bottleneck features; attention mechanism |
Online Access: | http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2019290/ |
_version_ | 1841530637531480064 |
---|---|
author | Yiming WANG Ken CHEN Aihaiti ABUDUSALAMU |
collection | DOAJ |
description | An end-to-end audiovisual speech recognition algorithm is proposed. A sparse DBN (SDBN) is constructed by introducing a mixed l<sub>1/2</sub> norm and l<sub>1</sub> norm into a Deep Belief Network with a bottleneck structure; it extracts sparse bottleneck features and thereby reduces the dimensionality of the data features. A BLSTM then models the features as a time series. Next, an attention mechanism automatically aligns and fuses the visual lip information with the audio information. Finally, the fused audiovisual information is classified by a BLSTM with a Softmax layer attached. Experiments show that the algorithm effectively recognizes visual and auditory information and achieves a good recognition rate and robustness among similar algorithms. |
format | Article |
id | doaj-art-fb4c49696475486e80ec2c68012af815 |
institution | Kabale University |
issn | 1000-0801 |
language | zho |
publishDate | 2019-12-01 |
publisher | Beijing Xintong Media Co., Ltd |
record_format | Article |
series | Dianxin kexue |
title | End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM |
topic | end-to-end audiovisual speech recognition sparse bottleneck features attention mechanism |
url | http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2019290/ |
work_keys_str_mv | AT yimingwang endtoendaudiovisualspeechrecognitionbasedonattentionfusionofsdbnandblstm AT kenchen endtoendaudiovisualspeechrecognitionbasedonattentionfusionofsdbnandblstm AT aihaitiabudusalamu endtoendaudiovisualspeechrecognitionbasedonattentionfusionofsdbnandblstm |
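
The attention-based alignment and fusion step named in the description can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scaled dot-product attention in which each audio frame attends over the visual (lip) frames, followed by concatenation of the audio feature with the attended visual context. The scoring function, feature sizes, and the names `softmax` and `attention_fuse` are all assumptions; the record does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_fuse(audio, visual):
    """Fuse two feature sequences with assumed scaled dot-product attention.

    audio:  (T_a, d) array, e.g. BLSTM outputs for the audio stream
    visual: (T_v, d) array, e.g. BLSTM outputs for the lip stream
    Returns the fused (T_a, 2*d) sequence and the (T_a, T_v) weights.
    """
    d = audio.shape[1]
    scores = audio @ visual.T / np.sqrt(d)   # similarity of every frame pair
    weights = softmax(scores, axis=1)        # soft alignment per audio frame
    context = weights @ visual               # attended visual summary
    fused = np.concatenate([audio, context], axis=1)
    return fused, weights

# Hypothetical sizes: 6 audio frames, 3 video frames, 4-dim features.
rng = np.random.default_rng(0)
audio = rng.standard_normal((6, 4))
visual = rng.standard_normal((3, 4))
fused, weights = attention_fuse(audio, visual)
print(fused.shape, weights.shape)
```

Because the weights are computed per audio frame, the two streams do not need the same frame rate, which is one reason soft attention is a natural fit for aligning audio with video. In the pipeline the description outlines, the fused sequence would then feed the final BLSTM with a Softmax layer.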