Audio-visual speech enhancement with multi-level feature deep fusion under low signal-to-noise ratio

To address the limitations in feature extraction and cross-modal fusion in audio-visual speech enhancement, a multistage deep fusion method was proposed for low signal-to-noise ratio (SNR) conditions. The method consisted of an audio-visual encoding network, a fusion network, and an auditory decodin...

Bibliographic Details
Main Authors: ZHANG Tianqi, SHEN Xiwen, TANG Juan, TAN Shuang
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2025-05-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/thesisDetails#10.11959/j.issn.1000-436x.2025075