Dense-Fusion2Net: a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention

Bibliographic Details
Main Authors: Fei Deng, Rui Huang, Peifan Jiang, Lin Yu, Lihong Deng
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-93873-x
Description
Summary: In short speech situations, the performance of existing speaker recognition systems degrades significantly because the speech segments are short, speaker identity information is scarce, and noise interferes with the signal. This paper proposes a short speech speaker recognition system based on Dense-Fusion2Net and Time-Frequency Channel Attention (TFCA) to address these problems. The Dense-Fusion2Net network architecture is designed to use the limited acoustic features in short speech segments more efficiently. TFCA learns the relationships among the time domain, the frequency domain, and the channels, and enhances the network's global feature extraction capability. Validation experiments were conducted on the publicly available VoxCeleb dataset. The results show that the proposed Dense-Fusion2Net and TFCA achieve higher performance and better robustness in short speech situations. Additional experiments with different window lengths identified the most suitable window length for short speech.
ISSN: 2045-2322