Dense-Fusion2Net: a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-93873-x |
| Summary: | In short speech situations, the performance of existing speaker recognition systems degrades significantly due to short segment length, scarce speaker identity information, and noise interference. In this paper, we propose a short speech speaker recognition system based on Dense-Fusion2Net and Time-Frequency Channel Attention (TFCA) to address these problems. The Dense-Fusion2Net architecture is designed to use the limited acoustic features in short speech segments more efficiently. The TFCA module learns the relationships among the time domain, the frequency domain, and channels, and enhances the global feature extraction capability of the network. We conducted validation experiments on the publicly available VoxCeleb dataset. The results show that the proposed Dense-Fusion2Net with TFCA achieves higher performance and better robustness in short speech situations. In addition, we ran experiments with different window lengths and identified the most suitable window length for short speech. |
| ISSN: | 2045-2322 |