Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention
Abstract In short speech situations, the performance of existing speaker recognition systems degrades significantly due to factors such as short speech segment length, scarce speaker identity information, and noise interference. In this paper, a short speech speaker recognition system based on Dense...
| Main Authors: | Fei Deng, Rui Huang, Peifan Jiang, Lin Yu, Lihong Deng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
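| Subjects: | Short speech speaker recognition; Dense-Fusion2Net; Attention mechanism; Time-Frequency Channel Attention (TFCA) |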
| Online Access: | https://doi.org/10.1038/s41598-025-93873-x |
| _version_ | 1849389973236613120 |
|---|---|
| author | Fei Deng, Rui Huang, Peifan Jiang, Lin Yu, Lihong Deng |
| author_sort | Fei Deng |
| collection | DOAJ |
| description | Abstract: In short-speech conditions, the performance of existing speaker recognition systems degrades significantly because speech segments are short, speaker identity information is scarce, and noise interferes. In this paper, we propose a short-speech speaker recognition system based on Dense-Fusion2Net and Time-Frequency Channel Attention (TFCA) to address these problems. The Dense-Fusion2Net architecture makes more efficient use of the limited acoustic features in short speech segments. We designed TFCA so that it can effectively learn the relationships among the time domain, the frequency domain, and the channels, and enhance the global feature extraction capability of the network. Validation experiments on the publicly available VoxCeleb dataset show that the proposed Dense-Fusion2Net and TFCA achieve higher performance and better robustness in short-speech conditions. In addition, experiments with different window lengths identified the most suitable window length for short speech. |
| format | Article |
| id | doaj-art-9a79e820b65545d48b6ca3a899c132df |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-9a79e820b65545d48b6ca3a899c132df; 2025-08-20T03:41:47Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-03-01; vol. 15, no. 1, pp. 1-15; 10.1038/s41598-025-93873-x; Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention; Fei Deng (College of Computer Science and Cyber Security, Chengdu University of Technology); Rui Huang (College of Computer Science and Cyber Security, Chengdu University of Technology); Peifan Jiang (College of Geophysics, Chengdu University of Technology); Lin Yu (College of Computer Science and Cyber Security, Chengdu University of Technology); Lihong Deng (School of Computing and Artificial Intelligence, Southwest Jiaotong University); Abstract In short speech situations, the performance of existing speaker recognition systems degrades significantly due to factors such as short speech segment length, scarce speaker identity information, and noise interference. In this paper, a short speech speaker recognition system based on Dense-Fusion2Net and the Time-Frequency Channel Attention (TFCA) is proposed to address the problems of current short speech speaker recognition systems. We propose the Dense-Fusion2Net network architecture to more efficiently utilize the limited acoustic features in short speech segments. We designed the Time-Frequency Channel Attention (TFCA). It can effectively learn the relationship between time and frequency domains and channels, and enhance the global feature extraction capability of the network. We conducted validation experiments using the publicly available dataset Voxceleb. The experimental results show that the proposed Dense-Fusion2Net and the TFCA attention exhibit higher performance and better robustness in short speech situations. In addition, we conducted experiments with different window lengths for short speech and obtained the most suitable window length for short speech.; https://doi.org/10.1038/s41598-025-93873-x; Short speech speaker recognition; Dense-Fusion2Net; Attention mechanism; Time-Frequency Channel Attention (TFCA) |
| spellingShingle | Fei Deng; Rui Huang; Peifan Jiang; Lin Yu; Lihong Deng; Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention; Scientific Reports; Short speech speaker recognition; Dense-Fusion2Net; Attention mechanism; Time-Frequency Channel Attention (TFCA) |
| title | Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention |
| title_sort | dense fusion2net a more efficient and lightweight short speech speaker recognition system with time frequency channel attention |
| topic | Short speech speaker recognition; Dense-Fusion2Net; Attention mechanism; Time-Frequency Channel Attention (TFCA) |
| url | https://doi.org/10.1038/s41598-025-93873-x |