Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention

Abstract In short speech situations, the performance of existing speaker recognition systems degrades significantly due to factors such as short speech segment length, scarce speaker identity information, and noise interference. In this paper, a short speech speaker recognition system based on Dense...

Bibliographic Details
Main Authors: Fei Deng, Rui Huang, Peifan Jiang, Lin Yu, Lihong Deng
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
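Subjects: Short speech speaker recognition; Dense-Fusion2Net; Attention mechanism; Time-Frequency Channel Attention (TFCA)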
Online Access: https://doi.org/10.1038/s41598-025-93873-x
author Fei Deng
Rui Huang
Peifan Jiang
Lin Yu
Lihong Deng
collection DOAJ
description Abstract In short speech situations, the performance of existing speaker recognition systems degrades significantly due to factors such as short speech segment length, scarce speaker identity information, and noise interference. In this paper, we propose a short speech speaker recognition system based on Dense-Fusion2Net and Time-Frequency Channel Attention (TFCA) to address these problems. We propose the Dense-Fusion2Net network architecture to use the limited acoustic features in short speech segments more efficiently. We design TFCA, which learns the relationships among the time domain, the frequency domain, and channels, and enhances the global feature extraction capability of the network. We conducted validation experiments on the publicly available VoxCeleb dataset. The experimental results show that the proposed Dense-Fusion2Net and TFCA exhibit higher performance and better robustness in short speech situations. In addition, we conducted experiments with different window lengths and identified the most suitable window length for short speech.
format Article
id doaj-art-9a79e820b65545d48b6ca3a899c132df
institution Kabale University
issn 2045-2322
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-9a79e820b65545d48b6ca3a899c132df
2025-08-20T03:41:47Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-03-01
Volume 15, Issue 1, pp. 1-15
10.1038/s41598-025-93873-x
Fei Deng (0): College of Computer Science and Cyber Security, Chengdu University of Technology
Rui Huang (1): College of Computer Science and Cyber Security, Chengdu University of Technology
Peifan Jiang (2): College of Geophysics, Chengdu University of Technology
Lin Yu (3): College of Computer Science and Cyber Security, Chengdu University of Technology
Lihong Deng (4): School of Computing and Artificial Intelligence, Southwest Jiaotong University
title Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention
topic Short speech speaker recognition
Dense-Fusion2Net
Attention mechanism
Time-Frequency Channel Attention (TFCA)
url https://doi.org/10.1038/s41598-025-93873-x