TFDense-GAN: a generative adversarial network for single-channel speech enhancement

Bibliographic Details
Main Authors: Haoxiang Chen, Jinxiu Zhang, Yaogang Fu, Xintong Zhou, Ruilong Wang, Yanyan Xu, Dengfeng Ke
Format: Article
Language: English
Published: SpringerOpen 2025-03-01
Series: EURASIP Journal on Advances in Signal Processing
Subjects:
Online Access: https://doi.org/10.1186/s13634-025-01210-1
author Haoxiang Chen
Jinxiu Zhang
Yaogang Fu
Xintong Zhou
Ruilong Wang
Yanyan Xu
Dengfeng Ke
author_facet Haoxiang Chen
Jinxiu Zhang
Yaogang Fu
Xintong Zhou
Ruilong Wang
Yanyan Xu
Dengfeng Ke
author_sort Haoxiang Chen
collection DOAJ
description Abstract Research indicates that utilizing the spectrum in the time–frequency domain plays a crucial role in speech enhancement tasks, as it can better extract audio features and reduce computational cost. For speech enhancement methods in the time–frequency domain, the introduction of attention mechanisms and the application of DenseBlock have yielded promising results. In particular, the Unet architecture, which comprises three main components (the encoder, the decoder, and the bottleneck), employs DenseBlock in both the encoder and the decoder to achieve powerful feature fusion with fewer parameters. In this paper, to combine the advantages of the aforementioned methods for speech enhancement, we propose a Unet-based time–frequency domain denoising model called TFDense-Net. It uses our improved DenseBlock for feature extraction in both the encoder and the decoder, and employs an attention mechanism in the bottleneck for feature fusion and denoising. The model demonstrates excellent performance on speech enhancement tasks, achieving significant improvements in the Si-SDR metric over other state-of-the-art models. Additionally, to further improve the denoising performance and enlarge the receptive field of the model, we introduce a multi-spectrogram discriminator based on multiple STFTs. Since the discriminator loss can capture correlations between spectra that traditional loss functions cannot detect, we train TFDense-Net as a generator against the multi-spectrogram discriminator, yielding a further significant improvement in denoising performance; we name this enhanced model TFDense-GAN. We evaluate TFDense-Net and TFDense-GAN on two public datasets: the VCTK + DEMAND dataset and the Interspeech Deep Noise Suppression Challenge dataset. Experimental results show that TFDense-GAN outperforms most existing models in terms of STOI, PESQ, and Si-SDR, achieving state-of-the-art results.
Comparison samples of TFDense-GAN and other models are available at https://github.com/yhsjoker/TFDense-GAN.
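The abstract's multi-spectrogram discriminator operates on multiple STFTs of the same waveform, so the model sees the signal at several complementary time/frequency resolutions. A minimal sketch of that input representation, assuming Hann-windowed magnitude spectrograms; the window and hop sizes below are illustrative placeholders, not the paper's actual settings:

```python
import numpy as np

def stft_magnitude(x, n_fft, hop):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins: n_fft // 2 + 1 of them.
    return np.abs(np.fft.rfft(frames, axis=-1))

# One second of a 440 Hz tone at 16 kHz as a stand-in waveform.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

# Three illustrative STFT resolutions; shorter windows give finer time
# resolution, longer windows give finer frequency resolution.
configs = [(512, 128), (1024, 256), (2048, 512)]
specs = [stft_magnitude(x, n_fft, hop) for n_fft, hop in configs]
for (n_fft, hop), s in zip(configs, specs):
    print(f"n_fft={n_fft} hop={hop} -> spectrogram shape {s.shape}")
```

A multi-spectrogram discriminator would apply a separate sub-discriminator to each of these spectrograms and sum their losses, which is what lets the adversarial loss compare spectral structure across resolutions.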
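Si-SDR, the metric the abstract highlights, measures how close an enhanced waveform is to the clean reference after the optimal rescaling of the reference. A sketch of the standard definition (not the authors' evaluation code):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to find the best-scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)  # light additive noise
print(round(si_sdr(noisy, clean), 1))  # roughly 20 dB at this noise level
```

Because of the projection step, Si-SDR is unchanged if the estimate is multiplied by a constant gain, which is why it is preferred over plain SNR for enhancement models whose output level may drift.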
format Article
id doaj-art-d25ab9325b92464ebc4b53e2e8ad4cd3
institution Kabale University
issn 1687-6180
language English
publishDate 2025-03-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Advances in Signal Processing
spelling doaj-art-d25ab9325b92464ebc4b53e2e8ad4cd3 (2025-08-20T03:39:57Z, eng). Haoxiang Chen (Beijing Forestry University), Jinxiu Zhang (Beijing Forestry University), Yaogang Fu (Beijing Forestry University), Xintong Zhou (Beijing Forestry University), Ruilong Wang (Beijing Forestry University), Yanyan Xu (Beijing Forestry University), Dengfeng Ke (Beijing Language and Culture University). TFDense-GAN: a generative adversarial network for single-channel speech enhancement. EURASIP Journal on Advances in Signal Processing, vol. 2025, no. 1, pp. 1–24, SpringerOpen, 2025-03-01. ISSN 1687-6180. https://doi.org/10.1186/s13634-025-01210-1. Subjects: Speech enhancement; Time–frequency domain; Generative adversarial network; Improved DenseBlock; Time–frequency transformer.
spellingShingle Haoxiang Chen
Jinxiu Zhang
Yaogang Fu
Xintong Zhou
Ruilong Wang
Yanyan Xu
Dengfeng Ke
TFDense-GAN: a generative adversarial network for single-channel speech enhancement
EURASIP Journal on Advances in Signal Processing
Speech enhancement
Time–frequency domain
Generative adversarial network
Improved DenseBlock
Time–frequency transformer
title TFDense-GAN: a generative adversarial network for single-channel speech enhancement
title_full TFDense-GAN: a generative adversarial network for single-channel speech enhancement
title_fullStr TFDense-GAN: a generative adversarial network for single-channel speech enhancement
title_full_unstemmed TFDense-GAN: a generative adversarial network for single-channel speech enhancement
title_short TFDense-GAN: a generative adversarial network for single-channel speech enhancement
title_sort tfdense gan a generative adversarial network for single channel speech enhancement
topic Speech enhancement
Time–frequency domain
Generative adversarial network
Improved DenseBlock
Time–frequency transformer
url https://doi.org/10.1186/s13634-025-01210-1
work_keys_str_mv AT haoxiangchen tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement
AT jinxiuzhang tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement
AT yaogangfu tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement
AT xintongzhou tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement
AT ruilongwang tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement
AT yanyanxu tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement
AT dengfengke tfdenseganagenerativeadversarialnetworkforsinglechannelspeechenhancement