LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms

Bibliographic Details
Main Authors: Hyeong Il Koh, Sungdae Na, Myoung Nam Kim
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11071541/
Description
Summary: Although recent deep-learning-based speech enhancement (SE) methods significantly outperform traditional approaches, their computational demands often scale proportionally with their performance. This scaling typically makes them impractical for deployment on data throughput-sensitive and resource-constrained edge devices. In this paper, we propose a novel lightweight spectral enhancement network (LSENet) designed to estimate high-quality speech with minimal computational overhead. The network consists of an encoder-decoder architecture enhanced by a group-dilated convolutional module, which efficiently leverages time-frequency domain information while significantly reducing resource consumption through dilated convolutional groups and spectral-wise attention modules. Additionally, to capture the long-range contextual dependencies of the extracted features, an improved dual-path recurrent neural network is introduced between the encoder and decoder structures. Experimental results show that the proposed model achieves competitive performance with state-of-the-art baseline models on the Voicebank + Demand and DNS-Challenge datasets while requiring only 39.4 thousand model parameters and 237 million multiply-accumulate operations.
ISSN: 2169-3536
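
Note: The summary attributes part of the resource saving to dilated convolutional groups. As a rough illustration only, and not the authors' implementation, the sketch below works through the general arithmetic behind that idea: splitting a 2-D convolution into groups divides its weight count by the number of groups, while dilation widens the span a kernel covers without adding any weights. The channel count (16), kernel size (3x3), group count (4) and dilation rate (4) are hypothetical values chosen for the example, not figures from the paper, and the function names are illustrative.

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Count the learnable parameters of a 2-D convolution layer.

    With `groups` > 1, each output channel only sees in_ch / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must both be divisible by groups")
    kh, kw = kernel
    weights = (in_ch // groups) * out_ch * kh * kw
    return weights + (out_ch if bias else 0)


def effective_kernel(kernel_size, dilation):
    """Span (in input samples) covered by a dilated kernel along one axis.

    Dilation inserts gaps between taps, so the span grows while the
    number of weights stays at kernel_size.
    """
    return dilation * (kernel_size - 1) + 1


# Hypothetical layer sizes, chosen only to make the arithmetic concrete.
standard = conv2d_params(16, 16, (3, 3))           # ordinary convolution
grouped = conv2d_params(16, 16, (3, 3), groups=4)  # same layer, 4 groups
span = effective_kernel(3, 4)                      # 3-tap kernel, dilation 4

print("standard conv parameters:", standard)
print("grouped conv parameters: ", grouped)
print("span of dilated 3-tap kernel:", span)
```

In this hypothetical layer the grouped variant keeps a quarter of the weights of the standard one, and the dilated 3-tap kernel spans nine input samples along its axis. The parameter and operation totals reported in the summary depend on the full architecture and cannot be derived from this sketch.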