LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms

Although recent deep-learning-based speech enhancement (SE) methods significantly outperform traditional approaches, their computational demands often scale proportionally with their performance. This scaling typically makes them impractical for deployment on data throughput-sensitive and resource-c...

Bibliographic Details
Main Authors: Hyeong Il Koh, Sungdae Na, Myoung Nam Kim
Format: Article
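Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Deep learning; speech enhancement; lightweight network; attention mechanisms; factorized convolution
Online Access: https://ieeexplore.ieee.org/document/11071541/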
author Hyeong Il Koh
Sungdae Na
Myoung Nam Kim
collection DOAJ
description Although recent deep-learning-based speech enhancement (SE) methods significantly outperform traditional approaches, their computational demands often scale proportionally with their performance. This scaling typically makes them impractical for deployment on data throughput-sensitive and resource-constrained edge devices. In this paper, we propose a novel lightweight spectral enhancement network (LSENet) designed to estimate high-quality speech with minimal computational overhead. The network consists of an encoder-decoder architecture enhanced by a group-dilated convolutional module, which efficiently leverages time-frequency domain information while significantly reducing resource consumption through dilated convolutional groups and spectral-wise attention modules. Additionally, to capture the long-range contextual dependencies of the extracted features, an improved dual-path recurrent neural network is introduced between the encoder and decoder structures. Experimental results show that the proposed model achieves competitive performance with state-of-the-art baseline models on the Voicebank + Demand and DNS-Challenge datasets while requiring only 39.4 thousand model parameters and 237 million multiply-accumulate operations.
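The description credits much of the saving to grouped, dilated convolutions. As a rough illustration of why this helps, the sketch below counts the parameters of a standard versus a grouped 2-D convolution and shows how dilation widens the span of a kernel without adding weights. The channel count (16), kernel size (3x3), group count (4), and dilation (4) are assumed values chosen for the example, not figures from the paper, and the helper names are hypothetical.

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Number of learnable parameters in a 2-D convolution layer.

    Each output channel only sees c_in / groups input channels,
    so grouping divides the weight count by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kh, kw = kernel
    weights = (c_in // groups) * c_out * kh * kw
    return weights + (c_out if bias else 0)


def kernel_span(kernel_size, dilation):
    """Input samples spanned by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


# Assumed example sizes (not taken from the paper).
standard = conv2d_params(16, 16, (3, 3))
grouped = conv2d_params(16, 16, (3, 3), groups=4)

print("standard conv parameters:", standard)
print("grouped conv parameters: ", grouped)
print("span of a 3-tap kernel, dilation 1:", kernel_span(3, 1))
print("span of a 3-tap kernel, dilation 4:", kernel_span(3, 4))
```

Dilation leaves the parameter count unchanged, so a grouped and dilated layer can cover a wider time-frequency context with fewer weights than a dense layer of the same kernel size. How LSENet actually combines these pieces is specified in the paper itself.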
format Article
id doaj-art-78872d8a979d4887a71b2155f02f6e93
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-78872d8a979d4887a71b2155f02f6e93; 2025-08-20T03:50:20Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 116934-116943; DOI 10.1109/ACCESS.2025.3585958; IEEE document 11071541
Authors (ORCID):
Hyeong Il Koh (https://orcid.org/0009-0004-7890-2810)
Sungdae Na (https://orcid.org/0000-0002-9261-0492)
Myoung Nam Kim (https://orcid.org/0009-0006-4889-878X)
Affiliations, in the order listed in the record:
Department of Medical and Biological Engineering, Graduate School, Kyungpook National University, Daegu, Republic of Korea
Department of Biomedical Engineering, Kyungpook National University Hospital, Daegu, Republic of Korea
Department of Biomedical Engineering, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
title LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms
topic Deep learning
speech enhancement
lightweight network
attention mechanisms
factorized convolution
url https://ieeexplore.ieee.org/document/11071541/