LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms
Although recent deep-learning-based speech enhancement (SE) methods significantly outperform traditional approaches, their computational demands often scale proportionally with their performance. This scaling typically makes them impractical for deployment on data throughput-sensitive and resource-c...
| Main Authors: | Hyeong Il Koh, Sungdae Na, Myoung Nam Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
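| Subjects: | Deep learning; speech enhancement; lightweight network; attention mechanisms; factorized convolution |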
| Online Access: | https://ieeexplore.ieee.org/document/11071541/ |
| _version_ | 1849319784485748736 |
|---|---|
| author | Hyeong Il Koh; Sungdae Na; Myoung Nam Kim |
| author_sort | Hyeong Il Koh |
| collection | DOAJ |
| description | Although recent deep-learning-based speech enhancement (SE) methods significantly outperform traditional approaches, their computational demands often scale proportionally with their performance. This scaling typically makes them impractical for deployment on data throughput-sensitive and resource-constrained edge devices. In this paper, we propose a novel lightweight spectral enhancement network (LSENet) designed to estimate high-quality speech with minimal computational overhead. The network consists of an encoder-decoder architecture enhanced by a group-dilated convolutional module, which efficiently leverages time-frequency domain information while significantly reducing resource consumption through dilated convolutional groups and spectral-wise attention modules. Additionally, to capture the long-range contextual dependencies of the extracted features, an improved dual-path recurrent neural network is introduced between the encoder and decoder structures. Experimental results show that the proposed model achieves competitive performance with state-of-the-art baseline models on the Voicebank + Demand and DNS-Challenge datasets while requiring only 39.4 thousand model parameters and 237 million multiply-accumulate operations. |
| format | Article |
| id | doaj-art-78872d8a979d4887a71b2155f02f6e93 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-78872d8a979d4887a71b2155f02f6e93; 2025-08-20T03:50:20Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 116934-116943; 10.1109/ACCESS.2025.3585958; 11071541; LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms; Hyeong Il Koh [0] (https://orcid.org/0009-0004-7890-2810); Sungdae Na [1] (https://orcid.org/0000-0002-9261-0492); Myoung Nam Kim [2] (https://orcid.org/0009-0006-4889-878X); [0] Department of Medical and Biological Engineering, Graduate School, Kyungpook National University, Daegu, Republic of Korea; [1] Department of Biomedical Engineering, Kyungpook National University Hospital, Daegu, Republic of Korea; [2] Department of Biomedical Engineering, School of Medicine, Kyungpook National University, Daegu, Republic of Korea; https://ieeexplore.ieee.org/document/11071541/; Deep learning; speech enhancement; lightweight network; attention mechanisms; factorized convolution |
| title | LSENet: A Lightweight Spectral Enhancement Network for High-Quality Speech Processing on Resource-Constrained Platforms |
| title_sort | lsenet a lightweight spectral enhancement network for high quality speech processing on resource constrained platforms |
| topic | Deep learning; speech enhancement; lightweight network; attention mechanisms; factorized convolution |
| url | https://ieeexplore.ieee.org/document/11071541/ |