Generic speech enhancement with self-supervised representation space loss
Single-channel speech enhancement is used in various tasks to mitigate the effect of interfering signals. Conventionally, a speech enhancement model has had to be tuned for each task to perform optimally, so generalizing speech enhancement models to unknown downstream tasks has been challenging. This study aims to construct a generic speech enhancement front-end that improves the performance of back-ends on multiple downstream tasks. To this end, we propose a novel training criterion that minimizes the distance between the enhanced signal and the ground-truth clean signal in the feature representation domain of self-supervised learning models. Since self-supervised feature representations effectively express high-level speech information useful for solving various downstream tasks, the proposed criterion encourages speech enhancement models to preserve such information. Experimental validation demonstrates that the proposed method improves the performance of multiple speech tasks while maintaining the perceptual quality of the enhanced signal.
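As a rough, hypothetical sketch of the representation-space loss described in the abstract (not the authors' exact implementation): a frozen, pretrained self-supervised encoder `ssl_encoder` maps waveforms to frame-level features, and the training criterion penalizes the distance between the features of the enhanced signal and those of the clean reference. The encoder interface, distance measure, and layer selection below are assumptions.

```python
import torch
import torch.nn.functional as F

def ssl_representation_loss(enhanced_wav, clean_wav, ssl_encoder):
    """Distance between enhanced and clean signals in an SSL feature space.

    ssl_encoder is a hypothetical frozen, pretrained self-supervised model:
    it maps a batch of waveforms (B, T) to frame-level features (B, frames, dim).
    """
    # Features of the clean reference act as a fixed target (no gradient).
    with torch.no_grad():
        target_feats = ssl_encoder(clean_wav)
    # Gradients flow only through the enhanced signal, so the enhancement
    # model is pushed to preserve the high-level information that the SSL
    # representation encodes.
    enhanced_feats = ssl_encoder(enhanced_wav)
    return F.l1_loss(enhanced_feats, target_feats)

# Illustrative usage: combined with a conventional signal-level loss so the
# perceptual quality of the enhanced waveform is also maintained
# (lambda_ssl is a hypothetical weighting factor):
#   total_loss = signal_loss(enhanced_wav, clean_wav) \
#              + lambda_ssl * ssl_representation_loss(enhanced_wav, clean_wav, ssl_encoder)
```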
| Main Authors: | Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Ryo Masumura |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-07-01 |
| Series: | Frontiers in Signal Processing |
| Subjects: | self-supervised learning; loss function; SUPERB benchmark; signal denoising; speech enhancement; deep learning |
| Online Access: | https://www.frontiersin.org/articles/10.3389/frsip.2025.1587969/full |

| collection | DOAJ |
|---|---|
| id | doaj-art-421169964ec845128bfd1141dbb4a46e |
| institution | Kabale University |
| issn | 2673-8198 |