Generic speech enhancement with self-supervised representation space loss

Single-channel speech enhancement is used in many tasks to mitigate the effect of interfering signals. Conventionally, a speech enhancement model has had to be tuned for each task to perform optimally, which makes it challenging to generalize speech enhancement models to unknown downstream tasks. This study aims to construct a generic speech enhancement front-end that improves the performance of back-ends on multiple downstream tasks. To this end, we propose a novel training criterion that minimizes the distance between the enhanced signal and the ground-truth clean signal in the feature representation domain of self-supervised learning models. Because self-supervised feature representations effectively capture high-level speech information useful for solving various downstream tasks, the proposed criterion encourages the speech enhancement model to preserve that information. Experimental validation demonstrates that the proposal improves performance on multiple speech tasks while maintaining the perceptual quality of the enhanced signal.
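
The abstract describes the training criterion only at a high level. As a rough sketch of the idea (not the authors' implementation), the representation-space loss can be computed by passing both the enhanced and the clean waveform through a frozen self-supervised model and penalizing the distance between the resulting features. The choice of WavLM via torchaudio, the L1 distance, the averaging over layers, and the weighting against a signal-domain term are all assumptions made for illustration.

```python
# Minimal sketch of a self-supervised representation-space loss.
# Assumptions (not taken from the paper): WavLM Base as the SSL model,
# L1 distance on its per-layer features, and a weighted sum with a
# plain waveform-domain L1 loss.
import torch
import torchaudio

# Frozen SSL feature extractor: gradients flow *through* it back to the
# enhancement model, but its own weights are never updated.
bundle = torchaudio.pipelines.WAVLM_BASE
ssl_model = bundle.get_model().eval()
for p in ssl_model.parameters():
    p.requires_grad_(False)

def ssl_representation_loss(enhanced: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """L1 distance between SSL features of enhanced and clean speech.

    enhanced, clean: (batch, samples) waveforms at 16 kHz.
    """
    feats_enh, _ = ssl_model.extract_features(enhanced)
    feats_cln, _ = ssl_model.extract_features(clean)
    # Average the distance over all transformer layers.
    return torch.stack(
        [torch.nn.functional.l1_loss(fe, fc) for fe, fc in zip(feats_enh, feats_cln)]
    ).mean()

def total_loss(enhanced: torch.Tensor, clean: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Hypothetical weighting: the signal-domain term maintains perceptual
    # quality, the representation term preserves downstream-task information.
    return torch.nn.functional.l1_loss(enhanced, clean) + alpha * ssl_representation_loss(enhanced, clean)
```

During training, `enhanced` would be the output of the enhancement front-end, so minimizing `total_loss` back-propagates through the frozen SSL encoder into the front-end's parameters.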

Bibliographic Details
Main Authors: Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Ryo Masumura
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-07-01
Series: Frontiers in Signal Processing
ISSN: 2673-8198
Subjects: self-supervised learning; loss function; SUPERB benchmark; signal denoising; speech enhancement; deep learning
Online Access: https://www.frontiersin.org/articles/10.3389/frsip.2025.1587969/full