Enhancing depression recognition through a mixed expert model by integrating speaker-related and emotion-related features

Abstract The World Health Organization predicts that by 2030, depression will be the most common mental disorder, significantly affecting individuals, families, and society. Speech, as a sensitive indicator, reveals noticeable acoustic changes linked to physiological and cognitive variations, making it a crucial behavioral marker for detecting depression. However, existing studies often overlook the separation of speaker-related and emotion-related features in speech when recognizing depression. To tackle this challenge, we propose a Mixture-of-Experts (MoE) method that integrates speaker-related and emotion-related features for depression recognition. Our approach begins with a Time Delay Neural Network to pre-train a speaker-related feature extractor on a large-scale speaker recognition dataset, while simultaneously pre-training an emotion-related feature extractor on a speech emotion dataset. We then apply transfer learning to extract both feature types from a depression dataset and fuse them, and a multi-domain adaptation algorithm trains the MoE model for depression recognition. Experimental results demonstrate that our method achieves 74.3% accuracy on a self-built Chinese localized depression dataset and an MAE of 6.32 on the AVEC2014 dataset, outperforming state-of-the-art deep learning methods that use speech features. Additionally, our approach performs well on both Chinese and English speech datasets, highlighting its effectiveness in addressing cultural variations.

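Purely as an illustration of the architecture the abstract describes, the sketch below shows a minimal Mixture-of-Experts head that fuses a speaker embedding and an emotion embedding by concatenation and weights several expert classifiers with a learned gate. This is not the authors' released code: the embedding dimensions, expert count, concatenation fusion, and two-class output are all assumptions, and the pre-trained TDNN extractors and the multi-domain adaptation training step from the paper are omitted.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumes two pre-trained extractors (e.g. a TDNN for speaker identity and a
# separate model for emotion) have already produced fixed-length embeddings.
import torch
import torch.nn as nn

class DepressionMoE(nn.Module):
    """Mixture-of-Experts over fused speaker- and emotion-related embeddings."""
    def __init__(self, spk_dim=512, emo_dim=256, n_experts=4, n_classes=2):
        super().__init__()
        fused_dim = spk_dim + emo_dim
        # Each expert maps the fused embedding to class logits.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(fused_dim, 128), nn.ReLU(),
                          nn.Linear(128, n_classes))
            for _ in range(n_experts)
        ])
        # The gate assigns a softmax weight to each expert per utterance.
        self.gate = nn.Sequential(nn.Linear(fused_dim, n_experts),
                                  nn.Softmax(dim=-1))

    def forward(self, spk_emb, emo_emb):
        x = torch.cat([spk_emb, emo_emb], dim=-1)       # concatenation fusion
        weights = self.gate(x)                          # (batch, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)  # weighted logits

# Toy usage with random tensors standing in for the extractors' outputs.
model = DepressionMoE()
logits = model(torch.randn(8, 512), torch.randn(8, 256))
print(logits.shape)  # torch.Size([8, 2])
```

In the paper's pipeline the two embeddings come from extractors pre-trained on speaker recognition and speech emotion data and adapted to the depression dataset via transfer learning; the sketch only captures the fusion-and-gating structure of the MoE head.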
Bibliographic Details
Main Authors: Weitong Guo, Qian He, Ziyu Lin, Xiaolong Bu, Ziyang Wang, Dong Li, Hongwu Yang
Format: Article
Language: English
Published: Nature Portfolio, 2025-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-88313-9
collection DOAJ
issn 2045-2322
affiliations Weitong Guo, Qian He, Ziyu Lin, Xiaolong Bu, Ziyang Wang, and Hongwu Yang: School of Educational Technology, Northwest Normal University. Dong Li: Faculty of Artificial Intelligence in Education, Central China Normal University.