Cross-modal knowledge distillation for enhanced depression detection
Abstract Depression is a severe mental disorder characterized by high prevalence, high recurrence, high disability, and high mortality rates. Consequently, timely detection and treatment are crucial. In recent years, speech-based methods for depression detection have been widely applied in clinical diagnostics. This is largely because they feature simple data collection and provide a positive user experience. However, these methods are limited by their susceptibility to deception and low accuracy rates. In contrast, brain information can provide neurobiological markers for depression, offering greater objectivity and higher accuracy in detection. To enhance the accuracy of speech-based detection, we propose a cross-modal knowledge distillation method that integrates speech signals with brain information, resulting in a more reliable and universally applicable approach for detecting depression. Specifically, a new multimodal model was first constructed as the teacher model, while the unimodal speech model served as the student model. Through knowledge distillation, the rich knowledge encapsulated in the teacher model was transferred to the student model, thereby enhancing its performance. Finally, experiments were conducted on the public MODMA dataset, achieving an accuracy of 83.19% for the distilled model, which represents a 3.47% improvement compared to traditional speech recognition methods. These results validate the effectiveness and feasibility of our proposed method, offering more effective support for the clinical diagnosis of depression.
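The abstract describes a standard teacher–student distillation setup: a multimodal (speech + brain information) teacher transfers its knowledge to a unimodal speech student. As a minimal, hedged sketch only — the paper's actual architecture and loss are not given here — the usual soft-target distillation objective (Hinton-style, with assumed temperature `T` and mixing weight `alpha`) can be written as:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the soft-target KL term (teacher -> student, softened by T)
    with the hard-label cross-entropy on the student's own predictions.
    T and alpha here are illustrative hyperparameters, not the paper's."""
    p_t = softmax(teacher_logits, T)  # softened teacher distribution
    p_s = softmax(student_logits, T)  # softened student distribution
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T**2 rescales the soft term so its gradient magnitude matches the hard term
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * hard))
```

When the student's logits match the teacher's, the KL term vanishes; training the speech-only student against both terms is what lets it absorb information from the multimodal teacher without needing brain signals at inference time.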
| Main Authors: | Huang Huang, Xinhui Li, Minchao Wu, Zhao Lv, Yong Peng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-08-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Brain information; Speech signals; Depression detection; Cross-modal; Knowledge distillation |
| Online Access: | https://doi.org/10.1007/s40747-025-02035-z |
| author | Huang Huang; Xinhui Li; Minchao Wu; Zhao Lv; Yong Peng |
|---|---|
| affiliations | Huang Huang, Xinhui Li, Zhao Lv: Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University; Minchao Wu: School of Computer and Artificial Intelligence, Hefei Normal University; Yong Peng: School of Computer Science and Technology, Hangzhou Dianzi University |
| collection | DOAJ |
| institution | Kabale University |
| format | Article |
| id | doaj-art-b1933aa97be24f2a9c686a9d28a7d804 |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Springer |
| series | Complex & Intelligent Systems |
| title | Cross-modal knowledge distillation for enhanced depression detection |
| topic | Brain information; Speech signals; Depression detection; Cross-modal; Knowledge distillation |
| url | https://doi.org/10.1007/s40747-025-02035-z |
| description | Abstract Depression is a severe mental disorder characterized by high prevalence, high recurrence, high disability, and high mortality rates. Consequently, timely detection and treatment are crucial. In recent years, speech-based methods for depression detection have been widely applied in clinical diagnostics. This is largely because they feature simple data collection and provide a positive user experience. However, these methods are limited by their susceptibility to deception and low accuracy rates. In contrast, brain information can provide neurobiological markers for depression, offering greater objectivity and higher accuracy in detection. To enhance the accuracy of speech-based detection, we propose a cross-modal knowledge distillation method that integrates speech signals with brain information, resulting in a more reliable and universally applicable approach for detecting depression. Specifically, a new multimodal model was first constructed as the teacher model, while the unimodal speech model served as the student model. Through knowledge distillation, the rich knowledge encapsulated in the teacher model was transferred to the student model, thereby enhancing its performance. Finally, experiments were conducted on the public MODMA dataset, achieving an accuracy of 83.19% for the distilled model, which represents a 3.47% improvement compared to traditional speech recognition methods. These results validate the effectiveness and feasibility of our proposed method, offering more effective support for the clinical diagnosis of depression. |