Cross-modal knowledge distillation for enhanced depression detection

Bibliographic Details
Main Authors: Huang Huang, Xinhui Li, Minchao Wu, Zhao Lv, Yong Peng
Format: Article
Language:English
Published: Springer 2025-08-01
Series:Complex & Intelligent Systems
Subjects: Brain information; Speech signals; Depression detection; Cross-modal; Knowledge distillation
Online Access:https://doi.org/10.1007/s40747-025-02035-z
author Huang Huang
Xinhui Li
Minchao Wu
Zhao Lv
Yong Peng
collection DOAJ
description Abstract Depression is a severe mental disorder characterized by high rates of prevalence, recurrence, disability, and mortality; timely detection and treatment are therefore crucial. In recent years, speech-based methods for depression detection have been widely applied in clinical diagnostics, largely because data collection is simple and the user experience is positive. However, these methods are limited by their susceptibility to deception and low accuracy. In contrast, brain information provides neurobiological markers of depression, offering greater objectivity and higher detection accuracy. To improve the accuracy of speech-based detection, we propose a cross-modal knowledge distillation method that integrates speech signals with brain information, yielding a more reliable and broadly applicable approach to detecting depression. Specifically, a new multimodal model is first constructed as the teacher, while a unimodal speech model serves as the student. Through knowledge distillation, the rich knowledge encapsulated in the teacher model is transferred to the student model, enhancing its performance. Experiments on the public MODMA dataset show that the distilled model achieves an accuracy of 83.19%, a 3.47% improvement over traditional speech-based methods. These results validate the effectiveness and feasibility of the proposed method, offering more effective support for the clinical diagnosis of depression.
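The abstract describes a teacher-student transfer: a multimodal (speech plus brain signal) teacher and a speech-only student trained via knowledge distillation. The paper's exact architecture and losses are not given in this record; the sketch below illustrates only the generic soft-label distillation objective such setups commonly minimize, a temperature-softened KL divergence between teacher and student class distributions, with all names and parameter values illustrative.

```python
import math

def softened_softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    p = softened_softmax(teacher_logits, temperature)
    q = softened_softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

# Illustrative logits for a binary depressed/healthy decision:
# the loss is zero when the student already matches the teacher,
# and positive otherwise.
matched = distillation_loss([2.0, -1.0], [2.0, -1.0])
mismatched = distillation_loss([0.1, 0.2], [3.0, -3.0])
```

In practice this term is usually combined with a standard cross-entropy loss on the ground-truth labels, weighted by a mixing coefficient; both the temperature and the weight are tuning choices not specified in this record.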
format Article
id doaj-art-b1933aa97be24f2a9c686a9d28a7d804
institution Kabale University
issn 2199-4536 (print)
2198-6053 (electronic)
language English
publishDate 2025-08-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
affiliations Huang Huang: Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University
Xinhui Li: Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University
Minchao Wu: School of Computer and Artificial Intelligence, Hefei Normal University
Zhao Lv: Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University
Yong Peng: School of Computer Science and Technology, Hangzhou Dianzi University
doi https://doi.org/10.1007/s40747-025-02035-z
title Cross-modal knowledge distillation for enhanced depression detection
topic Brain information
Speech signals
Depression detection
Cross-modal
Knowledge distillation
url https://doi.org/10.1007/s40747-025-02035-z