Cross-modal knowledge distillation for enhanced depression detection
Abstract Depression is a severe mental disorder characterized by high prevalence, high recurrence, high disability, and high mortality rates. Consequently, timely detection and treatment are crucial. In recent years, speech-based methods for depression detection have been widely applied in clinical diagnostics. This is largely because they feature simple data collection and provide a positive user experience. However, these methods are limited by their susceptibility to deception and low accuracy rates. In contrast, brain information can provide neurobiological markers for depression, offering greater objectivity and higher accuracy in detection. To enhance the accuracy of speech-based detection, we propose a cross-modal knowledge distillation method that integrates speech signals with brain information, resulting in a more reliable and universally applicable approach for detecting depression. Specifically, a new multimodal model was first constructed as the teacher model, while the unimodal speech model served as the student model. Through knowledge distillation, the rich knowledge encapsulated in the teacher model was transferred to the student model, thereby enhancing its performance. Finally, experiments were conducted on the public MODMA dataset, achieving an accuracy of 83.19% for the distilled model, which represents a 3.47% improvement compared to traditional speech recognition methods. These results validate the effectiveness and feasibility of our proposed method, offering more effective support for the clinical diagnosis of depression.
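The abstract describes a standard teacher–student distillation setup: a multimodal (speech + brain information) teacher transfers its knowledge to a unimodal speech student. As a minimal, hedged sketch only — the paper's actual architecture and loss are not given here — the usual soft-target distillation objective (Hinton-style, with assumed temperature `T` and mixing weight `alpha`) can be written as:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the soft-target KL term (teacher -> student, softened by T)
    with the hard-label cross-entropy on the student's own predictions.
    T and alpha here are illustrative hyperparameters, not the paper's."""
    p_t = softmax(teacher_logits, T)  # softened teacher distribution
    p_s = softmax(student_logits, T)  # softened student distribution
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T**2 rescales the soft term so its gradient magnitude matches the hard term
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * hard))
```

When the student's logits match the teacher's, the KL term vanishes; training the speech-only student against both terms is what lets it absorb information from the multimodal teacher without needing brain signals at inference time.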
| Main Authors: | Huang Huang, Xinhui Li, Minchao Wu, Zhao Lv, Yong Peng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-08-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Brain information; Speech signals; Depression detection; Cross-modal; Knowledge distillation |
| Online Access: | https://doi.org/10.1007/s40747-025-02035-z |
| author | Huang Huang; Xinhui Li; Minchao Wu; Zhao Lv; Yong Peng |
|---|---|
| affiliations | Huang Huang, Xinhui Li, Zhao Lv: Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University; Minchao Wu: School of Computer and Artificial Intelligence, Hefei Normal University; Yong Peng: School of Computer Science and Technology, Hangzhou Dianzi University |
| collection | DOAJ |
| institution | Kabale University |
| format | Article |
| id | doaj-art-b1933aa97be24f2a9c686a9d28a7d804 |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Springer |
| series | Complex & Intelligent Systems |
| title | Cross-modal knowledge distillation for enhanced depression detection |
| topic | Brain information; Speech signals; Depression detection; Cross-modal; Knowledge distillation |
| url | https://doi.org/10.1007/s40747-025-02035-z |
| description | Abstract Depression is a severe mental disorder characterized by high prevalence, high recurrence, high disability, and high mortality rates. Consequently, timely detection and treatment are crucial. In recent years, speech-based methods for depression detection have been widely applied in clinical diagnostics. This is largely because they feature simple data collection and provide a positive user experience. However, these methods are limited by their susceptibility to deception and low accuracy rates. In contrast, brain information can provide neurobiological markers for depression, offering greater objectivity and higher accuracy in detection. To enhance the accuracy of speech-based detection, we propose a cross-modal knowledge distillation method that integrates speech signals with brain information, resulting in a more reliable and universally applicable approach for detecting depression. Specifically, a new multimodal model was first constructed as the teacher model, while the unimodal speech model served as the student model. Through knowledge distillation, the rich knowledge encapsulated in the teacher model was transferred to the student model, thereby enhancing its performance. Finally, experiments were conducted on the public MODMA dataset, achieving an accuracy of 83.19% for the distilled model, which represents a 3.47% improvement compared to traditional speech recognition methods. These results validate the effectiveness and feasibility of our proposed method, offering more effective support for the clinical diagnosis of depression. |