Explainable Machine Learning in the Prediction of Depression

Bibliographic Details
Main Authors: Christina Mimikou, Christos Kokkotis, Dimitrios Tsiptsios, Konstantinos Tsamakis, Stella Savvidou, Lillian Modig, Foteini Christidi, Antonia Kaltsatou, Triantafyllos Doskas, Christoph Mueller, Aspasia Serdari, Kostas Anagnostopoulos, Gregory Tripsianis
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Diagnostics
Online Access: https://www.mdpi.com/2075-4418/15/11/1412
Description
Summary: Background: Depression is a major public health issue and one of the leading causes of disease burden worldwide. The risk of depression is determined by both genetic and environmental factors. While genetic factors cannot be altered, identifying potentially reversible environmental factors is crucial for limiting the prevalence of depression. Aim: A cross-sectional, questionnaire-based study of a sample from the multicultural region of Thrace in northeast Greece was designed to assess the potential association of depression with several sociodemographic characteristics, lifestyle, and health status. The study employed four machine learning (ML) methods to predict depression: logistic regression (LR), support vector machine (SVM), XGBoost, and neural networks (NNs). These models were compared to identify the best-performing approach. Additionally, a genetic algorithm (GA) was used for feature selection, and SHAP (SHapley Additive exPlanations) was used to interpret the contribution of each feature. Results: The XGBoost classifier achieved the highest performance in predicting depression on the test dataset (accuracy, 97.83%), with NNs a close second (accuracy, 97.02%). The XGBoost classifier used the 15 most significant risk factors identified by the GA. The SHAP analysis revealed that anxiety, education level, alcohol consumption, and body mass index were the most influential predictors of depression. Conclusions: These findings provide valuable insights for the development of personalized public health interventions and clinical strategies, ultimately promoting improved mental well-being. Future research should expand datasets to enhance model accuracy, enabling early detection and personalized mental healthcare systems for better intervention.
ISSN: 2075-4418
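
Note on SHAP: the summary states that SHAP was used to interpret the contribution of each feature. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a single input relative to a baseline input. It is not the study's code and does not use the shap library. The function names (shapley_values, toy_score), the weights, and the input values are all hypothetical and invented for this example; only the four feature names are taken from the summary.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in the subset take the instance's values; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score: weights and inputs are invented, not study results.
FEATURES = ["anxiety", "education_level", "alcohol", "bmi"]
WEIGHTS = [0.8, -0.3, 0.2, 0.1]


def toy_score(values):
    return sum(w * v for w, v in zip(WEIGHTS, values))


instance = [1.0, 2.0, 3.0, 27.0]
baseline = [0.0, 1.0, 1.0, 24.0]
phi = shapley_values(toy_score, instance, baseline)

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.3f}")
```

For a linear score like this one, each contribution reduces to the weight times the feature's distance from the baseline, and the contributions sum to the difference between the instance's score and the baseline score. Exact enumeration grows exponentially with the number of features, so it is practical only for small feature sets.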