Prediction of depressive disorder using machine learning approaches: findings from the NHANES
Abstract. Background: Depressive disorder, particularly major depressive disorder (MDD), significantly impacts individuals and society. Traditional analysis methods often suffer from subjectivity and may not capture complex, non-linear relationships between risk factors. Machine learning (ML) offers a...
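| Main Authors: | Thien Vu, Research Dawadi, Masaki Yamamoto, Jie Ting Tay, Naoki Watanabe, Yuki Kuriya, Ai Oya, Phap Ngoc Hoang Tran, Michihiro Araki |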
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-02-01 |
| Series: | BMC Medical Informatics and Decision Making |
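| Subjects: | Depression; Depressive disorder; Supervised machine learning; Logistic regression; Random forest; Naïve bayes |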
| Online Access: | https://doi.org/10.1186/s12911-025-02903-1 |
| _version_ | 1849723785685499904 |
|---|---|
| author | Thien Vu; Research Dawadi; Masaki Yamamoto; Jie Ting Tay; Naoki Watanabe; Yuki Kuriya; Ai Oya; Phap Ngoc Hoang Tran; Michihiro Araki |
| author_sort | Thien Vu |
| collection | DOAJ |
| description | Abstract. Background: Depressive disorder, particularly major depressive disorder (MDD), significantly impacts individuals and society. Traditional analysis methods often suffer from subjectivity and may not capture complex, non-linear relationships between risk factors. Machine learning (ML) offers a data-driven approach to predict and diagnose depression more accurately by analyzing large and complex datasets. Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) 2013–2014 to predict depression using six supervised ML models: Logistic Regression, Random Forest, Naive Bayes, Support Vector Machine (SVM), Extreme Gradient Boost (XGBoost), and Light Gradient Boosting Machine (LightGBM). Depression was assessed using the Patient Health Questionnaire (PHQ-9), with a score of 10 or higher indicating moderate to severe depression. The dataset was split into training and testing sets (80% and 20%, respectively), and model performance was evaluated using accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP (SHapley Additive exPlanations) values were used to identify the critical risk factors and interpret the contribution of each feature to the prediction. Results: XGBoost was identified as the best-performing model, achieving the highest accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP analysis highlighted the most significant predictors of depression: the ratio of family income to poverty (PIR), sex, hypertension, serum cotinine and hydroxycotinine, BMI, education level, glucose levels, age, marital status, and renal function (eGFR). Conclusion: We developed ML models to predict depression and used SHAP for interpretation. This approach identifies key factors associated with depression, encompassing socioeconomic, demographic, and health-related aspects. |
| format | Article |
| id | doaj-art-32e68a74f4cf4e708c7a39c9307d8032 |
| institution | DOAJ |
| issn | 1472-6947 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Informatics and Decision Making |
| spelling | doaj-art-32e68a74f4cf4e708c7a39c9307d8032; 2025-08-20T03:10:55Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-02-01; vol. 25, no. 1, pp. 1–12; 10.1186/s12911-025-02903-1; Prediction of depressive disorder using machine learning approaches: findings from the NHANES; Thien Vu, Research Dawadi, Masaki Yamamoto, Jie Ting Tay, Naoki Watanabe, Yuki Kuriya, Ai Oya, Phap Ngoc Hoang Tran, Michihiro Araki (all authors: Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition); https://doi.org/10.1186/s12911-025-02903-1; Depression; Depressive disorder; Supervised machine learning; Logistic regression; Random forest; Naïve bayes |
| title | Prediction of depressive disorder using machine learning approaches: findings from the NHANES |
| topic | Depression; Depressive disorder; Supervised machine learning; Logistic regression; Random forest; Naïve bayes |
| url | https://doi.org/10.1186/s12911-025-02903-1 |
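The evaluation setup named in the abstract (a PHQ-9 score of 10 or higher as the positive label, an 80%/20% train/test split, and accuracy, sensitivity, specificity, precision, and F1 score) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the fixed random seed, and the toy labels are assumptions made for illustration, and AUC, the six models, and SHAP are left out because they need predicted probabilities and third-party libraries.

```python
import random

# From the abstract: a PHQ-9 score of 10 or higher indicates
# moderate to severe depression.
PHQ9_CUTOFF = 10


def label_depression(phq9_score):
    """Return 1 for a PHQ-9 score at or above the cutoff, else 0."""
    return 1 if phq9_score >= PHQ9_CUTOFF else 0


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80% train / 20% test.

    The seed is a hypothetical choice; the record does not state one.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Compute the threshold-based metrics listed in the abstract."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on depressed cases
    specificity = ratio(tn, tn + fp)   # recall on non-depressed cases
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy labels and predictions, not NHANES data.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters when the positive class is a minority, since a model that predicts "not depressed" for everyone can still score high accuracy.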