Prediction of depressive disorder using machine learning approaches: findings from the NHANES

Abstract Background Depressive disorder, particularly major depressive disorder (MDD), significantly impacts individuals and society. Traditional analysis methods often suffer from subjectivity and may not capture complex, non-linear relationships between risk factors. Machine learning (ML) offers a...

Bibliographic Details
Main Authors: Thien Vu, Research Dawadi, Masaki Yamamoto, Jie Ting Tay, Naoki Watanabe, Yuki Kuriya, Ai Oya, Phap Ngoc Hoang Tran, Michihiro Araki
Format: Article
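Language: English
Published: BMC 2025-02-01
Series: BMC Medical Informatics and Decision Making
Subjects: Depression; Depressive disorder; Supervised machine learning; Logistic regression; Random forest; Naïve Bayes
Online Access: https://doi.org/10.1186/s12911-025-02903-1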
author Thien Vu
Research Dawadi
Masaki Yamamoto
Jie Ting Tay
Naoki Watanabe
Yuki Kuriya
Ai Oya
Phap Ngoc Hoang Tran
Michihiro Araki
author_sort Thien Vu
collection DOAJ
description Abstract Background: Depressive disorder, particularly major depressive disorder (MDD), significantly impacts individuals and society. Traditional analysis methods often suffer from subjectivity and may not capture complex, non-linear relationships between risk factors. Machine learning (ML) offers a data-driven approach to predict and diagnose depression more accurately by analyzing large and complex datasets. Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) 2013–2014 to predict depression with six supervised ML models: Logistic Regression, Random Forest, Naive Bayes, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Depression was assessed using the Patient Health Questionnaire (PHQ-9), with a score of 10 or higher indicating moderate to severe depression. The dataset was split into training and testing sets (80% and 20%, respectively), and model performance was evaluated using accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP (SHapley Additive exPlanations) values were used to identify the critical risk factors and to interpret the contribution of each feature to the prediction. Results: XGBoost was identified as the best-performing model, achieving the highest accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP analysis highlighted the most significant predictors of depression: the ratio of family income to poverty (PIR), sex, hypertension, serum cotinine and hydroxycotinine, BMI, education level, glucose levels, age, marital status, and renal function (eGFR). Conclusion: We developed ML models to predict depression and used SHAP for interpretation. This approach identifies key factors associated with depression, encompassing socioeconomic, demographic, and health-related aspects.
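The labeling rule, the 80/20 split, and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the random seed, and the toy inputs are hypothetical, and the PHQ-9 item format (nine items, each scored 0 to 3) comes from the standard questionnaire rather than from this record. Only the cutoff of 10, the 80/20 proportion, and the list of metrics are taken from the abstract.

```python
import random

PHQ9_CUTOFF = 10  # from the abstract: a score of 10 or higher marks moderate to severe depression


def phq9_label(item_scores):
    """Return 1 if the PHQ-9 total meets the cutoff, else 0 (nine items, each 0-3)."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 expects nine item scores in the range 0-3")
    return int(sum(item_scores) >= PHQ9_CUTOFF)


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80% / 20% (the seed is an illustrative choice)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a workflow like the one described, each candidate model would be fitted on the training portion, and binary_metrics and roc_auc would then be applied to its predictions on the held-out 20%. Model fitting and SHAP attribution are outside the scope of this sketch.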
format Article
id doaj-art-32e68a74f4cf4e708c7a39c9307d8032
institution DOAJ
issn 1472-6947
language English
publishDate 2025-02-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-32e68a74f4cf4e708c7a39c9307d8032
2025-08-20T03:10:55Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2025-02-01
Volume 25, issue 1, pages 1-12
10.1186/s12911-025-02903-1
Prediction of depressive disorder using machine learning approaches: findings from the NHANES
Thien Vu, Research Dawadi, Masaki Yamamoto, Jie Ting Tay, Naoki Watanabe, Yuki Kuriya, Ai Oya, Phap Ngoc Hoang Tran, Michihiro Araki
Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition (affiliation listed for all nine authors)
https://doi.org/10.1186/s12911-025-02903-1
Depression
Depressive disorder
Supervised machine learning
Logistic regression
Random forest
Naïve Bayes
title Prediction of depressive disorder using machine learning approaches: findings from the NHANES
topic Depression
Depressive disorder
Supervised machine learning
Logistic regression
Random forest
Naïve Bayes
url https://doi.org/10.1186/s12911-025-02903-1