Development of a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease: A secondary analysis of cross-sectional national survey
Objective: This study aims to develop and validate a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease (COPD) using data from a national survey. Methods: Data from the Korea National Health and Nutrition Examination Survey (2007–2018) were used to ext...
| Main Authors: | Sudarshan Pant, Hyung Jeong Yang, Sehyun Cho, EuiJeong Ryu, Ja Yun Choi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SAGE Publishing, 2025-04-01 |
| Series: | Digital Health |
| Online Access: | https://doi.org/10.1177/20552076251333660 |
| _version_ | 1849712274536660992 |
|---|---|
| author | Sudarshan Pant, Hyung Jeong Yang, Sehyun Cho, EuiJeong Ryu, Ja Yun Choi |
| collection | DOAJ |
| description | Objective: This study aims to develop and validate a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease (COPD) using data from a national survey. Methods: Data from the Korea National Health and Nutrition Examination Survey (2007–2018) were used to extract 5466 COPD-eligible cases. The data collection involved demographic, behavioral, and clinical variables, including 21 predictors such as age, sex, and pulmonary function test results. The dependent variable, smoking status, was categorized as smoker or nonsmoker. A residual neural network (ResNN) model was developed and compared with five machine learning algorithms (random forest, decision tree, Gaussian Naive Bayes, K-nearest neighbor, and AdaBoost) and two deep learning models (multilayer perceptron and TabNet). Internal validation was performed using five-fold cross-validation, and model performance was evaluated using the area under the receiver operating characteristic (AUROC) curve, sensitivity, specificity, and F1-score. Results: The ResNN achieved an AUROC, sensitivity, specificity, and F1-score of 0.73, 70.1%, 75.2%, and 0.67, respectively, outperforming previous machine learning and deep learning models in predicting smoking status in patients with COPD. Explainable artificial intelligence (Shapley additive explanations) identified key predictors, including sex, age, and perceived health status. Conclusion: This deep learning model accurately predicts smoking status in patients with COPD, offering potential as a decision-support tool to detect high-risk persistent smokers for targeted interventions. Future studies should focus on external validation and incorporate additional behavioral and psychological variables to improve its generalizability and performance. |
| format | Article |
| id | doaj-art-3decaf0ea98b4f7586cc1327aae9ecef |
| institution | DOAJ |
| issn | 2055-2076 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | SAGE Publishing |
| record_format | Article |
| series | Digital Health |
| spelling | doaj-art-3decaf0ea98b4f7586cc1327aae9ecef; 2025-08-20T03:14:19Z; eng; SAGE Publishing; Digital Health; 2055-2076; 2025-04-01; 11; 10.1177/20552076251333660; Sudarshan Pant [0], Hyung Jeong Yang [1], Sehyun Cho [2], EuiJeong Ryu [3], Ja Yun Choi [4]; affiliations in source order: Department of Artificial Intelligence Convergence, , Gwangju, Republic of Korea; Department of Artificial Intelligence Convergence, , Gwangju, Republic of Korea; College of Nursing, , Gwangju, Republic of Korea; Department of Nursing, , Naju, Republic of Korea; College of Nursing, , Chonnam Research Institute of Nursing Science, Gwangju, Republic of Korea; https://doi.org/10.1177/20552076251333660 |
| title | Development of a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease: A secondary analysis of cross-sectional national survey |
| url | https://doi.org/10.1177/20552076251333660 |
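
The evaluation protocol named in the abstract (five-fold cross-validation scored by AUROC, sensitivity, specificity, and F1-score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `stratified_folds`, `binary_metrics`, and `auroc` are hypothetical, stratifying the folds by class is an assumption the record does not confirm, and label 1 is assumed to mean "smoker".

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class ratios.

    Assumption: the record only says "five-fold cross-validation";
    stratification is an illustrative choice, not a documented detail.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a cross-validation loop, each fold would serve once as the held-out set while a model is fit on the remaining four, and the per-fold metrics would then be averaged. The pairwise `auroc` above is quadratic in the number of samples, which is acceptable for a sketch but slower than a rank-based implementation on a dataset of several thousand cases.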