Development of a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease: A secondary analysis of cross-sectional national survey

Bibliographic Details
Main Authors: Sudarshan Pant, Hyung Jeong Yang, Sehyun Cho, EuiJeong Ryu, Ja Yun Choi
Format: Article
Language: English
Published: SAGE Publishing, 2025-04-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076251333660
collection DOAJ
description Objective: This study aims to develop and validate a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease (COPD) using data from a national survey.
Methods: Data from the Korea National Health and Nutrition Examination Survey (2007–2018) were used to extract 5466 COPD-eligible cases. The collected data comprised demographic, behavioral, and clinical variables, including 21 predictors such as age, sex, and pulmonary function test results. The dependent variable, smoking status, was categorized as smoker or nonsmoker. A residual neural network (ResNN) model was developed and compared with five machine learning algorithms (random forest, decision tree, Gaussian Naive Bayes, K-nearest neighbor, and AdaBoost) and two deep learning models (multilayer perceptron and TabNet). Internal validation was performed using five-fold cross-validation, and model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and F1-score.
Results: The ResNN achieved an AUROC of 0.73, a sensitivity of 70.1%, a specificity of 75.2%, and an F1-score of 0.67, outperforming previous machine learning and deep learning models in predicting smoking status in patients with COPD. An explainable artificial intelligence method, Shapley additive explanations (SHAP), identified key predictors, including sex, age, and perceived health status.
Conclusion: This deep learning model accurately predicts smoking status in patients with COPD, offering potential as a decision-support tool to detect high-risk persistent smokers for targeted interventions. Future studies should focus on external validation and incorporate additional behavioral and psychological variables to improve its generalizability and performance.
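The evaluation metrics named in the abstract have standard definitions. The sketch below (Python, standard library only) shows one way to compute sensitivity, specificity, F1-score, and AUROC from binary labels and predicted scores, together with a shuffled five-fold split. It is not the authors' code: the function names, the 1 = smoker / 0 = nonsmoker coding, the 0.5 decision threshold, and the toy data are illustrative assumptions, and this record does not say how the study's folds were built (for example, whether they were stratified).

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and F1-score; labels assumed coded 1 = smoker, 0 = nonsmoker."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # smokers correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonsmokers correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonsmokers flagged as smokers
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # smokers missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0  # harmonic mean of precision and recall
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case outscores a random negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for shuffled (non-stratified) k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every index once
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    # Hypothetical toy data: 3 smokers (1) and 3 nonsmokers (0).
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(binary_metrics(y_true, y_pred))
    print(auroc(y_true, scores))
    for train, test in k_fold_indices(len(y_true), k=5):
        print(len(train), len(test))
```

On the toy data, two of three smokers and two of three nonsmokers are classified correctly, so sensitivity, specificity, and F1-score each equal 2/3, while the AUROC is 8/9 because 8 of the 9 smoker/nonsmoker score pairs are ranked correctly. The pairwise AUROC loop is quadratic and is meant only to make the definition explicit; for thousands of cases a rank-based or library implementation would be the practical choice.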
format Article
id doaj-art-3decaf0ea98b4f7586cc1327aae9ecef
institution DOAJ
issn 2055-2076
language English
publishDate 2025-04-01
publisher SAGE Publishing
record_format Article
series Digital Health
affiliation Sudarshan Pant: Department of Artificial Intelligence Convergence, Gwangju, Republic of Korea
Hyung Jeong Yang: Department of Artificial Intelligence Convergence, Gwangju, Republic of Korea
Sehyun Cho: College of Nursing, Gwangju, Republic of Korea
EuiJeong Ryu: Department of Nursing, Naju, Republic of Korea
Ja Yun Choi: College of Nursing, Chonnam Research Institute of Nursing Science, Gwangju, Republic of Korea
volume 11
title Development of a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease: A secondary analysis of cross-sectional national survey
url https://doi.org/10.1177/20552076251333660