Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration

Background: Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, with limited early detection strategies. While previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, limited research has applied int...

Bibliographic Details
Main Authors: Xiaomei Shao, Ling Zhang, Yuting Wang, Youmei Ying, Xueqin Chen
Format: Article
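Language: English
Published: Frontiers Media S.A. 2025-07-01
Series: Frontiers in Public Health
Subjects: chronic obstructive pulmonary disease; machine learning; partial dependence plot; SHapley additive exPlanations; environment pollution
Online Access: https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full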
author Xiaomei Shao
Ling Zhang
Yuting Wang
Youmei Ying
Xueqin Chen
collection DOAJ
description Background: Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, with limited early detection strategies. While previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, few have applied interpretable machine learning (ML) techniques to this association.
Methods: We investigated the association between PFAS exposure and COPD risk in 4,450 National Health and Nutrition Examination Survey (NHANES) participants from 2013 to 2018. After excluding records with missing covariates and extreme PFAS values and applying K-nearest neighbors (KNN) imputation, nine ML models, including CatBoost, were built and evaluated using metrics such as accuracy, area under the curve (AUC), sensitivity, and specificity. The best-performing model was further analyzed using partial dependence plots (PDP) and SHapley additive exPlanations (SHAP) analysis. To enhance clinical applicability, the final model was deployed as a publicly accessible web-based risk calculator.
Results: CatBoost emerged as the best model, achieving an accuracy of 84%, AUC of 0.89, sensitivity of 81%, and specificity of 84%. PDP revealed that higher perfluorooctane sulfonic acid (PFOS) and perfluoroundecanoic acid (PFUA) levels were associated with reduced COPD risk, whereas perfluorooctanoic acid (PFOA) and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH) showed positive associations with COPD. Perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDE), and perfluorohexane sulfonic acid (PFHxS) demonstrated mixed or non-linear effects. SHAP analysis provided insights into individual predictions and overall variable contributions, clarifying the complex PFAS-COPD relationship. The deployed web-based calculator enables interactive prediction and risk interpretation, supporting potential public health applications.
Conclusion: CatBoost identified PFOS and PFUA as protective factors against COPD, while PFOA and MPAH increased the risk of COPD. These findings emphasize the need for stricter PFAS regulation and highlight the potential of machine learning in guiding prevention strategies.
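Note on the reported metrics: the abstract evaluates models by accuracy, sensitivity, and specificity. As a minimal sketch of how these three quantities are conventionally derived from binary predictions (1 = COPD, 0 = no COPD), the Python below uses only the standard library. The function name `binary_metrics` and the toy labels are hypothetical illustrations; they are not the authors' code or data, and the abstract does not describe how the metrics were computed in the study.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Return accuracy, sensitivity and specificity for 0/1 labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    positives = tp + fn  # all true cases
    negatives = tn + fp  # all true non-cases
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,  # share of true cases that were detected
        "specificity": tn / negatives,  # share of non-cases correctly ruled out
    }


# Toy labels, invented for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look high when one class dominates the sample.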
format Article
id doaj-art-cee12e2ba40547d283a0f3d2cf7bb862
institution Kabale University
issn 2296-2565
language English
publishDate 2025-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Public Health
spelling doaj-art-cee12e2ba40547d283a0f3d2cf7bb862; 2025-08-20T03:50:26Z; eng; Frontiers Media S.A.; Frontiers in Public Health; 2296-2565; 2025-07-01; 13; 10.3389/fpubh.2025.1602566; 1602566
Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration
Xiaomei Shao (0), Ling Zhang (1), Yuting Wang (2), Youmei Ying (3), Xueqin Chen (4)
(0) Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China
(1) Huai'an No. 3 People's Hospital, Huaian Second Clinical College of Xuzhou Medical University, Jiangsu, China
(2) Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China
(3) Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China
(4) The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Taizhou, Jiangsu, China
https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full
chronic obstructive pulmonary disease; machine learning; partial dependence plot; SHapley additive exPlanations; environment pollution
title Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration
topic chronic obstructive pulmonary disease
machine learning
partial dependence plot
SHapley additive exPlanations
environment pollution
url https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full