Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration
Background: Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, with limited early detection strategies. While previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, limited research has applied int...
| Main Authors: | Xiaomei Shao, Ling Zhang, Yuting Wang, Youmei Ying, Xueqin Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-07-01 |
| Series: | Frontiers in Public Health |
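| Subjects: | chronic obstructive pulmonary disease; machine learning; partial dependence plot; SHapley additive exPlanations; environment pollution |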
| Online Access: | https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full |
| _version_ | 1849319425462763520 |
|---|---|
| author | Xiaomei Shao, Ling Zhang, Yuting Wang, Youmei Ying, Xueqin Chen |
| author_sort | Xiaomei Shao |
| collection | DOAJ |
| description | Background: Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide, with limited early detection strategies. While previous studies have examined the relationship between per- and polyfluoroalkyl substances (PFAS) and COPD, limited research has applied interpretable machine learning (ML) techniques to this association. Methods: We investigated the association between PFAS exposure and COPD risk in 4,450 National Health and Nutrition Examination Survey (NHANES) participants from 2013 to 2018. After excluding participants with missing covariates or extreme PFAS values and applying K-nearest neighbors (KNN) imputation, nine ML models, including CatBoost, were built and evaluated using accuracy, area under the curve (AUC), sensitivity, and specificity. The best-performing model was further analyzed using partial dependence plots (PDP) and SHapley additive exPlanations (SHAP) analysis. To enhance clinical applicability, the final model was deployed as a publicly accessible web-based risk calculator. Results: CatBoost emerged as the best model, achieving an accuracy of 84%, an AUC of 0.89, a sensitivity of 81%, and a specificity of 84%. PDP revealed that higher perfluorooctane sulfonic acid (PFOS) and perfluoroundecanoic acid (PFUA) levels were associated with reduced COPD risk, whereas perfluorooctanoic acid (PFOA) and 2-(N-Methyl-perfluorooctane sulfonamido) acetic acid (MPAH) showed positive associations with COPD. Perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDE), and perfluorohexane sulfonic acid (PFHxS) demonstrated mixed or non-linear effects. SHAP analysis provided insights into individual predictions and overall variable contributions, clarifying the complex PFAS-COPD relationship. The deployed web-based calculator enables interactive prediction and risk interpretation, supporting potential public health applications. Conclusion: CatBoost identified PFOS and PFUA as protective factors against COPD, while PFOA and MPAH increased the risk of COPD. These findings emphasize the need for stricter PFAS regulation and highlight the potential of machine learning in guiding prevention strategies. |
| format | Article |
| id | doaj-art-cee12e2ba40547d283a0f3d2cf7bb862 |
| institution | Kabale University |
| issn | 2296-2565 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Public Health |
| spelling | doaj-art-cee12e2ba40547d283a0f3d2cf7bb862; 2025-08-20T03:50:26Z; eng; Frontiers Media S.A.; Frontiers in Public Health; 2296-2565; 2025-07-01; 13; 10.3389/fpubh.2025.1602566; 1602566; Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration; Xiaomei Shao (0), Ling Zhang (1), Yuting Wang (2), Youmei Ying (3), Xueqin Chen (4); Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China; Huai'an No. 3 People's Hospital, Huaian Second Clinical College of Xuzhou Medical University, Jiangsu, China; Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China; Nanjing Jiangbei Hospital, Affiliated Nanjing Jiangbei Hospital of Xinglin College, Nantong University, Jiangsu, China; The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou School of Clinical Medicine, Nanjing Medical University, Taizhou, Jiangsu, China; https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full; chronic obstructive pulmonary disease; machine learning; partial dependence plot; SHapley additive exPlanations; environment pollution |
| title | Developing an interpretable machine learning predictive model of chronic obstructive pulmonary disease by serum PFAS concentration |
| topic | chronic obstructive pulmonary disease; machine learning; partial dependence plot; SHapley additive exPlanations; environment pollution |
| url | https://www.frontiersin.org/articles/10.3389/fpubh.2025.1602566/full |
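
The description in this record reports model performance as accuracy, AUC, sensitivity, and specificity. As a hedged illustration of how those four metrics are commonly computed for a binary classifier, here is a minimal standard-library Python sketch. The toy labels and scores, the 0.5 decision threshold, and all function names are assumptions made for illustration; they are not the authors' data, code, or CatBoost model.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = COPD and 0 = no COPD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and AUC from predicted scores."""
    # Hypothetical decision rule: predict COPD when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc(y_true, y_score),
    }


# Toy example (made-up values, not NHANES data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
metrics = classification_metrics(y_true, y_score)
print(metrics)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies often report them together.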