Predictive model of malignancy probability in pulmonary nodules based on multicenter data

Objectives: To study the characteristic factors associated with the occurrence of malignant nodules in patients presenting with pulmonary nodules, develop a predictive model, and evaluate its diagnostic performance. Methods: This study analyzed the clinical and imaging data of 830 patients with pulmonary...

Bibliographic Details
Main Authors: Yuyan Huang, Yong Chen, Fang He, Li Jiang
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-05-01
Series: Frontiers in Oncology
Subjects: pulmonary nodules; malignancy; machine learning; prediction model; external test
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2025.1588147/full
author Yuyan Huang
Yong Chen
Fang He
Li Jiang
collection DOAJ
description Objectives: To study the characteristic factors associated with the occurrence of malignant nodules in patients presenting with pulmonary nodules, develop a predictive model, and evaluate its diagnostic performance. Methods: This study analyzed the clinical and imaging data of 830 patients with pulmonary nodules from the Affiliated Hospital of North Sichuan Medical College. The Least Absolute Shrinkage and Selection Operator (LASSO) and multivariate logistic regression analysis were utilized to identify characteristic predictors. Multiple machine learning classification models were employed for analysis, with the optimal model ultimately selected. A Shapley Additive Explanations (SHAP) framework was developed for personalized risk assessment. Finally, external testing was performed using data from 330 pulmonary nodule patients at Guang’an People’s Hospital. Results: The predictive factors for malignant pulmonary nodules included: age, gender, nodule diameter, spiculation, lobulation, calcification, vacuole, vascular convergence sign, air bronchogram sign, pleural traction, and density of the nodule. The Gradient Boosting Decision Tree (GBDT) classification model demonstrated optimal performance, with an area under the curve (AUC) of 0.873 (95% confidence interval [CI]: 0.840–0.906) on the internal test set and 0.726 (95% CI: 0.668–0.784) on the external test set. Both the calibration curve and clinical decision curve analysis (DCA) indicated excellent model calibration and substantial clinical benefits. Conclusions: We developed a GBDT model that provides a basis for differentiating malignant pulmonary nodules, which may assist in the diagnosis and treatment of patients with pulmonary nodules.
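The description reports each AUC together with a 95% confidence interval. As a rough illustration of how such a pair of numbers can be obtained, the sketch below computes the AUC as a Mann-Whitney statistic and attaches a Hanley-McNeil interval. This record does not say which interval method the authors used, so the Hanley-McNeil choice is an assumption, the function names are hypothetical, and the eight labels and scores are invented toy values, not study data.

```python
import math


def auc_mann_whitney(labels, scores):
    """Return (auc, n_pos, n_neg).

    AUC is the fraction of (malignant, benign) pairs in which the
    malignant case receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley-McNeil standard error).

    Assumption: this is one common method, not necessarily the one
    used in the study described by this record.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to [0, 1], since an AUC cannot fall outside that range.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Invented toy data: 1 = malignant, 0 = benign; scores are model outputs.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auc, n_pos, n_neg = auc_mann_whitney(toy_labels, toy_scores)
ci_low, ci_high = hanley_mcneil_ci(toy_auc, n_pos, n_neg)
print(f"AUC = {toy_auc:.3f}")  # 14 of 16 pairs are ranked correctly: 0.875
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

With only eight cases the interval is very wide. The much narrower intervals quoted in the description reflect test sets with far more patients.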
format Article
id doaj-art-9e3cbe1a413e4055ade4f68a3f0abf1d
institution DOAJ
issn 2234-943X
language English
publishDate 2025-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Oncology
spelling doaj-art-9e3cbe1a413e4055ade4f68a3f0abf1d
2025-08-20T03:05:52Z
eng
Frontiers Media S.A.
Frontiers in Oncology
2234-943X
2025-05-01
15
10.3389/fonc.2025.1588147
1588147
Predictive model of malignancy probability in pulmonary nodules based on multicenter data
Yuyan Huang
Yong Chen
Fang He
Li Jiang
Objectives: To study the characteristic factors associated with the occurrence of malignant nodules in patients presenting with pulmonary nodules, develop a predictive model, and evaluate its diagnostic performance. Methods: This study analyzed the clinical and imaging data of 830 patients with pulmonary nodules from the Affiliated Hospital of North Sichuan Medical College. The Least Absolute Shrinkage and Selection Operator (LASSO) and multivariate logistic regression analysis were utilized to identify characteristic predictors. Multiple machine learning classification models were employed for analysis, with the optimal model ultimately selected. A Shapley Additive Explanations (SHAP) framework was developed for personalized risk assessment. Finally, external testing was performed using data from 330 pulmonary nodule patients at Guang’an People’s Hospital. Results: The predictive factors for malignant pulmonary nodules included: age, gender, nodule diameter, spiculation, lobulation, calcification, vacuole, vascular convergence sign, air bronchogram sign, pleural traction, and density of the nodule. The Gradient Boosting Decision Tree (GBDT) classification model demonstrated optimal performance, with an area under the curve (AUC) of 0.873 (95% confidence interval [CI]: 0.840–0.906) on the internal test set and 0.726 (95% CI: 0.668–0.784) on the external test set. Both the calibration curve and clinical decision curve analysis (DCA) indicated excellent model calibration and substantial clinical benefits. Conclusions: We developed a GBDT model that provides a basis for differentiating malignant pulmonary nodules, which may assist in the diagnosis and treatment of patients with pulmonary nodules.
https://www.frontiersin.org/articles/10.3389/fonc.2025.1588147/full
pulmonary nodules
malignancy
machine learning
prediction model
external test
title Predictive model of malignancy probability in pulmonary nodules based on multicenter data
topic pulmonary nodules
malignancy
machine learning
prediction model
external test
url https://www.frontiersin.org/articles/10.3389/fonc.2025.1588147/full