Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules

Abstract Background: Pulmonary nodules (PNs) are commonly regarded as an early manifestation of lung cancer. PNs that remain stable for more than two years, or whose pathology rules out lung cancer, are considered benign PNs (BPNs), while PNs that conform to the growth...

Bibliographic Details
Main Authors: Zhi Li, Wenjing Zhang, Jinyi Huang, Ling Lu, Dongming Xie, Jinrong Zhang, Jiamin Liang, Yuepeng Sui, Linyuan Liu, Jianjun Zou, Ao Lin, Lei Yang, Fuman Qiu, Zhaoting Hu, Mei Wu, Yibin Deng, Xin Zhang, Jiachun Lu
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-025-03067-8
collection DOAJ
description Abstract
Background: Pulmonary nodules (PNs) are commonly regarded as an early manifestation of lung cancer. PNs that remain stable for more than two years, or whose pathology rules out lung cancer, are considered benign PNs (BPNs); PNs that follow a tumor growth pattern, or whose pathology indicates lung cancer, are considered malignant PNs (MPNs). More than 90% of PNs detected by screening tests are benign, with a false positive rate of up to 96.4%. Although a range of predictive models have been developed to identify MPNs, distinguishing BPNs from MPNs remains challenging.
Methods: We included a total of 5197 patients in the case-control study according to the preset exclusion criteria and sample size. Among them, 4735 with BPNs and 2509 with MPNs were randomly divided into training, validation, and test sets in a 7:1.5:1.5 ratio. Three widely used machine learning algorithms (Random Forests, Gradient Boosting Machine, and XGBoost) were used to screen candidate features, the corresponding predictive models were then constructed using discriminant analysis, and the best-performing model was selected as the target model. The model was internally validated with 10-fold cross-validation and compared with the PKUPH and Block models.
Results: Among people in the physical examination population who underwent chest CT from 2018 to 2021, the detection rate of PNs was 21.57% and showed an overall upward trend. The GMU_D model, constructed by discriminant analysis on the features screened by machine learning, showed good discrimination (AUC = 0.866, 95% CI: 0.858–0.874) and outperformed both the PKUPH model (AUC = 0.559, 95% CI: 0.552–0.567) and the Block model (AUC = 0.823, 95% CI: 0.814–0.833). The cross-validation results were equally good (AUC = 0.866, 95% CI: 0.858–0.874).
Conclusion: The detection rate of PNs was 21.57% in the physical examination population undergoing chest CT. Based on real-world data on PNs, an improved prediction tool was developed and validated that distinguishes BPNs from MPNs with good predictive performance and discrimination.
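The 7:1.5:1.5 split and the AUC figures reported in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `split_indices` and `auc` are hypothetical, the abstract does not say whether the split was stratified or which software was used, and the toy labels and scores below are made up purely for illustration.

```python
import random


def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Assumption: a plain (unstratified) random split; the source only
    states that the data were divided at random in a 7:1.5:1.5 ratio.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    # Whatever remains after the first two cuts becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))

    # Toy example: 1 = malignant, 0 = benign, scores are model outputs.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Read this way, an AUC of 0.866 means that a randomly chosen malignant nodule receives a higher model score than a randomly chosen benign one in roughly 87% of pairs, while an AUC near 0.5 is no better than chance.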
format Article
id doaj-art-7b338dea658741feb365098d585d2832
institution Kabale University
issn 1472-6947
language English
publishDate 2025-07-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-7b338dea658741feb365098d585d2832 2025-08-20T03:46:03Z eng
BMC. BMC Medical Informatics and Decision Making, ISSN 1472-6947, 2025-07-01, volume 25, issue 1, pages 1-11. doi:10.1186/s12911-025-03067-8
Author affiliations:
Zhi Li, Jinyi Huang, Ling Lu, Dongming Xie, Jinrong Zhang, Jiamin Liang, Yuepeng Sui, Linyuan Liu, Jianjun Zou, Ao Lin, Lei Yang, Fuman Qiu, Xin Zhang, Jiachun Lu: The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University
Wenjing Zhang, Zhaoting Hu, Mei Wu: The Health Management Center, The Third Affiliated Hospital of Southern Medical University
Yibin Deng: Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi, Center for Medical Laboratory Science, The Affiliated Hospital of Youjiang Medical University for Nationalities
title Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules
topic Pulmonary nodules
Detection rate
Machine learning
Discriminative analysis
Prediction model
url https://doi.org/10.1186/s12911-025-03067-8