Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules
Abstract. Background: Pulmonary nodules (PNs) are commonly considered an early manifestation of lung cancer. Among them, PNs that remain stable for more than two years or whose pathological results rule out lung cancer are considered benign PNs (BPNs), while PNs that conform to the growth...
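| Main Authors: | Zhi Li, Wenjing Zhang, Jinyi Huang, Ling Lu, Dongming Xie, Jinrong Zhang, Jiamin Liang, Yuepeng Sui, Linyuan Liu, Jianjun Zou, Ao Lin, Lei Yang, Fuman Qiu, Zhaoting Hu, Mei Wu, Yibin Deng, Xin Zhang, Jiachun Lu |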
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2025-07-01 |
| Series: | BMC Medical Informatics and Decision Making |
| Subjects: | Pulmonary nodules, Detection rate, Machine learning, Discriminative analysis, Prediction model |
| Online Access: | https://doi.org/10.1186/s12911-025-03067-8 |
| _version_ | 1849332933459968000 |
|---|---|
| author | Zhi Li, Wenjing Zhang, Jinyi Huang, Ling Lu, Dongming Xie, Jinrong Zhang, Jiamin Liang, Yuepeng Sui, Linyuan Liu, Jianjun Zou, Ao Lin, Lei Yang, Fuman Qiu, Zhaoting Hu, Mei Wu, Yibin Deng, Xin Zhang, Jiachun Lu |
| author_sort | Zhi Li |
| collection | DOAJ |
| description | Abstract. Background: Pulmonary nodules (PNs) are commonly considered an early manifestation of lung cancer. Among them, PNs that remain stable for more than two years or whose pathological results rule out lung cancer are considered benign PNs (BPNs), while PNs that conform to the growth pattern of tumors or whose pathological results indicate lung cancer are considered malignant PNs (MPNs). Currently, more than 90% of PNs detected by screening tests are benign, with a false positive rate of up to 96.4%. Although a range of predictive models have been developed to identify MPNs, distinguishing BPNs from MPNs remains challenging. Methods: We included a total of 5197 patients in the case-control study according to the preset exclusion criteria and sample size. Among them, 4735 with BPNs and 2509 with MPNs were randomly divided into training, validation, and test sets at a 7:1.5:1.5 ratio. Three widely applicable machine learning algorithms (Random Forests, Gradient Boosting Machine, and XGBoost) were used to screen the features, the corresponding predictive models were then constructed using discriminant analysis, and the best-performing model was selected as the target model. The model was internally validated with 10-fold cross-validation and compared with the PKUPH and Block models. Results: We collated information from chest CT examinations performed from 2018 to 2021 in the physical examination population and found that the detection rate of PNs was 21.57%, with an overall upward trend. The GMU_D model, constructed by discriminant analysis on the features screened by machine learning, showed excellent discriminative performance (AUC = 0.866, 95% CI: 0.858–0.874), higher than that of the PKUPH model (AUC = 0.559, 95% CI: 0.552–0.567) and the Block model (AUC = 0.823, 95% CI: 0.814–0.833). The cross-validation results were equally strong (AUC = 0.866, 95% CI: 0.858–0.874). Conclusion: The detection rate of PNs was 21.57% in the physical examination population undergoing chest CT. Based on real-world data on PNs, a better-performing prediction tool was developed and validated that can accurately distinguish BPNs from MPNs. |
| format | Article |
| id | doaj-art-7b338dea658741feb365098d585d2832 |
| institution | Kabale University |
| issn | 1472-6947 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Informatics and Decision Making |
| spelling | doaj-art-7b338dea658741feb365098d585d2832; 2025-08-20T03:46:03Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-07-01; 251111; 10.1186/s12911-025-03067-8; Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules. Authors: Zhi Li (a), Wenjing Zhang (b), Jinyi Huang (a), Ling Lu (a), Dongming Xie (a), Jinrong Zhang (a), Jiamin Liang (a), Yuepeng Sui (a), Linyuan Liu (a), Jianjun Zou (a), Ao Lin (a), Lei Yang (a), Fuman Qiu (a), Zhaoting Hu (b), Mei Wu (b), Yibin Deng (c), Xin Zhang (a), Jiachun Lu (a). Affiliations: (a) The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University; (b) The Health Management Center, The Third Affiliated Hospital of Southern Medical University; (c) Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi, Center for Medical Laboratory Science, The Affiliated Hospital of Youjiang Medical University for Nationalities. https://doi.org/10.1186/s12911-025-03067-8. Keywords: Pulmonary nodules, Detection rate, Machine learning, Discriminative analysis, Prediction model |
| title | Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules |
| topic | Pulmonary nodules, Detection rate, Machine learning, Discriminative analysis, Prediction model |
| url | https://doi.org/10.1186/s12911-025-03067-8 |
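
The abstract mentions a random 7:1.5:1.5 split into training, validation, and test sets and reports discrimination as AUC. As a rough illustration only, the sketch below shows one way such a split and an AUC value could be computed with the Python standard library. It is not the authors' code: the function names `split_70_15_15` and `auc`, the seed, and the toy inputs are all hypothetical, and the record does not say how the study performed either step.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them 70% / 15% / 15% (hypothetical helper).

    Integer arithmetic sets the training and validation sizes; any
    remainder left by rounding down goes to the test set.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Label 1 marks the positive class (for example, malignant) and label 0
    the negative class. Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val, test = split_70_15_15(range(100), seed=1)
    print(len(train), len(val), len(test))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based formula or a library routine would be the usual choice for data at the scale described in the abstract.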