XGBoost-based machine learning model combining clinical and ultrasound data for personalized prediction of thyroid nodule malignancy

Purpose: Thyroid ultrasound is a primary tool for screening thyroid nodules (TNs), but existing risk stratification systems have limitations. Nowadays, machine learning (ML) offers advanced capabilities to handle high-dimensional data and complex patterns. This study aimed to develop an ML model integ...

Bibliographic Details
Main Authors: Wenhan Li, Yajing Zhou, Ziyu Luo, Miao Tan, Rui Yin, Jianhui Li
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-07-01
Series: Frontiers in Endocrinology
Subjects: thyroid nodules; machine learning; XGBoost; diagnosis; web-based calculator
Online Access: https://www.frontiersin.org/articles/10.3389/fendo.2025.1639639/full
collection DOAJ
description
Purpose: Thyroid ultrasound is a primary tool for screening thyroid nodules (TNs), but existing risk stratification systems have limitations. Machine learning (ML) offers advanced capabilities for handling high-dimensional data and complex patterns. This study aimed to develop an ML model integrating clinical data and ultrasound features to improve personalized prediction of TN malignancy.
Methods: Data from 2,014 patients with TNs (January 2018 to January 2024) were retrospectively analyzed, with 1,612 in the training set and 402 in the test set. Features included demographic, ultrasound, and thyroid function indices. Random Forest (RF) and Lasso regression were used for feature selection. Six ML models (KNN, Logistic Regression, RF, Classification Tree, SVM, and XGBoost) were then developed and validated via 10-fold cross-validation. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, calibration curves, and decision curve analysis (DCA).
Results: Seventeen variables were identified as influential factors for diagnosing TNs. All six models exhibited satisfactory predictive performance, with accuracy ranging from 0.761 to 0.851 and AUC from 0.755 to 0.928. Among them, the XGBoost model performed best, achieving an AUC of 0.928, accuracy of 0.851, sensitivity of 0.933, and specificity of 0.650. Calibration curves showed strong agreement between predicted and observed malignancy probabilities, and DCA indicated net clinical benefit across a wide risk threshold range (0.2–0.9). Additionally, the model was developed into a web-based calculator to facilitate its practical application.
Conclusions: The XGBoost model effectively integrates multi-modal data to predict TN malignancy, offering improved accuracy and clinical utility.
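The threshold-based metrics named in the description (accuracy, sensitivity, specificity) and the net benefit used in decision curve analysis can be illustrated with a minimal sketch. This is not the authors' code: the labels and probabilities below are hypothetical toy values, the function names are made up for illustration, and the net benefit formula is the standard one, TP/n - (FP/n) * pt / (1 - pt), assumed here rather than taken from the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (1 = malignant, 0 = benign)."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def confusion_at(y_true, y_prob, threshold):
    """Classify as malignant when the predicted probability >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def sensitivity(c):
    """Share of malignant nodules correctly flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Share of benign nodules correctly cleared."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    return (c.tp + c.tn) / c.n


def net_benefit(c, threshold):
    """Standard DCA net benefit at risk threshold pt (0 < pt < 1)."""
    return c.tp / c.n - (c.fp / c.n) * threshold / (1 - threshold)


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95, 0.1, 0.55]

c = confusion_at(y_true, y_prob, threshold=0.5)
print("sensitivity:", sensitivity(c))
print("specificity:", specificity(c))
print("accuracy:", accuracy(c))
print("net benefit at 0.5:", net_benefit(c, 0.5))
```

A decision curve repeats the `net_benefit` calculation over a grid of thresholds (for example 0.2 to 0.9, the range quoted above) and compares the model against the treat-all and treat-none strategies.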
id doaj-art-1b89175d497e448fb1f5d4fa9308e01a
institution Kabale University
issn 1664-2392
spelling doaj-art-1b89175d497e448fb1f5d4fa9308e01a; 2025-08-20T03:34:40Z; eng; Frontiers Media S.A.; Frontiers in Endocrinology; 1664-2392; 2025-07-01; 16; 10.3389/fendo.2025.1639639; 1639639
XGBoost-based machine learning model combining clinical and ultrasound data for personalized prediction of thyroid nodule malignancy
Wenhan Li (0, 1); Yajing Zhou (2); Ziyu Luo (3, 4); Miao Tan (5, 6); Rui Yin (7); Jianhui Li (8, 9)
0, 3, 5, 8: Department of Surgical Oncology, Shaanxi Provincial People’s Hospital, Xi’an, Shaanxi, China
1, 4, 6, 9: The Third Affiliated Hospital, School of Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
2: Department of Thyroid and Breast Surgery, The First Affiliated Hospital of Henan Polytechnic University (The Second People’s Hospital of Jiaozuo City), Jiaozuo, Henan, China
7: Department of General Surgery Ward 1, Hospital of Ningshan County, Ankang, Shaanxi, China
title XGBoost-based machine learning model combining clinical and ultrasound data for personalized prediction of thyroid nodule malignancy
topic thyroid nodules
machine learning
XGBoost
diagnosis
web-based calculator
url https://www.frontiersin.org/articles/10.3389/fendo.2025.1639639/full