Pre-operative T-stage discrimination in gallbladder cancer using machine learning and DeepSeek-R1
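| Main Authors: | Joongwon Chae, Zhenyu Wang, Duanpo Wu, Lian Zhang, Alexander Tuzikov, Magrupov Talat Madiyevich, Min Xu, Dongmei Yu, Peiwu Qin |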
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-08-01 |
| Series: | Frontiers in Oncology |
| Subjects: | |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fonc.2025.1613462/full |
| _version_ | 1849707379012141056 |
|---|---|
| author | Joongwon Chae, Zhenyu Wang, Duanpo Wu, Lian Zhang, Alexander Tuzikov, Magrupov Talat Madiyevich, Min Xu, Dongmei Yu, Peiwu Qin |
| collection | DOAJ |
| description | Background: Gallbladder cancer (GBC) frequently presents with non-specific early symptoms, which delays diagnosis. This study (i) assessed whether routine blood biomarkers can distinguish early T stages via machine learning and (ii) compared the T-stage discrimination performance of a large language model (DeepSeek-R1) when supplied with (a) radiology-report text alone versus (b) radiology-report text plus blood-biomarker values. Methods: We retrospectively analyzed 232 pathologically confirmed GBC patients treated at Lishui Central Hospital between 2023 and 2024 (T1, n = 51; T2, n = 181). Seven blood variables (neutrophil-to-lymphocyte ratio [NLR], monocyte-to-lymphocyte ratio [MLR], platelet-to-lymphocyte ratio [PLR], carcinoembryonic antigen [CEA], carbohydrate antigen 19-9 [CA19-9], carbohydrate antigen 125 [CA125], and alpha-fetoprotein [AFP]) were used to train Random Forest, support vector classifier (SVC), XGBoost, and LightGBM models. The Synthetic Minority Over-sampling Technique (SMOTE) was applied only to the training folds in one setting and omitted in the other. Model performance was evaluated on an independent test set (N = 47) by the area under the receiver operating characteristic curve (AUROC), with 95% confidence intervals (CIs) estimated from 1,000 bootstrap resamples; cross-validation (CV) accuracy served as a supplementary metric. DeepSeek-R1 was prompted in a zero-shot, chain-of-thought manner to classify T1 versus T2 using (a) the radiology report alone or (b) the report plus the patient’s biomarker profile. Results: Biomarker-based machine-learning models yielded uniformly poor T-stage discrimination. Without SMOTE, individual models such as XGBoost achieved an AUROC of 0.508 on the independent test set, and recall for the T1 class remained low (e.g., 14.3% for some models), indicating performance near chance. Applying SMOTE to the training data produced statistically significant gains in CV accuracy for several models (e.g., XGBoost CV accuracy 0.71 → 0.80, p = 0.005; LightGBM CV accuracy [No-SMOTE] → [SMOTE], p = 0.004). However, these gains did not translate into better discrimination on the independent test set; for instance, XGBoost’s AUROC decreased from 0.508 to 0.473 after SMOTE. Overall, the biomarker models failed to provide clinically meaningful T-stage differentiation. DeepSeek-R1 analyzing radiology text alone reached 89.6% accuracy on the full 232-patient cohort and consistently identified T2 cases on the basis of phrases such as “gallbladder wall thickening.” Supplying biomarker values did not change accuracy (89.6%). Conclusions: The evaluated blood biomarkers did not independently aid early T-stage discrimination, and SMOTE offered no meaningful performance gain. Conversely, a radiology-text-driven large language model delivered high accuracy with an interpretable rationale, highlighting its potential to guide surgical strategy in GBC. Prospective multi-center studies with larger cohorts are warranted to confirm these findings. |
| format | Article |
| id | doaj-art-d8b2b774c0b04e9b964a89abee8a95a9 |
| institution | DOAJ |
| issn | 2234-943X |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Oncology |
| spelling | doaj-art-d8b2b774c0b04e9b964a89abee8a95a9 · 2025-08-20T03:15:56Z · eng · Frontiers Media S.A. · Frontiers in Oncology · ISSN 2234-943X · 2025-08-01 · vol. 15 · doi:10.3389/fonc.2025.1613462 · article 1613462 · Pre-operative T-stage discrimination in gallbladder cancer using machine learning and DeepSeek-R1 · Authors and affiliations: Joongwon Chae (Institute of Biopharmaceutical and Health Engineering, Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China); Zhenyu Wang (Institute of Biopharmaceutical and Health Engineering, Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China); Duanpo Wu (School of Communication Engineering and the Artificial Intelligence Institute, Hangzhou Dianzi University, Hangzhou, Zhejiang, China); Lian Zhang (The First Hospital of Hebei Medical University, Shijiazhuang, Hebei, China); Alexander Tuzikov (United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Belarus); Magrupov Talat Madiyevich (Department of Biomedical Engineering & Tashkent State Technical University, Tashkent, Uzbekistan); Min Xu (Affiliated Fifth Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China); Dongmei Yu (Affiliated Fifth Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China); Peiwu Qin (Institute of Biopharmaceutical and Health Engineering, Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China) · https://www.frontiersin.org/articles/10.3389/fonc.2025.1613462/full · Keywords: gallbladder cancer; GBC; machine learning; large language model; DeepSeek-R1; staging |
| title | Pre-operative T-stage discrimination in gallbladder cancer using machine learning and DeepSeek-R1 |
| topic | gallbladder cancer; GBC; machine learning; large language model; DeepSeek-R1; staging |
| url | https://www.frontiersin.org/articles/10.3389/fonc.2025.1613462/full |
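The abstract's Methods list seven blood-derived inputs, three of which are ratios of complete-blood-count values. A minimal sketch of that feature construction follows, assuming all cell counts share one unit (so NLR, MLR and PLR are unitless) and that tumour markers are used as reported. The `LabPanel` class, `biomarker_features` function and the sample values are hypothetical illustrations, not the authors' preprocessing code.

```python
from dataclasses import dataclass

# Hypothetical feature order; mirrors the seven variables named in the abstract.
FEATURE_NAMES = ("NLR", "MLR", "PLR", "CEA", "CA19-9", "CA125", "AFP")


@dataclass
class LabPanel:
    """Hypothetical container for one patient's pre-operative blood results.

    Cell counts must share one unit (e.g. 10^9/L) so the ratios are unitless;
    tumour markers are passed through in the laboratory's reporting units.
    """
    neutrophils: float
    lymphocytes: float
    monocytes: float
    platelets: float
    cea: float
    ca19_9: float
    ca125: float
    afp: float


def biomarker_features(p: LabPanel) -> dict:
    """Return the seven model inputs, keyed and ordered as FEATURE_NAMES."""
    if p.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive to form the ratios")
    return {
        "NLR": p.neutrophils / p.lymphocytes,  # neutrophil-to-lymphocyte ratio
        "MLR": p.monocytes / p.lymphocytes,    # monocyte-to-lymphocyte ratio
        "PLR": p.platelets / p.lymphocytes,    # platelet-to-lymphocyte ratio
        "CEA": p.cea,
        "CA19-9": p.ca19_9,
        "CA125": p.ca125,
        "AFP": p.afp,
    }


# Usage with made-up values (not study data):
panel = LabPanel(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=200.0,
                 cea=3.0, ca19_9=20.0, ca125=15.0, afp=2.5)
features = biomarker_features(panel)
print([round(features[name], 3) for name in FEATURE_NAMES])
# → [2.0, 0.25, 100.0, 3.0, 20.0, 15.0, 2.5]
```

Keeping the ratios as derived features, rather than feeding raw counts, matches how the abstract names the inputs; whether the study also scaled or log-transformed them is not stated.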
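The abstract stresses that SMOTE was applied only to the training folds, which keeps synthetic minority (T1) samples out of every validation split. The dependency-free sketch below illustrates that leakage-safe arrangement. The helper names, k = 5 neighbours, 5-fold stratification and the "oversample the minority up to the majority count" target are all assumptions, not the study's settings. In practice the same effect is usually obtained with imbalanced-learn's `SMOTE` placed inside an `imblearn.pipeline.Pipeline`, which resamples only during fitting.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Minimal SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = rng or random.Random(0)
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def stratified_folds(y, n_splits=5, seed=0):
    """Deal the shuffled indices of each class round-robin into n_splits folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


def cv_splits_with_smote(X, y, minority_label, n_splits=5, seed=0):
    """Yield (train_X, train_y, val_X, val_y) per fold for a binary task.

    SMOTE sees only the training part of each fold, so no synthetic point is
    derived from, or placed in, the validation data.
    """
    for f, val_idx in enumerate(stratified_folds(y, n_splits, seed)):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        minority = [x for x, v in zip(tr_X, tr_y) if v == minority_label]
        deficit = (len(tr_y) - len(minority)) - len(minority)  # majority minus minority
        new = smote(minority, deficit, rng=random.Random(seed + f))
        yield (tr_X + new, tr_y + [minority_label] * len(new),
               [X[i] for i in val_idx], [y[i] for i in val_idx])


# Toy data (not study data): 40 "T2" and 10 "T1" points with two features.
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)])
y = ["T2"] * 40 + ["T1"] * 10
for tr_X, tr_y, va_X, va_y in cv_splits_with_smote(X, y, "T1"):
    print(len(tr_y), tr_y.count("T1"), len(va_y))  # each fold: 64 32 10
```

Because the validation fold is untouched, CV accuracy here still reflects the original class balance; it does not, however, guarantee that gains seen in CV carry over to a separate test set, which is the pattern the abstract reports.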
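Test-set AUROC is reported with a 95% CI from 1,000 bootstrap samples. The sketch below shows one common way to do this: AUROC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), and a simple percentile bootstrap over patients. The abstract does not say which bootstrap variant was used, so the non-stratified percentile method, the 0/1 label coding and the function names are assumptions.

```python
import random


def auroc(y_true, scores):
    """AUROC via pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) CI."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auroc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auroc(y_true, scores), lo, hi


# Toy labels and scores (not study data); the point estimate is 7/9.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
print(bootstrap_auroc_ci(y_true, scores))
```

With a test set of only 47 patients, such intervals tend to be wide, which is worth keeping in mind when reading point estimates such as 0.508 versus 0.473.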
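Because the cohort is imbalanced (T1, n = 51; T2, n = 181), overall accuracy and T1 recall can tell very different stories, which is why the abstract reports recall for the T1 class alongside AUROC and accuracy. As a reference point computed purely from those cohort counts, the sketch evaluates a trivial predictor that always answers T2. The function names are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall (sensitivity) per label: correct predictions of that label
    divided by the number of cases that truly carry it."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in sorted(support.items())}


# Cohort counts from the abstract: T1 n = 51, T2 n = 181.
y_true = ["T1"] * 51 + ["T2"] * 181
always_t2 = ["T2"] * len(y_true)  # trivial majority-class predictor
print(round(accuracy(y_true, always_t2), 3))  # → 0.78
print(per_class_recall(y_true, always_t2))    # → {'T1': 0.0, 'T2': 1.0}
```

The majority-class predictor reaches 181/232 ≈ 78.0% accuracy while recovering no T1 cases, so accuracy figures on this cohort are most informative when read against that baseline and against T1 recall.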
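DeepSeek-R1 was queried zero-shot with chain-of-thought prompting under two input conditions: (a) radiology report only and (b) report plus biomarker profile. The sketch below shows how the two conditions might be assembled from the same report and how a final T1/T2 label could be extracted from a free-text reply. The instruction wording, the `T-stage:` answer format, and the `build_prompt` and `parse_stage` helpers are hypothetical; the study's actual prompt is not given in this record, and the call that sends the prompt to the model is deliberately left out.

```python
from __future__ import annotations

import re

# Hypothetical instruction text; not the study's actual prompt.
INSTRUCTION = (
    "You are assisting with pre-operative staging of gallbladder cancer. "
    "Think step by step, then finish with one line of the form "
    "'T-stage: T1' or 'T-stage: T2'."
)


def build_prompt(report: str, biomarkers: dict[str, float] | None = None) -> str:
    """Condition (a) when biomarkers is None or empty, condition (b) otherwise."""
    parts = [INSTRUCTION, "", "Radiology report:", report.strip()]
    if biomarkers:
        parts += ["", "Blood biomarkers:"]
        parts += [f"- {name}: {value}" for name, value in biomarkers.items()]
    return "\n".join(parts)


_ANSWER = re.compile(r"T-stage:\s*(T[12])\b", re.IGNORECASE)


def parse_stage(reply: str) -> str | None:
    """Return the last 'T-stage:' label, since the reasoning may mention both stages."""
    found = _ANSWER.findall(reply)
    return found[-1].upper() if found else None


report = "Gallbladder wall thickening noted."  # invented example text
prompt_a = build_prompt(report)                                   # condition (a)
prompt_b = build_prompt(report, {"CEA": 3.1, "CA19-9": 42.0})     # condition (b)
print(parse_stage("<reasoning omitted>\nT-stage: T2"))  # → T2
```

Holding the instruction and report text fixed while toggling only the biomarker block keeps the two conditions directly comparable, which is the comparison the abstract describes.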