Pre-operative T-stage discrimination in gallbladder cancer using machine learning and DeepSeek-R1

Bibliographic Details
Main Authors: Joongwon Chae, Zhenyu Wang, Duanpo Wu, Lian Zhang, Alexander Tuzikov, Magrupov Talat Madiyevich, Min Xu, Dongmei Yu, Peiwu Qin
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Oncology
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2025.1613462/full
author Joongwon Chae
Zhenyu Wang
Duanpo Wu
Lian Zhang
Alexander Tuzikov
Magrupov Talat Madiyevich
Min Xu
Dongmei Yu
Peiwu Qin
author_sort Joongwon Chae
collection DOAJ
description Background: Gallbladder cancer (GBC) frequently exhibits non-specific early symptoms, delaying diagnosis. This study (i) assessed whether routine blood biomarkers can distinguish early T stages via machine learning and (ii) compared the T-stage discrimination performance of a large language model (DeepSeek-R1) when supplied with (a) radiology-report text alone versus (b) radiology-report text plus blood-biomarker values.
Methods: We retrospectively analyzed 232 pathologically confirmed GBC patients treated at Lishui Central Hospital between 2023 and 2024 (T1, n = 51; T2, n = 181). Seven blood variables, namely the neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), platelet-to-lymphocyte ratio (PLR), carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), carbohydrate antigen 125 (CA125), and alpha-fetoprotein (AFP), were used to train Random Forest, Support Vector Machine (SVC), XGBoost, and LightGBM models. The Synthetic Minority Over-sampling Technique (SMOTE) was applied only to the training folds in one setting and omitted in another. Model performance was evaluated on an independent test set (N = 47) by the area under the receiver-operating-characteristic curve (AUROC; 95% confidence interval [CI] estimated by 1,000-sample bootstrap); cross-validation (CV) accuracy served as a supplementary metric. DeepSeek-R1 was prompted in a zero-shot, chain-of-thought manner to classify T1 versus T2 using (a) the radiology report alone or (b) the report plus the patient's biomarker profile.
Results: Biomarker-based machine-learning models yielded uniformly poor T-stage discrimination. Without SMOTE, individual models such as XGBoost achieved an AUROC of 0.508 on the independent test set, while recall for the T1 class remained low (e.g., 14.3% for some models), indicating performance near random chance. Applying SMOTE to the training data produced statistically significant gains in CV accuracy for several models (e.g., XGBoost CV accuracy 0.71 → 0.80, p = 0.005; LightGBM CV accuracy [No-SMOTE] → [SMOTE], p = 0.004). However, these improvements did not translate to better discrimination on the independent test set; for instance, XGBoost's AUROC decreased from 0.508 to 0.473 after SMOTE application. Overall, the biomarker models failed to provide clinically meaningful T-stage differentiation. DeepSeek-R1 analyzing radiology text alone reached 89.6% accuracy on the full 232-patient cohort and consistently flagged T2 cases based on phrases such as "gallbladder wall thickening." Supplying biomarker values did not change accuracy (89.6%).
Conclusions: The evaluated blood biomarkers did not independently aid early T-stage discrimination, and SMOTE offered no meaningful performance gain. Conversely, a radiology-text-driven large language model delivered high accuracy with interpretable rationale, highlighting its potential to guide surgical strategy in GBC. Prospective multi-center studies with larger cohorts are warranted to confirm these findings.
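The Methods note that SMOTE was applied only to the training folds, never to the held-out data. The sketch below illustrates that fold-discipline on synthetic data with the same 51/181 class imbalance and seven features; the hand-rolled oversampler (interpolating between minority neighbours) is a stand-in for SMOTE, and LogisticRegression stands in for the paper's tree-based models. All names and values here are illustrative, not the study's code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def smote_like(X, y, minority=0, n_new=100):
    """SMOTE-style oversampling: each synthetic point is interpolated
    between a random minority sample and its nearest minority neighbour."""
    Xm = X[y == minority]
    new = []
    for _ in range(n_new):
        i = rng.integers(len(Xm))
        d = np.linalg.norm(Xm - Xm[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        j = d.argmin()
        lam = rng.random()
        new.append(Xm[i] + lam * (Xm[j] - Xm[i]))
    return (np.vstack([X, new]),
            np.concatenate([y, np.full(n_new, minority)]))

# Toy cohort: 232 patients, 7 biomarker-like features, 51 T1 vs 181 T2.
X = rng.normal(size=(232, 7))
y = np.r_[np.zeros(51, int), np.ones(181, int)]

accs = []
for tr, te in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
    # Oversample *inside* the training fold only; the test fold is untouched,
    # so CV estimates are not inflated by synthetic minority points.
    Xtr, ytr = smote_like(X[tr], y[tr], minority=0)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    accs.append(clf.score(X[te], y[te]))
print(round(float(np.mean(accs)), 3))
```

In practice this fold-wise resampling is what an imblearn `Pipeline` wrapping `SMOTE` and a classifier automates; applying oversampling before the split would leak synthetic copies of test-fold patients into training.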
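The abstract reports test-set AUROC with a 95% CI from a 1,000-sample bootstrap. A minimal sketch of that estimator, using random scores for a hypothetical 47-patient hold-out set (the real labels and model scores are not reproduced here):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical hold-out set of 47 patients with binary T-stage labels.
y_true = rng.integers(0, 2, size=47)
y_true[:5] = 0
y_true[5:10] = 1                  # guarantee both classes are present
y_score = rng.random(47)          # stand-in for model probabilities

# 1,000-sample bootstrap: resample patients with replacement, recompute AUROC.
boots = []
while len(boots) < 1000:
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) < 2:   # AUROC needs both classes
        continue
    boots.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boots, [2.5, 97.5])
point = roc_auc_score(y_true, y_score)
print(f"AUROC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Resamples that draw only one class are discarded and redrawn, a common convention for small test sets like this one.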
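The LLM arm prompted DeepSeek-R1 zero-shot with chain-of-thought, with and without biomarker values appended to the radiology text. A sketch of how such a prompt pair might be assembled; the wording, report text, and biomarker values are all hypothetical, not the authors' actual prompt.

```python
# Hypothetical radiology-report excerpt and biomarker profile.
report = ("Gallbladder wall thickening up to 5 mm; "
          "no evidence of liver invasion.")
biomarkers = {"NLR": 2.1, "CA19-9": 35.0}

def build_prompt(report_text, biomarker_profile=None):
    """Assemble a zero-shot, chain-of-thought prompt for T1-vs-T2
    classification, optionally appending blood-biomarker values."""
    lines = [
        "You are a hepatobiliary imaging assistant.",
        "Task: classify the tumour T stage as T1 or T2 (AJCC).",
        "Think step by step, then answer with exactly 'T1' or 'T2'.",
        "",
        f"Radiology report: {report_text}",
    ]
    if biomarker_profile:
        vals = "; ".join(f"{k} = {v}" for k, v in biomarker_profile.items())
        lines.append(f"Blood biomarkers: {vals}")
    return "\n".join(lines)

prompt_text_only = build_prompt(report)        # condition (a): report alone
prompt_with_labs = build_prompt(report, biomarkers)  # condition (b): report + labs
print(prompt_with_labs)
```

The two conditions differ only in the appended biomarker line, which is what lets the study attribute any accuracy change to the biomarker information itself.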
format Article
id doaj-art-d8b2b774c0b04e9b964a89abee8a95a9
institution DOAJ
issn 2234-943X
language English
publishDate 2025-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Oncology
spelling doaj-art-d8b2b774c0b04e9b964a89abee8a95a9 | 2025-08-20T03:15:56Z | eng | Frontiers Media S.A. | Frontiers in Oncology | 2234-943X | 2025-08-01 | vol. 15 | doi: 10.3389/fonc.2025.1613462
Author affiliations:
Joongwon Chae, Zhenyu Wang, Peiwu Qin: Institute of Biopharmaceutical and Health Engineering, Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China
Duanpo Wu: School of Communication Engineering and the Artificial Intelligence Institute, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
Lian Zhang: The First Hospital of Hebei Medical University, Shijiazhuang, Hebei, China
Alexander Tuzikov: United Institute of Informatics Problems, National Academy of Sciences of Belarus, Minsk, Belarus
Magrupov Talat Madiyevich: Department of Biomedical Engineering, Tashkent State Technical University, Tashkent, Uzbekistan
Min Xu, Dongmei Yu: Affiliated Fifth Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
title Pre-operative T-stage discrimination in gallbladder cancer using machine learning and DeepSeek-R1
topic gallbladder cancer
GBC
machine learning
large language model
DeepSeek-R1
staging
url https://www.frontiersin.org/articles/10.3389/fonc.2025.1613462/full