Machine learning models for prediction of lymph node metastasis in patients with gastric cancer: a Chinese single-centre study with external validation in an Asian American population

Objective To develop and validate machine learning (ML)-based models to predict lymph node metastasis (LNM) in patients with gastric cancer (GC).Design Retrospective cohort study.Setting Second Affiliated Hospital of Soochow University.Participants A total of 500 inpatients from the Second Affiliate...

Full description

Saved in:
Bibliographic Details
Main Authors: Qian Li, Yuan Tian, Wei Peng, Shangcheng Yan, Weiran Yang, Zhuan Du, Ming Cheng, Renwei Chen, Qiankun Shao, Mengchao Sheng, Yongyou Wu
Format: Article
Language:English
Published: BMJ Publishing Group 2025-03-01
Series:BMJ Open
Online Access:https://bmjopen.bmj.com/content/15/3/e098476.full
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Objective To develop and validate machine learning (ML)-based models to predict lymph node metastasis (LNM) in patients with gastric cancer (GC).Design Retrospective cohort study.Setting Second Affiliated Hospital of Soochow University.Participants A total of 500 inpatients from the Second Affiliated Hospital of Soochow University, collected retrospectively between 1 April 2018 and 31 March 2023, were used as the training set, while 824 Asian patients from the Surveillance, Epidemiology and End Results database comprised the external validation set.Main outcome measures Prediction models were developed using multiple ML algorithms, including logistic regression, support vector machine, k-nearest neighbours, naive Bayes, decision tree (DT), gradient boosting DT, random forest and artificial neural network (ANN). The predictive value of these models was validated and evaluated through receiver operating characteristic curves, precision-recall (PR) curves, calibration curves, decision curve analysis and accuracy metrics.Results Among the ML algorithms, the ANN outperformed others, achieving the highest accuracy (0.722; 95% CI: 0.692 to 0.751), precision (0.732; 95% CI: 0.694 to 0.776), F1 score (0.733; 95% CI: 0.695 to 0.773), specificity (0.728; 95% CI: 0.684 to 0.770) and area under the PR curve (0.781; 95% CI: 0.740 to 0.821) in the external validation results. Moreover, it demonstrated superior calibration and clinical utility. Shapley Additive Explanations analysis identified the depth of invasion, tumour size and Lauren classification as the most influential predictors of LNM in patients with GC. Furthermore, a user-friendly web application was developed to provide individual prediction results.Conclusions This study introduces an accurate, reliable and clinically applicable approach for predicting the risk of LNM in patients with GC. The model demonstrates its potential to enhance the personalised management of GC in diverse populations, supported by external validation and an accessible web application for practical use.
ISSN:2044-6055