Machine learning for prediction of Helicobacter pylori infection based on basic health examination data in adults: a retrospective study

Bibliographic Details
Main Authors: Qiaoli Wang, Tao Liang, Yuexi Li, Peng Zhou, Xiaoqin Liu
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-06-01
Series: Frontiers in Medicine
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2025.1587540/full
Collection: DOAJ

Abstract:
Objective: This study aimed to investigate the feasibility of developing machine learning models for non-invasive prediction of Helicobacter pylori (H pylori) infection using routinely collected adult health screening data, including demographic characteristics and clinical biomarkers, to establish a potential decision-support tool for clinical practice.
Methods: The data were sourced from adult health examination records in the health management centers of the hospital. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used for feature selection. Six machine learning algorithms were used to construct predictive models, and their performance was evaluated with metrics including the area under the ROC curve (AUC), accuracy, and recall. The SHapley Additive exPlanations (SHAP) method was used to visualize feature contributions and the predictions for individual cases.
Results: A total of 10,393 subjects were included in the dataset, of whom 3,278 (31.54%) had H pylori infection. After feature screening, 10 factors were retained for the prediction model. Among the six machine learning models, Extra Trees performed best, with an AUC of 0.827, accuracy of 0.744, and recall of 0.736. Random Forest also performed well (AUC 0.810), and XGBoost achieved an AUC of 0.801, indicating moderate predictive capability. SHAP analysis identified age, white blood cell count (WBC), albumin (ALB), gender, and waist circumference as the five features contributing most to the model's predictions. Older age, higher WBC, larger waist circumference, and lower ALB were associated with a higher predicted probability of infection. These results provide insight into both H pylori infection risk factors and model performance.
Conclusion: Among the evaluated models, the Extra Trees classifier performed best in predicting H pylori infection. SHAP analysis improved the model's interpretability, offering insights for early-stage clinical prediction and intervention strategies.
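
The abstract uses SHAP both to rank features globally (the "top five factors") and to explain predictions for individual cases. As a rough illustration of the idea behind those attributions, and not the authors' code, model, or data, the sketch below computes exact Shapley values for a hypothetical toy risk score. The model `toy_risk`, its weights and interaction term, the helpers `coalition_value`, `exact_shapley`, and `mean_abs_shap`, and every subject value are invented assumptions; "absent" features are filled in from a single hypothetical reference subject. The shap library itself estimates these values far more efficiently (for tree ensembles such as Extra Trees, via its TreeExplainer) and typically averages over background data rather than one reference point.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: feature names echo the abstract, but the weights,
# the interaction term, and the units are invented for illustration and are
# NOT taken from the study or from any trained Extra Trees model.
def toy_risk(x):
    """Risk score (arbitrary units) rising with age, WBC and waist, falling with ALB."""
    return (0.02 * x["age"] + 0.10 * x["wbc"] - 0.05 * x["alb"]
            + 0.01 * x["waist"] + 0.001 * x["age"] * x["wbc"])


def coalition_value(model, x, baseline, coalition):
    """v(S): model output with features in S taken from x and all others from baseline."""
    mixed = {f: (x[f] if f in coalition else baseline[f]) for f in x}
    return model(mixed)


def exact_shapley(model, x, baseline):
    """Exact Shapley value of every feature by enumerating all coalitions.

    Costs O(n * 2^n) model calls, so it is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, baseline, s | {i})
                        - coalition_value(model, x, baseline, s))
                total += weight * gain
        phi[i] = total
    return phi


def mean_abs_shap(model, subjects, baseline):
    """Global importance: mean |Shapley value| of each feature across subjects."""
    totals = {f: 0.0 for f in baseline}
    for x in subjects:
        for f, value in exact_shapley(model, x, baseline).items():
            totals[f] += abs(value)
    return {f: t / len(subjects) for f, t in totals.items()}


# Hypothetical reference subject and subject to explain (invented values).
baseline = {"age": 45, "wbc": 6.0, "alb": 45.0, "waist": 80.0}
subject = {"age": 60, "wbc": 8.0, "alb": 40.0, "waist": 95.0}

phi = exact_shapley(toy_risk, subject, baseline)
# Local accuracy: base value + sum of contributions reproduces the model output.
assert abs(toy_risk(baseline) + sum(phi.values()) - toy_risk(subject)) < 1e-9
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>6}: {value:+.4f}")
```

Because the contributions add up exactly to the gap between a subject's prediction and the base value, they can be displayed case by case, and averaging their absolute values across many subjects, as `mean_abs_shap` does, yields a global feature ranking of the kind summarized in the abstract. In this toy example lower ALB produces a positive contribution, mirroring the direction described in the Results, but only because the invented weights were chosen that way.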

ISSN: 2296-858X
Volume: 12
Article number: 1587540
DOI: 10.3389/fmed.2025.1587540
Affiliations:
Qiaoli Wang, Yuexi Li, Peng Zhou, Xiaoqin Liu: Health Management Center, Deyang People’s Hospital, Deyang, Sichuan, China
Tao Liang: Department of Gastroenterology, Deyang People’s Hospital, Deyang, Sichuan, China
Keywords: machine learning; H pylori infection; basic health examination; SHAP analysis; health examination