Machine Learning-Based Non-Invasive Prediction of Metabolic Dysfunction-Associated Steatohepatitis in Obese Patients: A Retrospective Study

Bibliographic Details
Main Authors: Jie Chen, Bo Zhang, Yong Cheng, Yuanchen Jia, Biao Zhou
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Diagnostics
Online Access: https://www.mdpi.com/2075-4418/15/9/1096
Description
Summary: Objectives: We aimed to develop and validate machine learning (ML) models that integrate clinical and laboratory data for the non-invasive prediction of metabolic dysfunction-associated steatohepatitis (MASH) in an obese population. Methods: In this retrospective study, clinical and laboratory data were collected from obese patients undergoing bariatric surgery. The cohort was divided into training and validation sets using stratified random sampling, and optimal features were selected with SHapley Additive exPlanations (SHAP). Eight ML models (K-nearest neighbors, linear support vector machine, radial basis function support vector machine, Gaussian process, random forest, multilayer perceptron, adaptive boosting, and naïve Bayes) were developed through cross-validation and hyperparameter tuning. Diagnostic performance was assessed via the area under the curve (AUC) in both the training and validation sets. Results: A total of 558 patients were analyzed, with 390 in the training set and 168 in the validation set. In the training cohort, the median age was 35 years, the median body mass index (BMI) was 39.8 kg/m², 39.0% were male, 37.9% had diabetes mellitus, and 62.8% were diagnosed with MASH. In the validation cohort, the median age was 34.1 years, the median BMI was 42.5 kg/m², 41.7% were male, 32.7% had diabetes mellitus, and 39.9% were diagnosed with MASH. The random forest performed best, with AUC values of 0.94 in the training set and 0.88 in the validation set. The Gaussian process model attained an AUC of 0.97 in the training set but only 0.79 in the validation set, while the other models achieved AUC values ranging from 0.63 to 0.88 in the training set and from 0.62 to 0.75 in the validation set. Conclusions: ML models, particularly the random forest, effectively predict MASH using readily available data, offering a promising non-invasive alternative to conventional serological scoring. Prospective studies and external validation are needed to further establish clinical utility.
ISSN: 2075-4418
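Illustrative note: two steps named in the summary, stratified random sampling and AUC evaluation, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names `stratified_split` and `roc_auc`, the class counts, and the scores are all hypothetical, and the data are synthetic. Only the cohort sizes (558 patients split into 390 and 168) come from the record.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split indices into (train, test) while keeping each class's share roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class share sent to the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data: 558 labels with an arbitrary (hypothetical) class balance,
# and noisy scores that are higher on average for the positive class.
rng = random.Random(42)
labels = [1] * 300 + [0] * 258
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

# A validation share of 168/558 mirrors the cohort sizes reported in the summary.
train_idx, val_idx = stratified_split(labels, test_fraction=168 / 558)
val_auc = roc_auc([labels[i] for i in val_idx], [scores[i] for i in val_idx])
print(len(train_idx), len(val_idx), round(val_auc, 2))
```

The pairwise definition of AUC used here is quadratic in the sample size, which is fine for a cohort of this scale; a rank-based formulation or a library routine would be the usual choice for larger data.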