Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods

Abstract Early and accurate identification of patients at high risk of metabolic dysfunction-associated steatotic liver disease (MASLD) is critical for prevention and can potentially improve prognosis. We aimed to develop and validate an explainable prediction model based on machine learning (ML) approaches...

Bibliographic Details
Main Authors: Yihao Yu, Yuqi Yang, Qian Li, Jing Yuan, Yan Zha
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
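Series: Scientific Reports
Subjects: Metabolic dysfunction-associated steatotic liver disease; Prediction model; Machine learning; SHAP
Online Access: https://doi.org/10.1038/s41598-025-96478-6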
_version_ 1849737351218069504
author Yihao Yu
Yuqi Yang
Qian Li
Jing Yuan
Yan Zha
collection DOAJ
description Abstract Early and accurate identification of patients at high risk of metabolic dysfunction-associated steatotic liver disease (MASLD) is critical for prevention and can potentially improve prognosis. We aimed to develop and validate an explainable prediction model based on machine learning (ML) approaches for MASLD in the adult population. This national cross-sectional study used data from the National Health and Nutrition Examination Survey from 2017 to 2020, comprising 13,436 participants who were randomly split into training (70%), internal validation (20%), and external validation (10%) cohorts. MASLD was defined based on transient elastography and cardiometabolic risk factors. Using 50 easily obtained medical characteristics, six ML algorithms were used to develop prediction models. Predictive performance was compared using several evaluation parameters, including the area under the receiver-operating-characteristic curve (AUC) and the area under the precision-recall (P-R) curve. The recursive feature elimination method was applied to select the optimal feature subset. The Shapley Additive exPlanations method provided global and local explanations for the model. The random forest (RF) model showed the best discriminative ability among the six ML models, and the optimal 10-feature RF model was chosen as the final model. The final model accurately predicted MASLD in the internal and external validation cohorts (AUC: 0.928 and 0.918; area under the P-R curve: 0.876 and 0.863, respectively). The final model outperformed each of the traditional risk indicators for MASLD. An explainable 10-feature RF prediction model with excellent discrimination and calibration performance was successfully developed and validated for MASLD based on easily extracted clinical data.
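The workflow summarized in the description (a 70/20/10 split, recursive feature elimination down to 10 features, a random forest, and evaluation by AUC and area under the P-R curve) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the survey data, and every hyperparameter (tree count, elimination step, random seed) is a placeholder chosen for speed. Average precision is used here as a common estimate of the area under the P-R curve. The SHAP explanation step is not shown because it needs the separate `shap` package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data: 50 candidate features, binary outcome.
X, y = make_classification(
    n_samples=1000, n_features=50, n_informative=10, random_state=0
)

# Cohort sizes as absolute counts: 20% validation, 10% hold-out, rest training.
n = len(X)
n_val, n_hold = n // 5, n // 10

# First carve off the 30% that will not be used for training.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=n_val + n_hold, stratify=y, random_state=0
)
# Then divide that remainder into the two validation cohorts.
X_val, X_hold, y_val, y_hold = train_test_split(
    X_rest, y_rest, test_size=n_hold, stratify=y_rest, random_state=0
)

# Recursive feature elimination: repeatedly fit a random forest and drop the
# 5 least important features until 10 remain (training data only).
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=10,
    step=5,
)
selector.fit(X_train, y_train)

# Refit a random forest on the selected 10 features.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(selector.transform(X_train), y_train)


def evaluate(X_part, y_part):
    """Return (ROC AUC, average precision) for one cohort."""
    scores = model.predict_proba(selector.transform(X_part))[:, 1]
    return roc_auc_score(y_part, scores), average_precision_score(y_part, scores)


val_auc, val_ap = evaluate(X_val, y_val)
hold_auc, hold_ap = evaluate(X_hold, y_hold)
print(f"validation: AUC={val_auc:.3f}, AP={val_ap:.3f}")
print(f"hold-out:   AUC={hold_auc:.3f}, AP={hold_ap:.3f}")
```

Feature selection is fitted on the training cohort only, so neither validation cohort influences which 10 features are kept. The numbers printed for synthetic data say nothing about the published results.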
format Article
id doaj-art-2bd1a04d42964a3195d2dd83ee41e956
institution DOAJ
issn 2045-2322
language English
publishDate 2025-04-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-2bd1a04d42964a3195d2dd83ee41e956
2025-08-20T03:06:57Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-04-01
15
1
1
13
10.1038/s41598-025-96478-6
Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods
Yihao Yu (Master of Finance, Australian National University)
Yuqi Yang (Department of Nephrology, Guizhou Provincial People’s Hospital)
Qian Li (Department of Nephrology, Guizhou Provincial People’s Hospital)
Jing Yuan (Department of Nephrology, Guizhou Provincial People’s Hospital)
Yan Zha (Department of Nephrology, Guizhou Provincial People’s Hospital)
https://doi.org/10.1038/s41598-025-96478-6
Metabolic dysfunction-associated steatotic liver disease
Prediction model
Machine learning
SHAP
title Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods
topic Metabolic dysfunction-associated steatotic liver disease
Prediction model
Machine learning
SHAP
url https://doi.org/10.1038/s41598-025-96478-6