Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods
Abstract Early and accurate identification of patients at high risk of metabolic dysfunction-associated steatotic liver disease (MASLD) is critical for preventing the disease and potentially improving prognosis. We aimed to develop and validate an explainable prediction model based on machine learning (ML) approaches...
| Main Authors: | Yihao Yu, Yuqi Yang, Qian Li, Jing Yuan, Yan Zha |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | Metabolic dysfunction-associated steatotic liver disease; Prediction model; Machine learning; SHAP |
| Online Access: | https://doi.org/10.1038/s41598-025-96478-6 |
| _version_ | 1849737351218069504 |
|---|---|
| author | Yihao Yu, Yuqi Yang, Qian Li, Jing Yuan, Yan Zha |
| author_facet | Yihao Yu, Yuqi Yang, Qian Li, Jing Yuan, Yan Zha |
| author_sort | Yihao Yu |
| collection | DOAJ |
| description | Abstract Early and accurate identification of patients at high risk of metabolic dysfunction-associated steatotic liver disease (MASLD) is critical for preventing the disease and potentially improving prognosis. We aimed to develop and validate an explainable prediction model based on machine learning (ML) approaches for MASLD in the adult population. This national cross-sectional study used data from the National Health and Nutrition Examination Survey from 2017 to 2020, comprising 13,436 participants, who were randomly split into 70% training, 20% internal validation, and 10% external validation cohorts. MASLD was defined based on transient elastography and cardiometabolic risk factors. Using 50 easily obtained medical characteristics, six ML algorithms were used to develop prediction models. Several evaluation parameters were used to compare predictive performance, including the area under the receiver-operating-characteristic curve (AUC) and the area under the precision-recall (P-R) curve. The recursive feature elimination method was applied to select the optimal feature subset. The Shapley Additive exPlanations method offered global and local explanations for the model. The random forest (RF) model showed the best discriminative ability among the six ML models, and the optimal 10-feature RF model was finally chosen. The final model could accurately predict MASLD in the internal and external validation cohorts (AUC: 0.928, 0.918; area under P-R curve: 0.876, 0.863, respectively). The final model performed better than each of the traditional risk indicators for MASLD. An explainable 10-feature prediction model with excellent discrimination and calibration performance was successfully developed and validated for MASLD using an RF algorithm and easily extracted clinical data. |
| format | Article |
| id | doaj-art-2bd1a04d42964a3195d2dd83ee41e956 |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-2bd1a04d42964a3195d2dd83ee41e956; 2025-08-20T03:06:57Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-04-01; 15; 1; 1; 13; 10.1038/s41598-025-96478-6; Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods; Yihao Yu [0]; Yuqi Yang [1]; Qian Li [2]; Jing Yuan [3]; Yan Zha [4]; [0] Master of Finance, Australian National University; [1] Department of Nephrology, Guizhou Provincial People’s Hospital; [2] Department of Nephrology, Guizhou Provincial People’s Hospital; [3] Department of Nephrology, Guizhou Provincial People’s Hospital; [4] Department of Nephrology, Guizhou Provincial People’s Hospital; https://doi.org/10.1038/s41598-025-96478-6; Metabolic dysfunction-associated steatotic liver disease; Prediction model; Machine learning; SHAP |
| title | Predicting metabolic dysfunction associated steatotic liver disease using explainable machine learning methods |
| topic | Metabolic dysfunction-associated steatotic liver disease; Prediction model; Machine learning; SHAP |
| url | https://doi.org/10.1038/s41598-025-96478-6 |
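
The abstract in this record describes a random 70% / 20% / 10% split and reports two discrimination metrics, AUC and area under the P-R curve. The following is a minimal pure-Python sketch of those three ideas, not the authors' code. The function names, the seed, and the toy labels and scores are hypothetical, and the P-R area is approximated here by step-wise average precision, which may differ from the estimator used in the article.

```python
import random


def split_70_20_10(n, seed=0):
    """Shuffle indices 0..n-1 and cut them into 70% / 20% / 10% parts.

    Integer arithmetic avoids floating-point rounding; the last part
    receives whatever remains after the first two cuts.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 7 // 10
    n_internal = n * 2 // 10
    train = idx[:n_train]
    internal = idx[n_train:n_train + n_internal]
    external = idx[n_train + n_internal:]
    return train, internal, external


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumption: scores are not tied; tied scores would need grouping.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Cohort sizes for the participant count given in the abstract.
    train, internal, external = split_70_20_10(13436, seed=1)
    print(len(train), len(internal), len(external))

    # Hypothetical labels (1 = MASLD) and model scores for six people.
    y = [1, 0, 1, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print(round(roc_auc(y, s), 3), round(average_precision(y, s), 3))
```

The pairwise AUC loop is quadratic in cohort size, which is fine for illustration; a rank-based formula or a library routine would be the usual choice for a cohort of this size.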