Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations

Background: Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and a...

Bibliographic Details
Main Authors: Yan Hong, Xinrong Chen, Ling Wang, Fan Zhang, ZiYing Zeng, Weining Xie
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-06-01
Series: Frontiers in Nutrition
Subjects: MAFLD; body composition; machine learning; SHAP; NHANES
Online Access: https://www.frontiersin.org/articles/10.3389/fnut.2025.1616229/full
author Yan Hong
Xinrong Chen
Ling Wang
Fan Zhang
ZiYing Zeng
Weining Xie
collection DOAJ
description Background: Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and associated risk. This study aimed to develop and validate machine learning (ML) models for predicting MAFLD using detailed body composition metrics and to explore the relative contributions of adipose tissue features through explainable ML techniques.
Methods: Data from the 2017–2018 National Health and Nutrition Examination Survey (NHANES) were used to construct predictive models based on anthropometric, demographic, lifestyle, and clinical variables. Six ML algorithms were implemented: decision tree (DT), support vector machine (SVM), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and XGBoost. The Boruta algorithm was used for feature selection, and model performance was evaluated using cross-validation and a validation set. SHapley Additive exPlanations (SHAP) were employed to interpret feature contributions.
Results: Among the six models, the GBM algorithm exhibited the best performance, achieving area under the receiver operating characteristic curve (AUC) values of 0.875 (training) and 0.879 (validation), with minimal fluctuations in sensitivity and specificity. SHAP analysis identified visceral adipose tissue (VAT), BMI, and subcutaneous adipose tissue (SAT) as the most influential predictors. VAT had the highest SHAP value, underscoring its central role in MAFLD pathogenesis.
Conclusion: This study demonstrates the effectiveness of integrating body composition features with machine learning techniques for MAFLD risk prediction. The GBM model offers robust predictive accuracy and interpretability, with potential applications in clinical decision-making and public health screening strategies. SHAP analysis provides meaningful insights into the relative importance of adiposity measures, reinforcing the value of fat distribution metrics beyond conventional obesity indices.
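The abstract above reports SHAP values for VAT, BMI, and SAT without showing how such attributions arise. As a rough illustration only, the sketch below computes exact Shapley values for a small hypothetical risk function. The `risk` function, its coefficients, the `person` inputs, and the all-zero `baseline` are invented for this example and are not the study's GBM model or its NHANES data. It also uses a single fixed baseline, whereas SHAP implementations typically average over a background dataset.

```python
from itertools import combinations
from math import exp, factorial

# Feature names taken from the abstract; everything else here is hypothetical.
FEATURES = ("VAT", "BMI", "SAT")


def risk(x):
    """Toy logistic risk score on standardized inputs (invented coefficients)."""
    z = (1.2 * x["VAT"] + 0.6 * x["BMI"] + 0.3 * x["SAT"]
         + 0.4 * x["VAT"] * x["BMI"] - 0.5)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


baseline = {"VAT": 0.0, "BMI": 0.0, "SAT": 0.0}  # hypothetical reference person
person = {"VAT": 1.5, "BMI": 1.0, "SAT": 0.5}    # hypothetical individual
phi = shapley_values(risk, person, baseline)

for name in sorted(phi, key=phi.get, reverse=True):
    print(f"{name}: {phi[name]:+.4f}")
```

The attributions sum to the difference between the individual's predicted risk and the baseline risk, which is the additivity property that makes SHAP values comparable across features. Exact enumeration is practical only for a handful of features; tree-based explainers use faster algorithms for models such as GBM.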
format Article
id doaj-art-a7b1d2cdea664df8a76d3592c172de18
institution Kabale University
issn 2296-861X
language English
publishDate 2025-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Nutrition
spelling doaj-art-a7b1d2cdea664df8a76d3592c172de18
2025-08-20T03:27:33Z
eng
Frontiers Media S.A.
Frontiers in Nutrition
2296-861X
2025-06-01
12
10.3389/fnut.2025.1616229
1616229
Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations
Yan Hong (0)
Xinrong Chen (1)
Ling Wang (2)
Fan Zhang (3)
ZiYing Zeng (4)
Weining Xie (5)
Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China
First Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, China
First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, China
Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China
Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China
Infectious Disease Department, Guangdong Provincial Hospital of Integrated Traditional Chinese and Western Medicine, Foshan, China
https://www.frontiersin.org/articles/10.3389/fnut.2025.1616229/full
MAFLD
body composition
machine learning
SHAP
NHANES
title Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations
topic MAFLD
body composition
machine learning
SHAP
NHANES
url https://www.frontiersin.org/articles/10.3389/fnut.2025.1616229/full