Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations
Background: Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and a...
| Main Authors: | Yan Hong, Xinrong Chen, Ling Wang, Fan Zhang, ZiYing Zeng, Weining Xie |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-06-01 |
| Series: | Frontiers in Nutrition |
| Subjects: | MAFLD; body composition; machine learning; SHAP; NHANES |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fnut.2025.1616229/full |
| _version_ | 1849431714359672832 |
|---|---|
| author | Yan Hong; Xinrong Chen; Ling Wang; Fan Zhang; ZiYing Zeng; Weining Xie |
| author_facet | Yan Hong; Xinrong Chen; Ling Wang; Fan Zhang; ZiYing Zeng; Weining Xie |
| author_sort | Yan Hong |
| collection | DOAJ |
| description | Background: Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and associated risk. This study aimed to develop and validate machine learning (ML) models for predicting MAFLD using detailed body composition metrics and to explore the relative contributions of adipose tissue features through explainable ML techniques. Methods: Data from the 2017–2018 National Health and Nutrition Examination Survey (NHANES) were used to construct predictive models based on anthropometric, demographic, lifestyle, and clinical variables. Six ML algorithms were implemented: decision tree (DT), support vector machine (SVM), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and XGBoost. The Boruta algorithm was used for feature selection, and model performance was evaluated using cross-validation and a validation set. SHapley Additive exPlanations (SHAP) were employed to interpret feature contributions. Results: Among the six models, the GBM algorithm exhibited the best performance, achieving area under the receiver operating characteristic curve (AUC) values of 0.875 (training) and 0.879 (validation), with minimal fluctuations in sensitivity and specificity. SHAP analysis identified visceral adipose tissue (VAT), BMI, and subcutaneous adipose tissue (SAT) as the most influential predictors. VAT had the highest SHAP value, underscoring its central role in MAFLD pathogenesis. Conclusion: This study demonstrates the effectiveness of integrating body composition features with machine learning techniques for MAFLD risk prediction. The GBM model offers robust predictive accuracy and interpretability, with potential applications in clinical decision-making and public health screening strategies. SHAP analysis provides meaningful insights into the relative importance of adiposity measures, reinforcing the value of fat distribution metrics beyond conventional obesity indices. |
| format | Article |
| id | doaj-art-a7b1d2cdea664df8a76d3592c172de18 |
| institution | Kabale University |
| issn | 2296-861X |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Nutrition |
| spelling | doaj-art-a7b1d2cdea664df8a76d3592c172de18; 2025-08-20T03:27:33Z; eng; Frontiers Media S.A.; Frontiers in Nutrition; 2296-861X; 2025-06-01; 12; 10.3389/fnut.2025.1616229; 1616229; Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations; Yan Hong [0]; Xinrong Chen [1]; Ling Wang [2]; Fan Zhang [3]; ZiYing Zeng [4]; Weining Xie [5]; [0] Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China; [1] First Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, China; [2] First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, China; [3] Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China; [4] Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, China; [5] Infectious Disease Department, Guangdong Provincial Hospital of Integrated Traditional Chinese and Western Medicine, Foshan, China; Background: Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and associated risk. This study aimed to develop and validate machine learning (ML) models for predicting MAFLD using detailed body composition metrics and to explore the relative contributions of adipose tissue features through explainable ML techniques. Methods: Data from the 2017–2018 National Health and Nutrition Examination Survey (NHANES) were used to construct predictive models based on anthropometric, demographic, lifestyle, and clinical variables. Six ML algorithms were implemented: decision tree (DT), support vector machine (SVM), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and XGBoost. The Boruta algorithm was used for feature selection, and model performance was evaluated using cross-validation and a validation set. SHapley Additive exPlanations (SHAP) were employed to interpret feature contributions. Results: Among the six models, the GBM algorithm exhibited the best performance, achieving area under the receiver operating characteristic curve (AUC) values of 0.875 (training) and 0.879 (validation), with minimal fluctuations in sensitivity and specificity. SHAP analysis identified visceral adipose tissue (VAT), BMI, and subcutaneous adipose tissue (SAT) as the most influential predictors. VAT had the highest SHAP value, underscoring its central role in MAFLD pathogenesis. Conclusion: This study demonstrates the effectiveness of integrating body composition features with machine learning techniques for MAFLD risk prediction. The GBM model offers robust predictive accuracy and interpretability, with potential applications in clinical decision-making and public health screening strategies. SHAP analysis provides meaningful insights into the relative importance of adiposity measures, reinforcing the value of fat distribution metrics beyond conventional obesity indices.; https://www.frontiersin.org/articles/10.3389/fnut.2025.1616229/full; MAFLD; body composition; machine learning; SHAP; NHANES |
| spellingShingle | Yan Hong; Xinrong Chen; Ling Wang; Fan Zhang; ZiYing Zeng; Weining Xie; Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations; Frontiers in Nutrition; MAFLD; body composition; machine learning; SHAP; NHANES |
| title | Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations |
| title_sort | machine learning prediction of metabolic dysfunction associated fatty liver disease risk in american adults using body composition explainable analysis based on shapley additive explanations |
| topic | MAFLD; body composition; machine learning; SHAP; NHANES |
| url | https://www.frontiersin.org/articles/10.3389/fnut.2025.1616229/full |