Predicting the risk of lean non-alcoholic fatty liver disease based on interpretable machine models in a Chinese T2DM population
Background: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease and a serious threat to public health. Although lean NAFLD accounts for a smaller proportion of patients than obese NAFLD, it should not be overlooked. This study aimed to construct inter...
| Main Authors: | Shixue Bao, Qiankai Jin, Tieqiao Wang, Yushan Mao, Guoqing Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-07-01 |
| Series: | Frontiers in Endocrinology |
| Subjects: | |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fendo.2025.1626203/full |
| _version_ | 1849321175301226496 |
|---|---|
| author | Shixue Bao, Qiankai Jin, Tieqiao Wang, Yushan Mao, Guoqing Huang |
| collection | DOAJ |
| description | Background: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease and a serious threat to public health. Although lean NAFLD accounts for a smaller proportion of patients than obese NAFLD, it should not be overlooked. This study aimed to construct interpretable machine learning models for predicting lean NAFLD risk in patients with type 2 diabetes mellitus (T2DM). Methods: This study enrolled 1,553 individuals with T2DM who received health care at the First Affiliated Hospital of Ningbo University, Ningbo, China, from November 2019 to November 2024. Features were screened using the Boruta algorithm and the Least Absolute Shrinkage and Selection Operator (LASSO). Linear discriminant analysis (LDA), logistic regression (LR), Naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) were used to construct risk prediction models for lean NAFLD in T2DM patients. The area under the receiver operating characteristic curve (AUC) was used to assess the predictive capacity of each model. Additionally, SHapley Additive exPlanations (SHAP) analysis was used to quantify the contribution of each feature to the machine learning model's predictions. Results: The prevalence of lean NAFLD in the study population was 20.3%. Eight variables, including age, body mass index (BMI), and alanine aminotransferase (ALT), were identified as independent risk factors for lean NAFLD. Ten predictors, including BMI, ALT, and aspartate aminotransferase (AST), were selected for constructing the risk prediction models. The random forest model outperformed the other machine learning (ML) algorithms, achieving an AUC of 0.739 (95% confidence interval [CI]: 0.676–0.802) in the training set, and it also showed the best predictive value in the internal validation set, with an AUC of 0.789 (95% CI: 0.722–0.856). In addition, SHAP analysis identified triglycerides (TG), ALT, gamma-glutamyl transferase (GGT), BMI, and uric acid (UA) as the five variables with the greatest influence on the RF model's predictions. Conclusion: Lean NAFLD risk models built on a Chinese T2DM population, particularly the RF model, facilitate early prevention of and intervention for lean NAFLD, thereby reducing the risks of intrahepatic and extrahepatic adverse outcomes. |
| format | Article |
| id | doaj-art-4ed66aa7fa3d417cb49f15bc4ac206d8 |
| institution | Kabale University |
| issn | 1664-2392 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Endocrinology |
| spelling | doaj-art-4ed66aa7fa3d417cb49f15bc4ac206d8; 2025-08-20T03:49:50Z; eng; Frontiers Media S.A.; Frontiers in Endocrinology; ISSN 1664-2392; 2025-07-01; vol. 16; doi:10.3389/fendo.2025.1626203; article 1626203; Predicting the risk of lean non-alcoholic fatty liver disease based on interpretable machine models in a Chinese T2DM population; Shixue Bao (0), Qiankai Jin (1), Tieqiao Wang (2), Yushan Mao (3), Guoqing Huang (4); affiliations: (0, 2, 3, 4) Department of Endocrinology, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, China; (1) Department of Endocrinology, Beilun People's Hospital, Ningbo, Zhejiang, China; https://www.frontiersin.org/articles/10.3389/fendo.2025.1626203/full; keywords: lean non-alcoholic fatty liver disease; type 2 diabetes mellitus; interpretable machine learning; prediction model; predict risk |
| title | Predicting the risk of lean non-alcoholic fatty liver disease based on interpretable machine models in a Chinese T2DM population |
| topic | lean non-alcoholic fatty liver disease; type 2 diabetes mellitus; interpretable machine learning; prediction model; predict risk |
| url | https://www.frontiersin.org/articles/10.3389/fendo.2025.1626203/full |
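
The abstract says candidate features were screened with LASSO (alongside Boruta). A minimal Python sketch of why LASSO works as a screening step: the L1 penalty drives some coefficients to exactly zero, and the features with non-zero coefficients are the ones retained. This sketch assumes a squared-error loss, a centred outcome, standardised predictors, and a fixed penalty `lam`; the record does not state which loss or penalty-selection rule the study used, and the function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    Assumes y is centred and the columns of X are standardised, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                beta[j] = 0.0  # a constant column carries no information
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = 0.0
            for row, target in zip(X, y):
                partial = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def selected_features(beta):
    """Indices of features whose coefficient survived the L1 penalty."""
    return [j for j, b in enumerate(beta) if b != 0.0]


if __name__ == "__main__":
    # Toy data (not study data): three orthogonal, standardised features.
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [2.1, -2.1, 1.9, -1.9]  # = 2 * feature0 + 0.1 * feature2
    for lam in (0.05, 0.5):
        beta = lasso_coordinate_descent(X, y, lam)
        print(lam, [round(b, 3) for b in beta], selected_features(beta))
```

On this toy data, `lam=0.05` keeps features 0 and 2, while `lam=0.5` zeroes feature 2 as well and keeps only feature 0: a larger penalty screens out more features.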
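
The abstract reports discrimination as an AUC with a 95% CI. A minimal Python sketch of one common way to compute both, assuming binary labels (1 = lean NAFLD, 0 = no NAFLD) and one model risk score per patient: the AUC is the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case, and the interval uses the Hanley-McNeil normal approximation. The record does not say how the study computed its CIs, so the Hanley-McNeil choice and the function names are assumptions. The reported intervals are symmetric about their point estimates (0.739 ± 0.063 and 0.789 ± 0.067), which is consistent with, but does not confirm, a normal-approximation interval of this kind.

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case (1) and one non-case (0)")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate two-sided 95% CI for an AUC via the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)              # approx. P(two random cases both outrank one non-case)
    q2 = 2.0 * auc * auc / (1.0 + auc)  # approx. P(one case outranks two random non-cases)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to [0, 1] because the normal approximation can overshoot near AUC = 1.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical risk scores for six patients (not study data).
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    auc = auc_mann_whitney(scores, labels)
    lo, hi = hanley_mcneil_ci(auc, n_pos=3, n_neg=3)
    print(f"AUC = {auc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Here 8 of the 9 case/non-case pairs are ranked correctly, so the AUC is 8/9 ≈ 0.889; with only three patients per class the interval is very wide, which is why sample size matters when comparing training and validation AUCs.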
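
SHAP, which the abstract uses to rank TG, ALT, GGT, BMI, and UA, attributes a single prediction to features via Shapley values. A brute-force Python sketch of that definition, assuming a single baseline point stands in for "absent" features and a hypothetical three-feature `toy_risk_score` in place of the study's RF model. Enumeration costs on the order of 2^p model calls, so it only suits a handful of features; practical SHAP implementations for tree ensembles use much faster algorithms and may define absent features differently (for example, by averaging over a background dataset).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for the single prediction model(x).

    A feature outside a coalition is set to its baseline value (one simple way to
    define an 'absent' feature). Cost grows as 2^p, so this only suits small p.
    """
    p = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return model(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions S not containing i.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(z):
    """Hypothetical 3-feature model: two additive terms plus a feature-0 x feature-2 interaction."""
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]
    baseline = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_risk_score, x, baseline)
    print([round(v, 3) for v in phi])
    # Efficiency: attributions sum to model(x) - model(baseline).
    print(round(sum(phi), 3), toy_risk_score(x) - toy_risk_score(baseline))
```

For this toy model the attributions are [0.6, 0.3, 0.1]: each additive term goes to its own feature, and the 0.2 interaction term is split equally between features 0 and 2, so the values sum to the full change in the prediction.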