Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning

Background: Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. Besides routine laboratory indicators and medical history, risk prediction models can predict CKD outcome. However, there is currently a lack of CKD prognost...

Bibliographic Details
Main Authors: Bingkun Zhou, Hu Zhou, Xiaodong Huang, Shijie Liu
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Cell and Developmental Biology
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fcell.2025.1627355/full
author Bingkun Zhou
Hu Zhou
Xiaodong Huang
Shijie Liu
collection DOAJ
description Background: Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. Besides routine laboratory indicators and medical history, risk prediction models can predict CKD outcome. However, CKD prognostic prediction models based on transcriptomics and machine learning are currently lacking.
Methods: Using weighted correlation network analysis (WGCNA) and random forest algorithms in GSE137570, three core gene sets of different sizes were constructed. These were externally validated in GSE66494 and GSE180394, and their predictive performance was evaluated in GSE45980 by receiver operating characteristic (ROC) curves. Predictive models were built using Cox regression, LASSO regression, and logistic regression in GSE60861. The reliability of human CKD transcriptomic analysis and the feasibility of functional studies were validated in a mouse unilateral ureteral obstruction (UUO) model.
Results: Combining WGCNA and differential gene analysis identified 9 genes positively associated with CKD occurrence and development and 20 genes negatively associated with it. Using the random forest algorithm, three gene sets were constructed: a minimal gene set (CCL2, SUCLG1, ACADM), a medium gene set (CCL2, GGT6, PCK2, SFXN2, SLC34A3, ALPL, GLTPD2, ACADM, SUCLG1), and a maximal gene set (CCL2, MMP7, GGT6, PCK2, SFXN2, SLC34A3, ALPL, GLTPD2, ACADM, SUCLG1). In external validation, the maximal PLAGE score had the best classification performance for CKD in GSE66494 (AUC = 0.767) and in GSE180394 (AUC = 0.760), and the medium PLAGE score predicted CKD progression in GSE45980 (AUC = 0.758). In multivariate modelling, Cox regression yielded a risk model containing only the minimal z-score, LASSO regression retained gender and the minimal z-score, and a multivariate logistic regression model could not be constructed with any score. KEGG enrichment in the mouse UUO model showed a high degree of similarity between mouse CKD and human CKD, and the core genes related to the occurrence and progression of human CKD remained diagnostically valuable in mice.
Conclusion: This study provides a transcriptomics-based, machine-learning risk prediction model for the occurrence and development of CKD and offers potential target genes for further experimental research on CKD.
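The abstract reports gene-set scores evaluated by ROC AUC but does not say how the scores were computed. The following is only a minimal sketch of one common approach, assuming that "z-score" means a combined z-score (per-gene standardization across samples, summed over the set and divided by the square root of the set size) and that the AUC is the usual rank-based estimate. The gene names GENE_A and GENE_B, the expression values, and the labels are made-up toy data, not taken from any GSE dataset or from the article.

```python
import math
from statistics import mean, stdev


def gene_set_zscore(expr, gene_set):
    """Combined z-score per sample for one gene set.

    expr: dict mapping gene name -> list of expression values (one per sample).
    gene_set: iterable of gene names; genes missing from expr are ignored.
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("none of the genes in the set are present in expr")
    n_samples = len(expr[genes[0]])
    scores = [0.0] * n_samples
    for g in genes:
        values = expr[g]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant gene contributes nothing to the score
        for i, v in enumerate(values):
            scores[i] += (v - mu) / sd
    # Scale the sum of per-gene z-scores by sqrt(number of genes).
    return [s / math.sqrt(len(genes)) for s in scores]


def roc_auc(scores, labels):
    """Rank-based AUC: fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls. Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three controls followed by three cases.
toy_expr = {
    "GENE_A": [1.0, 1.2, 0.9, 2.0, 2.3, 1.9],
    "GENE_B": [0.5, 0.7, 0.6, 1.1, 0.6, 1.3],
}
toy_labels = [0, 0, 0, 1, 1, 1]

toy_scores = gene_set_zscore(toy_expr, ["GENE_A", "GENE_B"])
toy_auc = roc_auc(toy_scores, toy_labels)
print(f"toy AUC = {toy_auc:.3f}")
```

A plain sum like this treats every gene as pointing in the same direction. For a set that mixes positively and negatively associated genes, the sign of each gene's contribution would need separate handling, which this sketch does not attempt.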
format Article
id doaj-art-be52ef3e2337497dbf3283f3168d8fed
institution Kabale University
issn 2296-634X
language English
publishDate 2025-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Cell and Developmental Biology
spelling doaj-art-be52ef3e2337497dbf3283f3168d8fed; 2025-08-20T04:01:25Z; eng; Frontiers Media S.A.; Frontiers in Cell and Developmental Biology; 2296-634X; 2025-08-01; 13; 10.3389/fcell.2025.1627355; 1627355
Bingkun Zhou: Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China
Hu Zhou: Department of Medicine, Nephrology Division, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Xiaodong Huang: Department of Urology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
Shijie Liu: Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China
title Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning
topic chronic kidney disease
bioinformatics
transcriptomics
machine learning
predictive model
url https://www.frontiersin.org/articles/10.3389/fcell.2025.1627355/full