Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning
Background: Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. Besides routine laboratory indicators and medical history, risk prediction models can predict CKD outcome. However, there is currently a lack of CKD prognost...
| Main Authors: | Bingkun Zhou, Hu Zhou, Xiaodong Huang, Shijie Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-08-01 |
| Series: | Frontiers in Cell and Developmental Biology |
| Subjects: | chronic kidney disease; bioinformatics; transcriptomics; machine learning; predictive model |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fcell.2025.1627355/full |
| _version_ | 1849238720766541824 |
|---|---|
| author | Bingkun Zhou, Hu Zhou, Xiaodong Huang, Shijie Liu |
| author_facet | Bingkun Zhou, Hu Zhou, Xiaodong Huang, Shijie Liu |
| author_sort | Bingkun Zhou |
| collection | DOAJ |
| description | Background: Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. Besides routine laboratory indicators and medical history, risk prediction models can predict CKD outcome. However, there is currently a lack of CKD prognostic prediction models based on transcriptomics and machine learning. Methods: Using weighted correlation network analysis (WGCNA) and a random forest algorithm in GSE137570, three core gene sets of different sizes were constructed, externally validated in GSE66494 and GSE180394, and evaluated for predictive performance in GSE45980 by receiver operating characteristic (ROC) curves. Predictive models were built using Cox regression, LASSO regression, and logistic regression in GSE60861. The reliability of the human CKD transcriptomic analysis and the feasibility of functional studies were validated in a mouse unilateral ureteral obstruction (UUO) model. Results: Combining WGCNA and differential gene analysis identified 9 genes positively associated and 20 genes negatively associated with CKD occurrence and development. The random forest algorithm yielded three gene sets: a minimal gene set (CCL2, SUCLG1, ACADM), a medium gene set (CCL2, GGT6, PCK2, SFXN2, SLC34A3, ALPL, GLTPD2, ACADM, SUCLG1), and a maximal gene set (CCL2, MMP7, GGT6, PCK2, SFXN2, SLC34A3, ALPL, GLTPD2, ACADM, SUCLG1). In external validation, the maximal PLAGE score had the best classification performance for CKD in GSE66494 (AUC = 0.767) and in GSE180394 (AUC = 0.760), and the medium PLAGE score predicted CKD progression in GSE45980 (AUC = 0.758). In the multivariate models, Cox regression yielded a risk model containing only the minimal z-score, LASSO regression retained gender and the minimal z-score, and no multivariate logistic regression model could be constructed with any score. KEGG enrichment in the mouse UUO model showed a high degree of similarity between mouse and human CKD, and the core genes related to the occurrence and progression of human CKD remained diagnostically valuable in mice. Conclusion: This study provides a transcriptomics-based, machine-learning risk prediction model for the occurrence and development of CKD and offers potential target genes for further experimental research on CKD. |
| format | Article |
| id | doaj-art-be52ef3e2337497dbf3283f3168d8fed |
| institution | Kabale University |
| issn | 2296-634X |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Cell and Developmental Biology |
| spelling | doaj-art-be52ef3e2337497dbf3283f3168d8fed; 2025-08-20T04:01:25Z; eng; Frontiers Media S.A.; Frontiers in Cell and Developmental Biology; ISSN 2296-634X; 2025-08-01; vol. 13; DOI 10.3389/fcell.2025.1627355; article 1627355. Title: Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning. Authors: Bingkun Zhou (0), Hu Zhou (1), Xiaodong Huang (2), Shijie Liu (3). Affiliations: (0) Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China; (1) Department of Medicine, Nephrology Division, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; (2) Department of Urology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China; (3) Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China. URL: https://www.frontiersin.org/articles/10.3389/fcell.2025.1627355/full. Keywords: chronic kidney disease; bioinformatics; transcriptomics; machine learning; predictive model |
| title | Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning |
| title_sort | identification of progression related genes and construction of prognostic model for chronic kidney disease by machine learning |
| topic | chronic kidney disease; bioinformatics; transcriptomics; machine learning; predictive model |
| url | https://www.frontiersin.org/articles/10.3389/fcell.2025.1627355/full |
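
As a reading aid for the abstract in this record, the following is a minimal sketch of how a gene-set z-score and its ROC AUC could be computed. It is not the authors' code. The combined z-score formula (sum of per-gene z-scores divided by the square root of the set size), the function names `gene_set_zscore` and `roc_auc`, the placeholder gene names, and the synthetic expression values are all assumptions made for illustration.

```python
import numpy as np


def gene_set_zscore(expr, gene_names, gene_set):
    """Combined z-score per sample (assumed formula, see lead-in).

    expr: array of shape (n_genes, n_samples).
    Each gene is standardized across samples, the z-scores of the genes in
    the set are summed, and the sum is divided by sqrt(set size).
    A gene with zero variance would cause a division by zero.
    """
    names = list(gene_names)
    idx = [names.index(g) for g in gene_set]
    sub = np.asarray(expr, dtype=float)[idx, :]
    mean = sub.mean(axis=1, keepdims=True)
    std = sub.std(axis=1, ddof=1, keepdims=True)
    z = (sub - mean) / std
    return z.sum(axis=0) / np.sqrt(len(idx))


def roc_auc(scores, labels):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample; ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


if __name__ == "__main__":
    # Synthetic data only: 4 hypothetical genes, 10 controls and 10 cases.
    rng = np.random.default_rng(0)
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical names
    labels = np.array([0] * 10 + [1] * 10)
    expr = rng.normal(size=(4, 20))
    expr[:3, labels == 1] += 1.0  # shift the first three genes in cases
    scores = gene_set_zscore(expr, genes, ["GENE_A", "GENE_B", "GENE_C"])
    print(f"AUC = {roc_auc(scores, labels):.3f}")
```

The pairwise AUC definition is used here only to keep the sketch dependency-free. Its memory use grows with the product of the two group sizes, so a rank-based implementation would be preferable for large cohorts.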