Development of risk models for early detection and prediction of chronic kidney disease in clinical settings

Abstract Chronic kidney disease (CKD) imposes a high burden with high mortality and morbidity rates. Early detection of CKD is imperative in preventing the adverse outcomes attributed to the later stages. Therefore, this study aims to utilize machine learning techniques to predict CKD at early stage...

Bibliographic Details
Main Authors: Pegah Bahrami, Davoud Tanbakuchi, Monavar Afzalaghaee, Majid Ghayour-Mobarhan, Habibollah Esmaily
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Subjects: Chronic kidney disease; Risk factors; Early diagnosis
Online Access: https://doi.org/10.1038/s41598-024-83973-5
author Pegah Bahrami
Davoud Tanbakuchi
Monavar Afzalaghaee
Majid Ghayour-Mobarhan
Habibollah Esmaily
collection DOAJ
description Abstract Chronic kidney disease (CKD) imposes a high burden with high mortality and morbidity rates. Early detection of CKD is imperative in preventing the adverse outcomes attributed to the later stages. Therefore, this study aims to utilize machine learning techniques to predict CKD at early stages. This study uses data obtained from a large longitudinal cohort study. The features include patients’ sociodemographic, anthropometric, and laboratory variables that are most often associated with CKD in national and international studies. Records with missing data were removed by listwise deletion, and outliers were removed using the interquartile range technique. The data were initially left imbalanced to investigate how well the models work on imbalanced datasets. Stratified K-fold cross-validation, a robust approach that performs well on imbalanced data, was then applied to improve the splitting. Interestingly, an interaction was found between age and gender that produced contrasting results; therefore, to avoid this interaction, gender-specific algorithms were developed. Eight models were developed using the preprocessed data of 6855 participants: four main models and four using stratified K-fold cross-validation, consisting of gender-specific Random Forest (RF) and feedforward Neural Network (NN) models. The RF model for women exhibited the highest AUC, 0.90, followed closely by the NN model for women at 0.89. Both models constructed for men yielded an AUC of 0.88. Sensitivity scores were higher in men than in women. The models showed subpar specificity; however, their high precision and F1 scores make them extremely valuable in a clinical setting for accurately identifying CKD cases while minimizing false positive diagnoses. Moreover, the results from stratified K-fold cross-validation indicated that the NN models were more sensitive to the imbalanced dataset and demonstrated a marked increase in performance, particularly in specificity, when this approach was applied. These data offer valuable insights for the development of future risk stratification models for CKD.
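Illustrative note: the preprocessing and evaluation steps named in the abstract (listwise deletion, interquartile range filtering, stratified K-fold splitting, and the sensitivity, specificity, precision and F1 metrics) can be sketched roughly as follows. This is a minimal standard-library sketch and not the authors' code. The function names, the 1.5 x IQR fence, the round-robin fold assignment and all numbers are assumptions made for illustration only.

```python
import random
import statistics
from collections import defaultdict


def listwise_delete(rows):
    """Listwise deletion: drop any row with at least one missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def iqr_filter(rows, key, k=1.5):
    """Keep rows whose `key` value lies inside [Q1 - k*IQR, Q3 + k*IQR].

    The multiplier k=1.5 is a common convention, assumed here.
    """
    q1, _, q3 = statistics.quantiles([r[key] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[key] <= high]


def stratified_folds(labels, n_splits=5, seed=0):
    """Split indices into folds that each keep roughly the overall class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % n_splits].append(i)  # deal each class out round-robin
    return folds


def binary_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts (assumes nonzero denominators)."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 10 positives, 90 negatives.
    labels = [1] * 10 + [0] * 90
    for fold in stratified_folds(labels):
        print(len(fold), sum(labels[i] for i in fold))
    # Made-up confusion counts, not taken from the study.
    print(binary_metrics(tp=900, fp=60, tn=40, fn=50))
```

With the made-up counts in the last call, precision and F1 both come out near 0.94 while specificity is 0.40. This shows one way high precision and F1 can coexist with low specificity when the positive class dominates. It is an arithmetic illustration only and says nothing about the class balance in the study itself.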
format Article
id doaj-art-1764a68cff85497a904806d86417dfa6
institution Kabale University
issn 2045-2322
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-1764a68cff85497a904806d86417dfa6; 2025-01-05T12:25:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-12-01; vol. 14, no. 1, pp. 1-10; 10.1038/s41598-024-83973-5
Pegah Bahrami: School of Medicine, Mashhad University of Medical Sciences
Davoud Tanbakuchi: School of Medicine, Mashhad University of Medical Sciences
Monavar Afzalaghaee: Department of Statistics and Epidemiology, Faculty of Health, Mashhad University of Medical Sciences
Majid Ghayour-Mobarhan: International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences
Habibollah Esmaily: Department of Biostatistics, School of Health, Mashhad University of Medical Sciences
title Development of risk models for early detection and prediction of chronic kidney disease in clinical settings
topic Chronic kidney disease
Risk factors
Early diagnosis
url https://doi.org/10.1038/s41598-024-83973-5