A novel machine learning methodology for the systematic extraction of chronic kidney disease comorbidities from abstracts

Background: Chronic Kidney Disease (CKD) is a global health concern and is frequently underdiagnosed due to its subtle initial symptoms, contributing to increasing morbidity and mortality. A comprehensive understanding of CKD comorbidities could lead to the identification of risk-groups, more effectiv...

Bibliographic Details
Main Authors: Eszter Sághy, Mostafa Elsharkawy, Frank Moriarty, Sándor Kovács, István Wittmann, Antal Zemplényi
Format: Article
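Language: English
Published: Frontiers Media S.A. 2025-02-01
Series: Frontiers in Digital Health
Subjects: chronic kidney disease; comorbidities; systematic literature review; machine learning; active learning; named entity recognition
Online Access: https://www.frontiersin.org/articles/10.3389/fdgth.2025.1495879/full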
author Eszter Sághy
Mostafa Elsharkawy
Frank Moriarty
Sándor Kovács
István Wittmann
Antal Zemplényi
collection DOAJ
description Background: Chronic Kidney Disease (CKD) is a global health concern and is frequently underdiagnosed due to its subtle initial symptoms, contributing to increasing morbidity and mortality. A comprehensive understanding of CKD comorbidities could lead to the identification of risk groups, more effective treatment and improved patient outcomes. Our research has a two-fold objective: developing an effective machine learning (ML) workflow for text classification and entity relation extraction, and assembling a broad list of diseases influencing CKD development and progression.
Methods: We analysed 39,680 abstracts with CKD in the title from the Embase library. Abstracts about a disease affecting CKD development and/or progression were selected by multiple ML classifiers trained on a human-labelled sample. The best classifier was further trained with active learning. The disease names in question were extracted from the selected abstracts using a novel entity relation extraction methodology. The resulting disease list and the corresponding abstracts were manually checked, and a final disease list was created.
Findings: The SVM model gave the best results and was chosen for further training with active learning. This optimised ML workflow enabled us to discern 68 comorbidities across 15 ICD-10 disease groups contributing to CKD progression or development. Reading the ML-selected abstracts showed that some diseases have a direct causal effect on CKD, while others, like schizophrenia, have an indirect causal effect on CKD.
Interpretation: These findings have the potential to guide future CKD investigations by facilitating the inclusion of a broader array of comorbidities in CKD prognostic models. Ultimately, our study enhances understanding of prognostic comorbidities and supports clinical practice by enabling improved patient monitoring, preventive strategies, and early detection for individuals at higher risk of CKD development or progression.
format Article
id doaj-art-8e032fe720da47e0a4557d656eb5ab39
institution Kabale University
issn 2673-253X
language English
publishDate 2025-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Digital Health
spelling doaj-art-8e032fe720da47e0a4557d656eb5ab39
2025-02-04T06:32:07Z
eng
Frontiers Media S.A.
Frontiers in Digital Health
2673-253X
2025-02-01
Volume 7
DOI 10.3389/fdgth.2025.1495879
Article 1495879
A novel machine learning methodology for the systematic extraction of chronic kidney disease comorbidities from abstracts
Eszter Sághy (Faculty of Pharmacy, University of Pécs, Pécs, Hungary)
Mostafa Elsharkawy (Faculty of Sciences, University of Pécs, Pécs, Hungary)
Frank Moriarty (School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland)
Sándor Kovács (Faculty of Pharmacy, University of Pécs, Pécs, Hungary)
István Wittmann (Medical School, University of Pécs, Pécs, Hungary)
Antal Zemplényi (Faculty of Pharmacy, University of Pécs, Pécs, Hungary; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Anschutz Medical Campus, Denver, CO, United States)
https://www.frontiersin.org/articles/10.3389/fdgth.2025.1495879/full
title A novel machine learning methodology for the systematic extraction of chronic kidney disease comorbidities from abstracts
topic chronic kidney disease
comorbidities
systematic literature review
machine learning
active learning
named entity recognition
url https://www.frontiersin.org/articles/10.3389/fdgth.2025.1495879/full
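
The description says the best classifier (an SVM) was further trained with active learning, but this record does not state which query strategy was used. As a minimal sketch only, assuming margin-based uncertainty sampling, the step of choosing which unlabelled abstracts to send to a human labeller might look like the following. The token weights, the toy abstracts, and the function names `decision_score` and `select_for_labelling` are hypothetical and are not taken from the article.

```python
# Illustrative sketch only: the record does not say which active-learning
# query strategy the authors used. Margin-based uncertainty sampling is an
# assumption, and all weights and texts below are invented.

def decision_score(text, weights, bias=0.0):
    """Linear decision value, standing in for a trained SVM's decision function.

    `weights` maps lowercase tokens to hypothetical learned weights.
    """
    tokens = text.lower().split()
    return sum(weights.get(tok, 0.0) for tok in tokens) + bias


def select_for_labelling(unlabelled, weights, k=2):
    """Return the k abstracts whose scores lie closest to the decision
    boundary (score 0), i.e. those the classifier is least sure about."""
    ranked = sorted(unlabelled, key=lambda t: abs(decision_score(t, weights)))
    return ranked[:k]


# Hypothetical token weights and toy abstracts.
WEIGHTS = {"progression": 1.0, "risk": 0.8, "dialysis": -0.6, "cost": -1.0}
POOL = [
    "diabetes accelerates ckd progression and risk",  # clearly relevant
    "cost of dialysis in ckd",                        # clearly irrelevant
    "hypertension and ckd risk after dialysis",       # near the boundary
    "ckd registry design",                            # no informative tokens
]

# The two least certain abstracts are queued for human labelling; after
# labelling, the classifier would be retrained and the loop repeated.
queries = select_for_labelling(POOL, WEIGHTS, k=2)
```

Querying the items nearest the boundary is one common design choice because each human label is spent where the current model is least certain; other strategies (for example, random or diversity-based sampling) would fit the same loop.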