Gene expression knowledge graph for patient representation and diabetes prediction

Bibliographic Details
Main Authors: Rita T. Sousa, Heiko Paulheim
Format: Article
Language: English
Published: BMC, 2025-03-01
Series: Journal of Biomedical Semantics
Online Access: https://doi.org/10.1186/s13326-025-00325-6

Abstract
Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of gene expression data. While gene expression data can provide valuable insights, challenges arise from the fact that the number of patients in expression datasets is usually limited, and the data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration, and to learn uniform patient representations for subjects contained in different incompatible datasets. Different strategies and KG embedding methods are explored to generate vector representations, serving as inputs for a classifier. Extensive experiments demonstrate the efficacy of our approach, revealing weighted F1-score improvements in diabetes prediction up to 13% when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.

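The abstract describes the pipeline only at a high level. The following is a minimal sketch under loudly stated assumptions, not the authors' method: the toy patients, gene identifiers (g1, g2, g3), function labels, relation names (expr_up, expr_down, annotated_with, interacts_with) and every helper function are hypothetical, and the path-counting featurization is a deliberately simple stand-in for the KG embedding methods the article explores (it is not one of them). It illustrates why linking patients to shared domain entities (protein functions, interactions) lets subjects from datasets that measure different genes land in one common vector space, and how a support-weighted F1-score is computed.

```python
from collections import Counter

# Hypothetical toy data (not from the article): two datasets that measure
# DIFFERENT genes. Each patient maps gene -> "up"/"down"; label 1 = diabetic.
DATASET_A = {"pA1": ({"g1": "down", "g2": "down"}, 1),
             "pA2": ({"g1": "up", "g2": "up"}, 0)}
DATASET_B = {"pB1": ({"g3": "down"}, 1),
             "pB2": ({"g3": "up"}, 0)}

# Hypothetical domain knowledge: gene -> protein-function annotations,
# plus undirected gene-gene interactions.
ANNOTATIONS = {"g1": ["f_insulin"], "g2": ["f_glucose"],
               "g3": ["f_insulin", "f_glucose"]}
INTERACTIONS = [("g1", "g3")]


def build_kg(datasets):
    """Return the KG as an adjacency map: head -> list of (relation, tail)."""
    kg = {}

    def add(head, rel, tail):
        kg.setdefault(head, []).append((rel, tail))

    for data in datasets:
        for patient, (profile, _label) in data.items():
            for gene, status in profile.items():
                add(patient, f"expr_{status}", gene)
    for gene, funcs in ANNOTATIONS.items():
        for func in funcs:
            add(gene, "annotated_with", func)
    for a, b in INTERACTIONS:  # store interactions in both directions
        add(a, "interacts_with", b)
        add(b, "interacts_with", a)
    return kg


def path_features(kg, patient):
    """Count typed paths of length 1 and 2 that start at `patient`."""
    feats = Counter()
    for r1, e1 in kg.get(patient, []):
        feats[(r1, e1)] += 1
        for r2, e2 in kg.get(e1, []):
            feats[(r1, r2, e2)] += 1
    return feats


def vectorize(kg, patients):
    """Map every patient onto one shared, sorted feature vocabulary."""
    per_patient = {p: path_features(kg, p) for p in patients}
    vocab = sorted({f for c in per_patient.values() for f in c}, key=str)
    return vocab, {p: [c[f] for f in vocab] for p, c in per_patient.items()}


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    n = len(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += (y_true.count(c) / n) * (2 * tp / denom if denom else 0.0)
    return total


kg = build_kg([DATASET_A, DATASET_B])
vocab, vectors = vectorize(kg, ["pA1", "pA2", "pB1", "pB2"])
```

Although pA1 and pB1 share no measured gene, both reach the same function nodes through two-hop paths, so their vectors overlap on features such as ("expr_down", "annotated_with", "f_insulin"). Vectors like these could be passed to any off-the-shelf classifier; the article instead evaluates several strategies and KG embedding methods for producing them.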
ISSN: 2041-1480
Affiliation (both authors): Data and Web Science Group, University of Mannheim
Keywords: Diabetes prediction; Expression data; Knowledge graph; Ontology; Knowledge graph embedding; Representation learning