Gene expression knowledge graph for patient representation and diabetes prediction
| Main Authors: | Rita T. Sousa, Heiko Paulheim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-03-01 |
| Series: | Journal of Biomedical Semantics |
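| Subjects: | Diabetes prediction; Expression data; Knowledge graph; Ontology; Knowledge graph embedding; Representation learning |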
| Online Access: | https://doi.org/10.1186/s13326-025-00325-6 |
| _version_ | 1849762034937233408 |
|---|---|
| author | Rita T. Sousa; Heiko Paulheim |
| author_facet | Rita T. Sousa; Heiko Paulheim |
| author_sort | Rita T. Sousa |
| collection | DOAJ |
| description | Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of gene expression data. Although gene expression data can provide valuable insights, two challenges arise: the number of patients in expression datasets is usually limited, and data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel approach that addresses these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration, and by learning uniform patient representations for subjects contained in different, incompatible datasets. Different strategies and KG embedding methods are explored to generate vector representations, which serve as inputs for a classifier. Extensive experiments demonstrate the efficacy of the approach, revealing weighted F1-score improvements in diabetes prediction of up to 13% when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions. |
| format | Article |
| id | doaj-art-94dc05af3fd74774b67971bee81f3a55 |
| institution | DOAJ |
| issn | 2041-1480 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Biomedical Semantics |
| spelling | doaj-art-94dc05af3fd74774b67971bee81f3a55; 2025-08-20T03:05:50Z; eng; BMC; Journal of Biomedical Semantics; 2041-1480; 2025-03-01; 16; 1; 1; 16; 10.1186/s13326-025-00325-6; Gene expression knowledge graph for patient representation and diabetes prediction; Rita T. Sousa (Data and Web Science Group, University of Mannheim); Heiko Paulheim (Data and Web Science Group, University of Mannheim); https://doi.org/10.1186/s13326-025-00325-6; Diabetes prediction; Expression data; Knowledge graph; Ontology; Knowledge graph embedding; Representation learning |
| title | Gene expression knowledge graph for patient representation and diabetes prediction |
| topic | Diabetes prediction; Expression data; Knowledge graph; Ontology; Knowledge graph embedding; Representation learning |
| url | https://doi.org/10.1186/s13326-025-00325-6 |
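
The description above reports its results as weighted F1-score improvements. As a reading aid, here is a minimal sketch of how a weighted F1-score is commonly computed: the per-class F1 is averaged with weights equal to each class's share of the true labels. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the article; the authors' evaluation code may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp  # true members of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Hypothetical labels for illustration only: 1 = diabetic, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
```

Weighting by class support keeps a large control group from being treated the same as a small patient group, which matters when, as the description notes, the number of patients in expression datasets is limited.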