Federated SPARQL query performance evaluation for exploring disease model mouse: combining gene expression, orthology, and disease knowledge graphs

Bibliographic Details
Main Authors: Tatsuya Kushida, Tarcisio Mendes de Farias, Ana C. Sima, Christophe Dessimoz, Hirokazu Chiba, Frederic B. Bastian, Hiroshi Masuya
Format: Article
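Language: English
Published: BMC 2025-05-01
Series: BMC Medical Informatics and Decision Making
Subjects: Database integration; Gene-disease association; Gene expression; Knowledge graph; Model organism; Ontology
Online Access: https://doi.org/10.1186/s12911-025-03013-8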
_version_ 1850273452906250240
collection DOAJ
description Abstract
Background: The RIKEN BRC develops and maintains the RIKEN BioResource MetaDatabase to help users find bioresources suited to their experiments and to prepare precise, high-quality data infrastructures. The Swiss Institute of Bioinformatics (SIB) develops two multi-species databases for the study of gene expression and orthology: Bgee and Orthologous MAtrix (OMA).
Methods: This study combines the RIKEN BioResource data with Resource Description Framework (RDF) datasets from Bgee (a gene expression database), OMA (an orthology database), DisGeNET (a human gene-disease association resource), Mouse Genome Informatics (MGI), UniProt, and four disease ontologies in the RIKEN BioResource MetaDatabase. Our aim is to evaluate distributed SPARQL query performance when exploring, across these interoperable datasets, which model organisms are most appropriate for specific medical science research applications. In our biomedical use cases, we investigate disease-related genes and the anatomical parts where these genes are expressed, and then identify bioresource candidates available for research on a specific disease.
Results: We illustrate this approach through two use cases, one targeting Alzheimer’s disease and one targeting melanoma. We identified 14 Alzheimer’s disease-related genes expressed in the prefrontal cortex (e.g., APP and APOE) and 55 RIKEN bioresources, genetically modified mice related to these genes, that are predicted to be relevant to Alzheimer’s disease research. Furthermore, by running a transitive search over Uberon terms with SPARQL property paths, we identified 14 melanoma-related genes (e.g., HRAS and PTEN) and 12 anatomical parts in which these genes were expressed, such as the “skin of limb”. Finally, we compared the performance of a federated SPARQL query sent via the remote Bgee SPARQL endpoint with that of a centralized SPARQL query using the Bgee dataset held in the RIKEN BioResource MetaDatabase.
Conclusions: We confirmed that query performance degraded under the federated approach. We reduced this degradation for federated queries from the BioResource MetaDatabase to the SIB by refining the transferred data through a subquery and by enhancing the server specifications, thereby optimizing the triple store query evaluation.
format Article
id doaj-art-cfee2c8ffa1e4b5db6df712db840b986
institution OA Journals
issn 1472-6947
language English
publishDate 2025-05-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-cfee2c8ffa1e4b5db6df712db840b986; 2025-08-20T01:51:28Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-05-01; vol. 25, issue S1, pp. 1-11; 10.1186/s12911-025-03013-8
Tatsuya Kushida (BioResource Research Center, RIKEN)
Tarcisio Mendes de Farias (SIB Swiss Institute of Bioinformatics)
Ana C. Sima (SIB Swiss Institute of Bioinformatics)
Christophe Dessimoz (SIB Swiss Institute of Bioinformatics)
Hirokazu Chiba (Database Center for Life Science, DS, ROIS)
Frederic B. Bastian (SIB Swiss Institute of Bioinformatics)
Hiroshi Masuya (BioResource Research Center, RIKEN)
topic Database integration
Gene-disease association
Gene expression
Knowledge graph
Model organism
Ontology
url https://doi.org/10.1186/s12911-025-03013-8
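The abstract mentions three SPARQL techniques: federation to a remote endpoint, a subquery that refines the data transferred, and property paths for a transitive search over anatomy terms. The sketch below shows how these might fit together in one query. It is not the authors' query: the endpoint URL, the `ex:` predicates, the use of `rdfs:subClassOf*` as the transitive relation, and the function name `build_federated_query` are all assumptions made for illustration, and the sketch leaves out the orthology and identifier mapping that a real cross-species query would need.

```python
from textwrap import dedent

# Assumed endpoint URL for illustration; check the Bgee documentation
# for the address actually in use.
BGEE_ENDPOINT = "https://www.bgee.org/sparql/"


def build_federated_query(disease_label: str, anatomy_iri: str, limit: int = 100) -> str:
    """Build a SPARQL 1.1 query string (hypothetical vocabulary).

    The inner SELECT picks a bounded set of disease-related genes from the
    local store; the SERVICE block then looks up their expression remotely.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if '"' in disease_label or "\\" in disease_label:
        raise ValueError("disease_label must not contain quotes or backslashes")
    if any(ch in anatomy_iri for ch in '<> "'):
        raise ValueError("anatomy_iri is not a plain IRI")

    return dedent(f"""\
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX ex: <http://example.org/vocab#>

        SELECT DISTINCT ?gene ?anatomy WHERE {{
          # Subquery: choose the disease-related genes locally, so that only
          # a bounded set of bindings takes part in the remote join.
          {{
            SELECT DISTINCT ?gene WHERE {{
              ?disease rdfs:label "{disease_label}" .
              ?gene ex:associatedWithDisease ?disease .
            }}
            LIMIT {limit}
          }}
          # Federated part, evaluated by the remote endpoint.
          SERVICE <{BGEE_ENDPOINT}> {{
            ?gene ex:isExpressedIn ?anatomy .
            # Property path: zero or more subclass steps up to the target term.
            ?anatomy rdfs:subClassOf* <{anatomy_iri}> .
          }}
        }}
        """)


if __name__ == "__main__":
    print(build_federated_query("melanoma", "http://example.org/anatomy/skin", limit=50))
```

Whether a triple store evaluates the subquery before contacting the remote service depends on its query planner, so the effect on transferred data should be measured and not assumed.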